Monthly Archives: January 2012

Fill/Retrieve data from PDF Form Fields using VB.Net From an Excel File


EDIT

I have stopped updating this blog. This link can also be found in my Website.

http://www.siddharthrout.com/2012/01/28/fillretrieve-data-from-pdf-form-fields-using-vb-net-from-an-excel-file/

 

  • Tired of filling a PDF report every day? OR
  • Tired of collecting information from a pdf everyday?

Here are ways to ease your pain Smile

We will separate the article in 3 sections

  1. Filling PDF Form Fields
  2. Extracting the PDF Form Field names
  3. Retrieving data from PDF Form Fields

Filling PDF Form Fields


Imagine if you need to fill a Daily Report and mail it to your boss over the email every day? In the beginning, it might sound adventurous but then you start hating the monotonous routine.

Here is one quick way to fill the Form Fields of PDF using VB.Net from data stored in Excel.

To begin with let’s design our form. Place 2 textboxes, 4 command buttons and one OpenFileDialogon a form. Once done the form will look similar to what is show below.

image

Now let’s say you have a PDF which looks like below. I created this PDF just to demonstrate on how to fill the PDF.

image

Once the data is filled, your PDF will look like this

image

Next is preparing your Excel Database.

Open Excel and Type the Data as shown below. Once done, save it as C:\PDF_FORM_DATA.XLSX.

This is how your Excel file might look.

image

We are now all set to write the code in VB.net

Double click on the “Browse” Button which we will be use to select the PDF file and paste this code.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        With OpenFileDialog1
            .DefaultExt = ".PDF"
            .DereferenceLinks = True
            .Filter = _
             "PDF files (*.PDF)|*.PDF"
            .Multiselect = False
            .Title = "Select a PDF file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox1.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

Next Double click on the “Browse” Button which we will be use to select the Excel file and paste this code.

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        With OpenFileDialog1
            .DefaultExt = ".XLSX"
            .DereferenceLinks = True
            .Filter = _
             "Excel files (*.XLSX)|*.XLSX"
            .Multiselect = False
            .Title = "Select an Excel file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox2.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

Now Double click on the “Cancel” Button and paste this code.

    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        Me.Close()
    End Sub

Finally Double click on the “Update” Button and paste this code.

    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
        Dim pdfTemplate As String = TextBox1.Text
        '~~> Change the Output FileName here
        Dim PDFUpdatedFile As String = "C:\PDF_FORM_DATA (UPDATED).pdf"

        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim stamperPDF As New PdfStamper(readerPDF, _
        New FileStream(PDFUpdatedFile, FileMode.Create))

        Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Update pdf FormFields
        pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

        '~~> To remove editting options from the output Form, set it to FALSE
        '~~> To leave then editting options open in the output Form, set it to TRUE
        stamperPDF.FormFlattening = True

        '~~> close the pdf
        stamperPDF.Close()

        '~~> Close the Excel file without saving
        xlWorkBook.Close(False)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

Add the below code at the very top.

Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

and add one Sub at the bottom for clean up.

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub

The last thing that we need to do is setting References for Excel and itextsharp.dll

itextsharp.dll is freely available on the web and it is free to use for Non Commercial Applications. Once you have downloaded the file, save it at a location of your choice. Once downloaded, click on the menu PROJECT | Add Reference. Navigate to the “Browse” tab and select the DLL and click on OK.

image

Next set a reference to the Excel Object Library. See this link for more information.

Your Final code will look like this. Now run the code and select the relevant files. Once done click on update. The new file with the updated Data will be saved in C:\ as PDF_FORM_DATA (UPDATED).PDF.

Info You can change the file name and path of the updated file in the code itself.

FINAL CODE:

Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        With OpenFileDialog1
            .DefaultExt = ".PDF"
            .DereferenceLinks = True
            .Filter = _
             "PDF files (*.PDF)|*.PDF"
            .Multiselect = False
            .Title = "Select a PDF file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox1.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        With OpenFileDialog1
            .DefaultExt = ".XLSX"
            .DereferenceLinks = True
            .Filter = _
             "Excel files (*.XLSX)|*.XLSX"
            .Multiselect = False
            .Title = "Select an Excel file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox2.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        Me.Close()
    End Sub

    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
        Dim pdfTemplate As String = TextBox1.Text
        '~~> Change the Output FileName here
        Dim PDFUpdatedFile As String = "C:\PDF_FORM_DATA (UPDATED).pdf"

        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim stamperPDF As New PdfStamper(readerPDF, _
        New FileStream(PDFUpdatedFile, FileMode.Create))

        Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Update pdf FormFields
        pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

        '~~> To remove editting options from the output Form, set it to FALSE
        '~~> To leave then editting options open in the output Form, set it to TRUE
        stamperPDF.FormFlattening = True

        '~~> close the pdf
        stamperPDF.Close()

        '~~> Close the Excel file without saving
        xlWorkBook.Close(False)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub
End Class

Extracting PDF Form Field names


Now Let’s assume that you don’t know what the form fields are. In such a case you can extract the field names from the PDF using the below code.

Insert one more textbox in the above form and also add one more command button. Set the multiline property of the textbox to TRUE.

Your form now should look like this

image

Double click on the “Get Form Fields” Button and paste this code.

    Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
        Dim pdfTemplate As String = TextBox1.Text
        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim PDFfld As Object

        For Each PDFfld In readerPDF.AcroFields.Fields
            If TextBox3.Text = "" Then
                TextBox3.Text = PDFfld.key.ToString()
            Else
                TextBox3.Text = TextBox3.Text & Environment.NewLine & PDFfld.key.ToString()
            End If
        Next

        TextBox3.SelectionStart = 0
    End Sub

Now when you select the pdf file and click on the “Get Form Fields” button, you will notice the textbox populates with the Form Field names.

image


Retrieving data from PDF Form Fields


Now let’s add one more button to the above form and call it “Get Form Data”. Your form should look like this.

image

Double click on the “Get Form Data” Button and paste this code. I will be using the same Excel file that we created above to output the data. I would output the data in Col C.

    Private Sub Button6_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button6.Click
        Dim pdfTemplate As String = TextBox1.Text
        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim pdfFormFields As AcroFields = readerPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet
        Dim LastRow As Integer, i As Integer

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Get the last row in Col A which has Form Fields
        LastRow = xlWorkSheet.Range("A" & xlApp.Rows.Count).End(Excel.XlDirection.xlUp).Row

        '~~> Loop through Col A and use the Form Fields to extract text
        For i = 2 To LastRow
            xlWorkSheet.Range("C" & i).Value = pdfFormFields.GetField(xlWorkSheet.Range("A" & i).Value)
        Next

        '~~> Close the Excel file without saving
        xlWorkBook.Close(True)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

This is how the Excel file looks after you run the above code.

image


Visual Studio Achievements


EDIT

I have stopped updating this blog. This link can also be found in my Website.

http://www.siddharthrout.com/2012/01/23/visual-studio-achievements/

 

I recently discovered this Extension for Visual Studio. It is called the “Visual Studio Achievements”. Microsoft’s Channel 9 launched the beta on the 18th. I have already downloaded it and it bought a smile on my face every time an achievement was unlocked.

With the Visual Studio Achievements Extension, achievements are unlocked based on your activity. Your code is analysed on a background thread each time you compile.  In addition, the extension listens for certain events and actions that you may perform in Visual Studio, reporting progress on these events to the server.

When an achievement is unlocked, Visual Studio lets you know visually with a pop-up.

image

image

The only thing which I didn’t like is the way it is being promoted

Impress your friends!
Earn achievements while you code!
Code while you earn achievements! “

Whoa! What has coding to do with impressing your friends??? Winking smile

This extension is not for programmers who deliberately try to unlock the achievements to impress their friends. I would recommend the Facebook “Farm-Ville” crap for that. Disappointed smile

The thing that really caught my attention was this line and which I really liked.

“But rarely is a developer appreciated for all the nuances and subtleties of a piece of code–and all the heroics it took to write it. With Visual Studio Achievements Beta, your talents are recognized as you perform various coding feats, unlock achievements and earn badges.”

If I were to rank it on a scale of 0 – 10, I would definitely give it a 10 even though it is on a Beta version.

So go ahead and give it a try and I am sure you will love it!

VB.Net to Read and Set Excel’s Inbuilt Document Properties


EDIT

I have stopped updating this blog. This link can also be found in my Website.

http://www.siddharthrout.com/2012/01/18/vb-net-to-read-and-set-excels-inbuilt-document-properties/

 

Document properties allow us to get information about a document. They are either built into the document, or are custom, user defined properties. The default properties are called BuiltinDocumentProperties.

Here is a list of complete built-in properties are

  1. Title
  2. Subject
  3. Author
  4. Keywords
  5. Comments
  6. Template
  7. Last author
  8. Revision number
  9. Application name
  10. Last print date
  11. Creation date
  12. Last save time
  13. Total editing time
  14. Number of pages
  15. Number of words
  16. Number of characters
  17. Security
  18. Category
  19. Format
  20. Manager
  21. Company
  22. Number of bytes
  23. Number of lines
  24. Number of paragraphs
  25. Number of slides
  26. Number of notes
  27. Number of hidden Slides
  28. Number of multimedia clips
  29. Hyperlink base
  30. Number of characters (with spaces)
  31. Content type
  32. Content status
  33. Language
  34. Document version

The document properties are shared by all Office applications. However not all the above built-in properties are supported by all Office programs. For example the “25. Number Of Slides” property applies only to PowerPoint and doesn’t apply to Excel.

The properties which I frequently use with Excel are

Title
Subject
Keywords (Tags)
Comments
Revision number

So how do we get the list of all document properties using VB.Net and How do we set their values?

Get the list of all document properties of an Excel Document say SAMPLE.XLSX

After you have set the relevant references to excel, use the code given below. The code will type the name of all the properties in a sheet in the Excel File. I would recommend adding a temporary sheet in the file and use that. For the code below, I am assuming that the name of the temporary sheet is “Temp”

Imports Excel = Microsoft.Office.Interop.Excel
Imports Office = Microsoft.Office.Core

Public Class Form1
    '~~> Define your Excel Objects
    Dim xlApp As New Excel.Application
    Dim xlWorkBook As Excel.Workbook
    Dim xlSheet As Excel.Worksheet
    Dim DocProps As Object, DProps As Object

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        '~~> Opens an exisiting Workbook. Change path and filename as applicable
        xlWorkBook = xlApp.Workbooks.Open("C:\Sample.xlsx")
        xlSheet = xlWorkBook.Sheets("Temp")

        DocProps = xlWorkBook.BuiltinDocumentProperties

        '~~> Display Excel
        xlApp.Visible = False

        '~~> Loop via all properties
        If Not (DocProps Is Nothing) Then
            Dim i As Integer
            For i = 1 To DocProps.Count
                DProps = DocProps(i)
                xlSheet.Range("A" & i.ToString()).Value2 = DProps.Name
            Next i
        End If

        '~~> Save and Close the File
        xlWorkBook.Close(True)

        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub

End Class

When you run the code you will get this kind of output

image

Setting the values of Properties

And here is an example where we will try and update few of the values.

    Dim xlApp As New Excel.Application
    Dim xlWorkBook As Excel.Workbook
    Dim xlSheet As Excel.Worksheet
    Dim DocProps As Object, DProps As Object

    '~~> Setting the Built in Document Properties
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        'Dim prop As Object

        '~~> Opens an exisiting Workbook. Change path and filename as applicable
        xlWorkBook = xlApp.Workbooks.Open("E:\Users\Siddharth Rout\Desktop\Sample.xlsx")

        '~~> Display Excel
        xlApp.Visible = True

        DocProps = xlWorkBook.BuiltinDocumentProperties

        DocProps("Title").Value = "Summary Report"
        DocProps("Subject").Value = "Sales"
        DocProps("Keywords").Value = "Sales; Report; Summary"

        '~~> Close the File
        xlWorkBook.Close(True)

        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub
End Class

And here is a snapshot of the properties

image

VB.Net and MS Word Poll


I have been contemplating on creating a separate tab for “VB.Net and MS Word”. Like the one I have for “VB.Net and MS Excel”.