Fill/Retrieve data from PDF Form Fields using VB.Net From an Excel File


EDIT

I have stopped updating this blog. This link can also be found in my Website.

http://www.siddharthrout.com/2012/01/28/fillretrieve-data-from-pdf-form-fields-using-vb-net-from-an-excel-file/

 

  • Tired of filling a PDF report every day? OR
  • Tired of collecting information from a pdf everyday?

Here are ways to ease your pain Smile

We will separate the article in 3 sections

  1. Filling PDF Form Fields
  2. Extracting the PDF Form Field names
  3. Retrieving data from PDF Form Fields

Filling PDF Form Fields


Imagine if you need to fill a Daily Report and mail it to your boss over the email every day? In the beginning, it might sound adventurous but then you start hating the monotonous routine.

Here is one quick way to fill the Form Fields of PDF using VB.Net from data stored in Excel.

To begin with let’s design our form. Place 2 textboxes, 4 command buttons and one OpenFileDialogon a form. Once done the form will look similar to what is show below.

image

Now let’s say you have a PDF which looks like below. I created this PDF just to demonstrate on how to fill the PDF.

image

Once the data is filled, your PDF will look like this

image

Next is preparing your Excel Database.

Open Excel and Type the Data as shown below. Once done, save it as C:\PDF_FORM_DATA.XLSX.

This is how your Excel file might look.

image

We are now all set to write the code in VB.net

Double click on the “Browse” Button which we will be use to select the PDF file and paste this code.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        With OpenFileDialog1
            .DefaultExt = ".PDF"
            .DereferenceLinks = True
            .Filter = _
             "PDF files (*.PDF)|*.PDF"
            .Multiselect = False
            .Title = "Select a PDF file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox1.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

Next Double click on the “Browse” Button which we will be use to select the Excel file and paste this code.

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        With OpenFileDialog1
            .DefaultExt = ".XLSX"
            .DereferenceLinks = True
            .Filter = _
             "Excel files (*.XLSX)|*.XLSX"
            .Multiselect = False
            .Title = "Select an Excel file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox2.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

Now Double click on the “Cancel” Button and paste this code.

    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        Me.Close()
    End Sub

Finally Double click on the “Update” Button and paste this code.

    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
        Dim pdfTemplate As String = TextBox1.Text
        '~~> Change the Output FileName here
        Dim PDFUpdatedFile As String = "C:\PDF_FORM_DATA (UPDATED).pdf"

        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim stamperPDF As New PdfStamper(readerPDF, _
        New FileStream(PDFUpdatedFile, FileMode.Create))

        Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Update pdf FormFields
        pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

        '~~> To remove editting options from the output Form, set it to FALSE
        '~~> To leave then editting options open in the output Form, set it to TRUE
        stamperPDF.FormFlattening = True

        '~~> close the pdf
        stamperPDF.Close()

        '~~> Close the Excel file without saving
        xlWorkBook.Close(False)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

Add the below code at the very top.

Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

and add one Sub at the bottom for clean up.

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub

The last thing that we need to do is setting References for Excel and itextsharp.dll

itextsharp.dll is freely available on the web and it is free to use for Non Commercial Applications. Once you have downloaded the file, save it at a location of your choice. Once downloaded, click on the menu PROJECT | Add Reference. Navigate to the “Browse” tab and select the DLL and click on OK.

image

Next set a reference to the Excel Object Library. See this link for more information.

Your Final code will look like this. Now run the code and select the relevant files. Once done click on update. The new file with the updated Data will be saved in C:\ as PDF_FORM_DATA (UPDATED).PDF.

Info You can change the file name and path of the updated file in the code itself.

FINAL CODE:

Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        With OpenFileDialog1
            .DefaultExt = ".PDF"
            .DereferenceLinks = True
            .Filter = _
             "PDF files (*.PDF)|*.PDF"
            .Multiselect = False
            .Title = "Select a PDF file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox1.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        With OpenFileDialog1
            .DefaultExt = ".XLSX"
            .DereferenceLinks = True
            .Filter = _
             "Excel files (*.XLSX)|*.XLSX"
            .Multiselect = False
            .Title = "Select an Excel file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox2.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        Me.Close()
    End Sub

    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
        Dim pdfTemplate As String = TextBox1.Text
        '~~> Change the Output FileName here
        Dim PDFUpdatedFile As String = "C:\PDF_FORM_DATA (UPDATED).pdf"

        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim stamperPDF As New PdfStamper(readerPDF, _
        New FileStream(PDFUpdatedFile, FileMode.Create))

        Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Update pdf FormFields
        pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

        '~~> To remove editting options from the output Form, set it to FALSE
        '~~> To leave then editting options open in the output Form, set it to TRUE
        stamperPDF.FormFlattening = True

        '~~> close the pdf
        stamperPDF.Close()

        '~~> Close the Excel file without saving
        xlWorkBook.Close(False)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub
End Class

Extracting PDF Form Field names


Now Let’s assume that you don’t know what the form fields are. In such a case you can extract the field names from the PDF using the below code.

Insert one more textbox in the above form and also add one more command button. Set the multiline property of the textbox to TRUE.

Your form now should look like this

image

Double click on the “Get Form Fields” Button and paste this code.

    Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
        Dim pdfTemplate As String = TextBox1.Text
        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim PDFfld As Object

        For Each PDFfld In readerPDF.AcroFields.Fields
            If TextBox3.Text = "" Then
                TextBox3.Text = PDFfld.key.ToString()
            Else
                TextBox3.Text = TextBox3.Text & Environment.NewLine & PDFfld.key.ToString()
            End If
        Next

        TextBox3.SelectionStart = 0
    End Sub

Now when you select the pdf file and click on the “Get Form Fields” button, you will notice the textbox populates with the Form Field names.

image


Retrieving data from PDF Form Fields


Now let’s add one more button to the above form and call it “Get Form Data”. Your form should look like this.

image

Double click on the “Get Form Data” Button and paste this code. I will be using the same Excel file that we created above to output the data. I would output the data in Col C.

    Private Sub Button6_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button6.Click
        Dim pdfTemplate As String = TextBox1.Text
        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim pdfFormFields As AcroFields = readerPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet
        Dim LastRow As Integer, i As Integer

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an exisiting Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Get the last row in Col A which has Form Fields
        LastRow = xlWorkSheet.Range("A" & xlApp.Rows.Count).End(Excel.XlDirection.xlUp).Row

        '~~> Loop through Col A and use the Form Fields to extract text
        For i = 2 To LastRow
            xlWorkSheet.Range("C" & i).Value = pdfFormFields.GetField(xlWorkSheet.Range("A" & i).Value)
        Next

        '~~> Close the Excel file without saving
        xlWorkBook.Close(True)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

This is how the Excel file looks after you run the above code.

image


Advertisements

8 responses to “Fill/Retrieve data from PDF Form Fields using VB.Net From an Excel File

  1. Eddy Jawed February 27, 2012 at 6:02 am

    Hi Siddhart,

    Thanks for this helpful tutorial. Just wondering if you got the dll and the sample pdf form to download too?

  2. Rathi March 1, 2012 at 4:26 pm

    hi Siddharth.. This is very Helpful.. Had a query.. The form fields given in excel are related to form fields of PDF Labels or propetry names of control in PDF..?? Please help!!!

  3. Gloria Antonakos March 4, 2012 at 11:54 am

    Please message me with some pointers on how you made this website look like this, Id appreciate it.

  4. Rathi March 8, 2012 at 3:59 pm

    Property Names of Label in PDF in the sense For Example if i have – 1. Name (as Label) and Text box besides it to fill, here should i consider only Name as property name of Label in my excel sheet or 1.Name?

  5. Rathi March 8, 2012 at 4:36 pm

    After the conversion, i always get the error when opening the pdf that something like “This is document is changed as compared to original document”. Is this due to any security in the form ?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: