Excel

get the data from PDF file into Excel sheet(s) or text file(s)

Ease of Use

Easy

Version tested with

2003 

Submitted by:

mohanvijay

Description:

It will get the data from PDF file into Excel Sheet or Text file 

Discussion:

I assigned to get the phone numbers and charges and etc.. of more than 200 employess whose data are avaialble in PDF file so i write code to get data from PDf file into array and do some if conditions to get exact data that i want and put them into excel. The attached file get all data from PDF file. You can add some if conditions depends upon your need. You need Adobe Acrobat installed to use this code. I created and used in Adobe Acrobat professional version 7.0 

Code:

instructions for use

			

'Contains One form and Module 'code in form '***************************************************************************************** '***************************************************************************************** Option Explicit Private Sub cmd_imp_Click() 'check import option If opt_xl.Value = False And opt_txt.Value = False Then MsgBox "Please select one of the import mode" Exit Sub End If Dim OS_FSO As Object Set OS_FSO = CreateObject("Scripting.filesystemobject") Dim PDF_Path As String, Txt_Fol As String PDF_Path = txt_pdf.Text 'check the PDF file exists If OS_FSO.fileexists(PDF_Path) = False Then MsgBox "PDF file not found" Set OS_FSO = Nothing Exit Sub End If If opt_txt.Value = True Then Txt_Fol = txt_txt.Text 'check the folder for text file if import PDF data into text file If OS_FSO.folderexists(Txt_Fol) = False Then MsgBox "Folder '" & Txt_Fol & "' not exist please select valid folder" Set OS_FSO = Nothing Exit Sub End If 'import into text files Call Imp_Into_Txt(PDF_Path, Txt_Fol, chk_txt.Value) End If If opt_xl.Value = True Then 'import into text files Call Imp_Into_XL(PDF_Path, chk_xl.Value) End If End Sub Private Sub cmd_pdf_Click() Dim Dlg_File As FileDialog Set Dlg_File = Application.FileDialog(msoFileDialogFilePicker) txt_pdf.Text = "" With Dlg_File .Filters.Add "PDF Files", "*.pdf" If .Show = -1 Then txt_pdf.Text = .SelectedItems(1) End If End With Set Dlg_File = Nothing End Sub Private Sub cmd_txt_Click() 'get the folder for save text file(s) Dim Dlg_Fol As FileDialog Set Dlg_Fol = Application.FileDialog(msoFileDialogFolderPicker) txt_txt.Text = "" If Dlg_Fol.Show = -1 Then txt_txt.Text = Dlg_Fol.SelectedItems(1) End If Set Dlg_Fol = Nothing End Sub Private Sub opt_txt_Click() Call Con_Txt(True) End Sub Private Sub opt_xl_Click() Call Con_Txt(False) End Sub Private Sub Con_Txt(Ena As Boolean) 'set the intial value txt_txt.Enabled = Ena cmd_txt.Enabled = Ena chk_txt.Enabled = Ena chk_xl.Enabled = Not Ena End Sub '***************************************************************************************** '***************************************************************************************** 'code in module Option Explicit Sub Main_Import() frm_pdfimp.Show End Sub Sub Imp_Into_XL(PDF_File As String, Each_Sheet As Boolean) 'This procedure get the PDF data into excel by following way '1.Open PDF file '2.Looping through pages '3.get the each PDF page data into individual _ sheets Or single sheet As defined In Each_Sheet Parameter Dim AC_PD As Acrobat.AcroPDDoc 'access pdf file Dim AC_Hi As Acrobat.AcroHiliteList 'set selection word count Dim AC_PG As Acrobat.AcroPDPage 'get the particular page Dim AC_PGTxt As Acrobat.AcroPDTextSelect 'get the text of selection area Dim WS_PDF As Worksheet Dim RW_Ct As Long 'row count Dim Col_Num As Integer 'column count Dim Li_Row As Long 'Maximum rows limit for one column Dim Yes_Fir As Boolean 'to identify beginning of page Li_Row = Rows.Count Dim Ct_Page As Long 'count pages in pdf file Dim i As Long, j As Long, k As Long 'looping variables Dim T_Str As String Dim Hld_Txt As Variant 'get PDF total text into array RW_Ct = 0 'set the intial value Col_Num = 1 'set the intial value Application.ScreenUpdating = False Set AC_PD = New Acrobat.AcroPDDoc Set AC_Hi = New Acrobat.AcroHiliteList 'set maximum selection area of PDF page AC_Hi.Add 0, 32767 With AC_PD 'open PDF file .Open PDF_File 'get the number of pages of PDF file Ct_Page = .GetNumPages 'if get pages is failed exit sub If Ct_Page = -1 Then MsgBox "Pages Cannot determine in PDF file '" & PDF_File & "'" .Close GoTo h_end End If 'add sheet only one time if Data retrive in one sheet If Each_Sheet = False Then Set WS_PDF = Worksheets.Add(, Worksheets(Sheets.Count)) WS_PDF.Name = "PDF2Text" End If 'looping through sheets For i = 1 To Ct_Page T_Str = "" 'get the page Set AC_PG = .AcquirePage(i - 1) 'get the full page selection Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi) 'if text selected successfully get the all the text into T_Str string If Not AC_PGTxt Is Nothing Then With AC_PGTxt For j = 0 To .GetNumText - 1 T_Str = T_Str & .GetText(j) Next j End With End If If Each_Sheet = True Then 'add each sheet for each page Set WS_PDF = Worksheets.Add(, Worksheets(Sheets.Count)) End If 'transfer PDF data into sheet With WS_PDF If Each_Sheet = True Then .Name = "Page-" & i 'get the PDF data into each sheet for each PDF page 'if text accessed successfully then split T_Str by VbCrLf 'and get into array Hld_Txt and looping through array and fill sheet with PDF data If T_Str <> "" Then Hld_Txt = Split(T_Str, vbCrLf) For k = 0 To UBound(Hld_Txt) T_Str = CStr(Hld_Txt(k)) If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str .Cells(k + 1, 1).Value = T_Str Next k Else 'information if text not retrive from PDF page .Cells(1, 1).Value = "No text found in page " & i End If Else 'get the pdf data into single sheet If T_Str <> "" Then Hld_Txt = Split(T_Str, vbCrLf) Yes_Fir = True For k = 0 To UBound(Hld_Txt) RW_Ct = RW_Ct + 1 'check begining of page if yes enter PDF page number for any idenfication If Yes_Fir Then RW_Ct = RW_Ct + 1 .Cells(RW_Ct, Col_Num).Value = "Text In Page - " & i RW_Ct = RW_Ct + 2 Yes_Fir = False End If 'check for maximum rows if exceeds start from next column If RW_Ct > Li_Row Then RW_Ct = 1 Col_Num = Col_Num + 1 End If T_Str = CStr(Hld_Txt(k)) If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str .Cells(RW_Ct, Col_Num).Value = T_Str Next k Else RW_Ct = RW_Ct + 1 .Cells(RW_Ct, Col_Num).Value = "No text found in page " & i RW_Ct = RW_Ct + 1 End If End If End With Next i .Close End With Application.ScreenUpdating = True MsgBox "Imported" h_end: Set WS_PDF = Nothing Set AC_PGTxt = Nothing Set AC_PG = Nothing Set AC_Hi = Nothing Set AC_PD = Nothing End Sub Sub Imp_Into_Txt(T_PDF_File As String, Fol_Path As String, Each_Page As Boolean) 'same as above procedure instead of sheets use text files Dim AC_PD As Acrobat.AcroPDDoc Dim AC_Hi As Acrobat.AcroHiliteList Dim AC_PG As Acrobat.AcroPDPage Dim AC_PGTxt As Acrobat.AcroPDTextSelect Dim OS_FSO As Object Dim OS_TxtFile As Object Set OS_FSO = CreateObject("Scripting.filesystemobject") Dim Ct_Page As Long Dim i As Long, j As Long, k As Long Dim T_Str As String Dim Hld_Txt As Variant Set AC_PD = New Acrobat.AcroPDDoc Set AC_Hi = New Acrobat.AcroHiliteList AC_Hi.Add 0, 32767 With AC_PD .Open T_PDF_File Ct_Page = .GetNumPages If Ct_Page = -1 Then MsgBox "Pages Cannot determine in PDF file '" & T_PDF_File & "'" .Close GoTo h_end End If If Each_Page = False Then Set OS_TxtFile = OS_FSO.createtextfile(Fol_Path & "\pdf2text.txt") End If For i = 1 To Ct_Page T_Str = "" Set AC_PG = .AcquirePage(i - 1) Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi) If Not AC_PGTxt Is Nothing Then With AC_PGTxt For j = 0 To .GetNumText - 1 T_Str = T_Str & .GetText(j) Next j End With End If If T_Str = "" Then T_Str = "No text found in page " & i If Each_Page = True Then Set OS_TxtFile = OS_FSO.createtextfile(Fol_Path & "\Page-" & i & ".txt") OS_TxtFile.write T_Str OS_TxtFile.Close Set OS_TxtFile = Nothing Else T_Str = vbCrLf & vbCrLf & "Text In Page - " & i & vbCrLf & vbCrLf & T_Str OS_TxtFile.write T_Str End If Next i If Each_Page = False Then OS_TxtFile.Close .Close End With MsgBox "Imported" h_end: Set OS_TxtFile = Nothing Set OS_FSO = Nothing Set AC_PGTxt = Nothing Set AC_PG = Nothing Set AC_Hi = Nothing Set AC_PD = Nothing End Sub

How to use:

  1. you need to Adobe Acrobat installed to run this code
  2. open attached excel file
  3. click Import PDF button then 'PDF Import' form will show
  4. select PDF file import
  5. select option to import into Excel or Text file
  6. tick checkbox if you want each page in each excel sheet
  7. if you select text file option then select folder location save text files
  8. click Import button
  9. if you select text file option then text files saved into the selected folder
 

Test the code:

  1. You need to Adobe Acrobat installed to run this code
  2. Open attached excel file click Import PDF button
 

Sample File:

PDF2XL.zip 24.22KB 

Approved by Jacob Hilderbrand


This entry has been viewed 786 times.

Please read our Legal Information and Privacy Policy
Copyright @2004 - 2014 VBA Express