Extracting data from PDF to Excel



DLAUNA
06-04-2017, 11:50 PM
Hi,

I am in charge of monitoring production from one of the oil fields my company is involved with. The daily production numbers are supplied as a PDF, and each day a technical assistant populates a spreadsheet that I use for data analysis and management reporting. The task is tedious, and at times data entry mistakes lead to errors in the results. I have read a little about automating the process so that the necessary data is extracted from the PDF directly into Excel, which would save time and reduce manual entry errors. I would appreciate any templates or code that can pull the specific data I require into specific cells in Excel to streamline this process.

mdmackillop
06-05-2017, 02:30 AM
I never found a program that did this well unless the layout was very simple. I used the Adobe online at £12.70/year. Best results were usually obtained by converting to Word and importing tabular areas to Excel for further processing.
Can you post a sample PDF?

buffie
06-05-2017, 04:30 AM
I have had great luck with PDF2XL.

Bob Phillips
06-05-2017, 04:44 AM
buffie wrote: "I have had great luck with PDF2XL."

As have I.

DLAUNA
06-05-2017, 03:29 PM
mdmackillop wrote: "Can you post a sample PDF?"

Hi there, please find attached an example of the PDF. I have also read that converting to a text file and then importing it is another option. I appreciate your feedback.

ashleyuk1984
06-05-2017, 04:10 PM
I've had to do a similar task. What data from the PDF are you trying to extract?

My PDFs were very similar each time I received one. The main things in common were the layout, and the standard text.
So my method was to use the "Text to Columns" feature, then search for the standard text, and then take the value that was one cell to the right of it, or wherever else it was.
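
A minimal sketch of that lookup, written in Python rather than Excel, assuming the converted PDF has already been split into rows of cells (roughly what "text to columns" produces). The function name `value_right_of` and the sample labels and numbers are invented for illustration and are not taken from the attached report.

```python
from typing import List, Optional


def value_right_of(rows: List[List[str]], label: str) -> Optional[str]:
    """Return the cell one position to the right of the first cell matching label."""
    wanted = label.strip().lower()
    for row in rows:
        for col, cell in enumerate(row):
            if cell.strip().lower() == wanted and col + 1 < len(row):
                return row[col + 1].strip()
    return None  # label not found, or nothing to its right


# Hypothetical rows, as they might look after "text to columns"
sample_rows = [
    ["Daily Production Report", ""],
    ["Oil (bbl)", "1250", "Gas (mscf)", "830"],
    ["Water (bbl)", "410"],
]

oil = value_right_of(sample_rows, "Oil (bbl)")
gas = value_right_of(sample_rows, "gas (mscf)")  # matching ignores case
missing = value_right_of(sample_rows, "Condensate (bbl)")
```

Because every cell in a row is checked, a row holding two label/value pairs side by side is handled the same way as a row with one.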

DLAUNA
06-05-2017, 04:17 PM
Hi there,

Thanks for the feedback. The PDFs I receive are consistently formatted, with text to the left and values to the right. It is complicated (to me) in some areas, as some columns have both the text to the left and the value to the right. I attached an example PDF this morning in my post at 8:29 am.

SamT
06-05-2017, 06:02 PM
It was simple for me: open the PDF > Ctrl+A > Ctrl+C
Open Notepad > Ctrl+V > Ctrl+A > Ctrl+C
Open Excel > Select cell A1 > Ctrl+V (or Right Click > Paste)

That produced a completely imported data set that can be handled with Excel's Find(Keyword) > Parse found cell for required data.

I doubt the import process will always present the "texted" PDF in the same format, so I suggest the Find & Parse cell routine.
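
One way the Find & Parse idea could look, sketched in Python on the pasted text instead of in Excel itself. It assumes each value appears as a number somewhere after its keyword on the same line; the keywords, units and figures below are hypothetical, and `find_and_parse` is an illustrative name, not an Excel function.

```python
import re
from typing import Dict, Iterable, Optional

# Matches a number such as 1,250 or 410.5 (optionally negative)
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def find_and_parse(lines: Iterable[str], keywords: Iterable[str]) -> Dict[str, Optional[float]]:
    """For each keyword, find the first line containing it and parse the number after it."""
    lines = list(lines)
    results: Dict[str, Optional[float]] = {}
    for keyword in keywords:
        results[keyword] = None
        for line in lines:
            pos = line.lower().find(keyword.lower())
            if pos == -1:
                continue
            # Search only the part of the line that follows the keyword
            match = NUMBER.search(line, pos + len(keyword))
            if match:
                results[keyword] = float(match.group().replace(",", ""))
            break  # only the first line containing the keyword is used
    return results


# Hypothetical pasted text; the real report layout may differ
pasted = [
    "Daily Production Report",
    "Oil Produced   1,250.5 bbl",
    "Gas Produced 830 mscf   Water Produced 410 bbl",
]
parsed = find_and_parse(pasted, ["Oil Produced", "Gas Produced", "Water Produced", "Flare"])
```

Searching by keyword rather than by fixed row position means the lookup still works if the pasted lines shift up or down between reports; a keyword that is missing comes back as `None` so it can be flagged instead of silently filled.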

It would require your assistant to spend almost a complete minute, just clicking the mouse and keyboard.

Try the above on a few reports from various offices to determine the consistency of the PDF generators.

Of course, if you could convince the head office to switch from the old, dated, inefficient PDF format to the new, exciting, universal XML format, you could import just the data you need directly into Excel from an XML document generated by Word, Project, a database, or most any current-gen application.

:ack:

mdmackillop
06-06-2017, 01:30 AM
Files in 3 Adobe converted formats for your consideration

DLAUNA
06-06-2017, 03:33 PM
Hi, thanks for the attachments.
After reviewing the responses to my post, my conclusion is that the easiest and best approach is to:
1) Have the technical assistant convert the PDFs to Word documents and store them in a folder.
2) Write some code in the Excel database (currently in use) to pull the necessary data from the Word document for a specific report/date into the matching cells of the Excel sheet.
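
A rough sketch of the folder-based workflow above, in Python. It swaps in one assumption: the converted reports are saved as plain-text (.txt) files rather than Word documents, since reading Word files would need a third-party library. The labels, file names and the helper names `first_number_after` and `collect_reports` are hypothetical.

```python
import csv
import re
from pathlib import Path
from typing import List, Optional

NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def first_number_after(text: str, label: str) -> Optional[str]:
    """Return the first number following label in text, or None if absent."""
    pos = text.lower().find(label.lower())
    if pos == -1:
        return None
    match = NUMBER.search(text, pos + len(label))
    return match.group().replace(",", "") if match else None


def collect_reports(folder: str, labels: List[str], out_csv: str) -> int:
    """Write one CSV row per *.txt report in folder; return the number of reports read."""
    reports = sorted(Path(folder).glob("*.txt"))
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["report"] + labels)  # header keeps columns aligned with labels
        for report in reports:
            text = report.read_text(encoding="utf-8", errors="replace")
            # A label that is not found leaves its cell empty
            values = [first_number_after(text, label) or "" for label in labels]
            writer.writerow([report.stem] + values)
    return len(reports)
```

The resulting CSV has one row per report and one column per label, and it opens directly in Excel. Naming each file after its report date would make the first column usable as the date key when matching rows to the existing sheet.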

Derek