Script to extract all text from MS Word Documents into Excel



yashikor
04-21-2015, 04:04 PM
Total newbie here. I need to extract all text from MS Word documents into Excel. Ideally, I would like to populate the text into columns within Excel, although this might be difficult. We have product specs that will eventually go into another system. If I can just get the text from each document, I can massage it later in Excel.


Many many thanks for your help.

yashikor
04-27-2015, 07:18 AM
Anyone?

gmayor
04-27-2015, 07:41 AM
You will have to explain the problem in more detail. What is in the documents; where in Excel do you want it to be inserted? Word and Excel are entirely different applications.

yashikor
04-27-2015, 08:57 AM
Thank you for replying. I have attached an image of one of the Word documents. Basically, I want to extract all of the text from the Word documents and populate an Excel sheet. I don't need any special formatting, etc. It would be nice if I could put the text in specific columns, e.g., "Product Name" within a Product Name column.

The data in the Excel sheet will be dumped into another application down the road.

[Attachment 13263: image of one of the Word documents]

gmayor
04-27-2015, 09:42 PM
It will be no mean feat to produce a process to extract that document into separate cells in Excel and I can see a few issues that might be difficult to resolve. It would also rely on all the documents to be processed being identically formatted. I doubt you will find anyone willing to put in that amount of work for free. The general principles involved are covered at - http://www.gmayor.com/extract_data_from_email.htm (e-mail messages are essentially documents as far as VBA is concerned). If you want to discuss it further, contact me via my web site.

yashikor
04-28-2015, 01:25 PM
Extracting data into separate cells in Excel is a wish list thing. If I can just grab the text and populate Excel, I can massage/format the data later. Getting the data from Word is the main thing.

gmayor
04-28-2015, 10:48 PM
Excel is a cell-based application, so the whole point of putting the document into Excel would be to extract parts of the document to separate cells; otherwise it would still be a Word document embedded in Excel and thus you would have progressed no farther. As I indicated, this is not a five-minute job, and the process, similar to that highlighted in my earlier message, would have to be tailored to suit the document.
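
To illustrate the kind of tailoring involved, here is a minimal sketch of splitting one line of text into a label and a value, which could then go into two separate columns. It assumes each line has the form "Label: value"; the real documents may be laid out quite differently, so that format, the function name SplitLabelValue and the sample text "Widget" are all hypothetical.

```vbscript
Option Explicit

REM Hypothetical helper: splits "Label: value" at the first colon.
REM Returns a two-element array (label, value); the label is empty
REM when the text contains no colon.
Function SplitLabelValue(text)
    Dim pos
    pos = InStr(text, ":")
    If pos > 0 Then
        SplitLabelValue = Array(Trim(Left(text, pos - 1)), Trim(Mid(text, pos + 1)))
    Else
        SplitLabelValue = Array("", Trim(text))
    End If
End Function

REM Sample input (made up for illustration)
Dim parts
parts = SplitLabelValue("Product Name: Widget")
WScript.Echo parts(0) & " | " & parts(1)
```

Run with cscript, this should echo the label and the value separated by " | ". Splitting on the first colon only means a value that itself contains a colon stays intact.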

yashikor
04-29-2015, 07:03 AM
gmayor - thanks. I found the following script online with a little blurb. This would be perfect if it wrote everything into an Excel file. I wrote to the author but have not heard back.

The script below demonstrates how to extract all of the data from a Microsoft Word document. The script currently outputs all the data to a console window, though it can be easily modified to write the data to a file or database.


To run the script, save it to a file (e.g. word.vbs). Modify the line that sets the wordPath variable, and change it to specify the location of your Word file. For example, if your Word file was in the directory c:\StockData, and its name was IBM.doc, you would change the line to look like this:
wordPath = "C:\StockData\IBM.doc"
You would then run the program from a DOS prompt (Start->Programs->Command Prompt) like this:
cscript word.vbs
Here is the script:
Option Explicit
REM We use "Option Explicit" to help us check for coding mistakes


REM the Word Application
Dim objWord

REM the path to the Word file
Dim wordPath

REM the document we are currently reading data from
Dim currentDocument
REM the number of Words in the current document
Dim numberOfWords
Dim i


REM where is the Word file located?
wordPath = "C:\Data\Doc1.doc"

WScript.Echo "Extract Data from " & wordPath

REM Create an invisible version of Microsoft Word
Set objWord = CreateObject("Word.Application")

REM don't display any messages about documents needing to be converted
REM from old Word file formats
objWord.DisplayAlerts = 0


REM open the Word document as read-only
REM Open(path, ConfirmConversions, ReadOnly)
objWord.Documents.Open wordPath, false, true

REM Access the document
Set currentDocument = objWord.Documents(1)

REM How many words are in the document
numberOfWords = currentDocument.Words.Count
WScript.Echo "There are " & numberOfWords & " words " & vbCrLf

For i = 1 To numberOfWords
    REM .Text is the default property of a Range; it is spelled out here for clarity
    WScript.Echo currentDocument.Words(i).Text
Next

REM Close the document
currentDocument.Close
REM Free memory used to store the document object
Set currentDocument = Nothing

REM exit Microsoft Word
objWord.Quit
Set objWord = Nothing
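
One possible way to adapt a script like the one above so that it writes to an Excel file rather than the console is sketched below. This is a rough sketch, not the original author's code: it assumes both Word and Excel are installed, it writes one paragraph per row in column A instead of one word per line, and the two paths and the constant names are placeholders to change.

```vbscript
Option Explicit

REM Hypothetical paths: change both to suit your machine
Const WORD_PATH = "C:\Data\Doc1.doc"
Const EXCEL_PATH = "C:\Data\Doc1.xlsx"
REM 51 is the numeric value of Excel's xlOpenXMLWorkbook (.xlsx) format
Const XL_OPEN_XML_WORKBOOK = 51

Dim objWord, objExcel, doc, book, sheet, para, text, row

REM Start Word and open the document read-only
Set objWord = CreateObject("Word.Application")
objWord.DisplayAlerts = 0
Set doc = objWord.Documents.Open(WORD_PATH, False, True)

REM Start Excel and create a new workbook
Set objExcel = CreateObject("Excel.Application")
objExcel.DisplayAlerts = False
Set book = objExcel.Workbooks.Add
Set sheet = book.Worksheets(1)

REM Copy each non-empty paragraph into the next row of column A
row = 1
For Each para In doc.Paragraphs
    text = para.Range.Text
    REM Strip the paragraph mark and the end-of-cell marker used in tables
    text = Replace(text, vbCr, "")
    text = Replace(text, Chr(7), "")
    text = Trim(text)
    If Len(text) > 0 Then
        sheet.Cells(row, 1).Value = text
        row = row + 1
    End If
Next

REM Save the workbook, then shut down both applications
book.SaveAs EXCEL_PATH, XL_OPEN_XML_WORKBOOK
book.Close False
objExcel.Quit
doc.Close False
objWord.Quit

Set doc = Nothing
Set objWord = Nothing
Set objExcel = Nothing
```

It would be run the same way (cscript with the file name). Note that Excel treats any cell value beginning with "=" as a formula, so a paragraph starting that way would need extra handling.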