PDA

View Full Version : Word to Excel



Rynman
03-24-2009, 09:50 PM
Hi guys

I have thousands of Word documents. Basically what I want, is something that can copy all of the data contained in each document, into it's own Excel cell vertically. So in other words, each Word document has it's own cell in Excel with everything contained in it. I'm guessing this can be done using VBA but not sure how.

Thanks for any help you can lend!

geekgirlau
03-24-2009, 10:35 PM
Welcome to the Board!

I'm not sure exactly why you want this. Excel has a limit as to the number of characters you can have in one cell, and you are going to end up with a very large, very messy spreadsheet.

Why do you want to do this?

Rynman
03-25-2009, 01:29 AM
Welcome to the Board!

I'm not sure exactly why you want this. Excel has a limit as to the number of characters you can have in one cell, and you are going to end up with a very large, very messy spreadsheet.

Why do you want to do this?

Well, I'm trying to pull some data from each word document. Problem is that they are not 100% consistent in their formatting. I figured it would be easier to find out how to get that data into a cell then go from there. I didn't realise there was a character limit per cell (makes sense). Does anyone know how I could automate a process to go through each Word document, and create a new cell (on each row) containing the piece/s of data I'm after?

fumei
03-25-2009, 01:18 PM
You need to work out very precisely what you want to do...first.

1. are you executing this from Word, or from Excel? Makes a big difference.

2. be consistent.

"copy all of the data contained in each document"

is VERY different from:

"I'm trying to pull some data from each word document"

Which is it? All of it? Some of it?

3. HOW is the data in the Word docs set up? In bookmarks? This would be best as it makes it MUCH easier to get it.

Is the data going to be - for example - just the contents of one cell in a table? The same cell in, say, Table 1, in each document?

Is it always going to be Paragraph(3)? Do you need to figure out some logic to determine what data to get from each Word doc?

" Problem is that they are not 100% consistent in their formatting. "

What is the significance of this? Are you trying to get around that by dumping stuff into cells in Excel? This seems excessive. What - EXACTLY - are you wanting to do, and WHY?

In any case, here is a VERY simple demonstration. VERY simple. I will set it up for you.

I have four documents (the names are irrelevant for the purpose of this demo), in a folder. C:\ZZZ\NewDoc

Each document has a single paragraph.

Doc1 = "This is doc 1."
Doc2 = "This is doc 2."
Doc3 = "This is doc 3."
Doc4 = "This is doc 4."

Just to keep it simple - and because I do not know how your documents are structured, nor what data you want to get out - the following code simply grabs the entire contents of each document. So, "This is doc 1.", "This is doc 2.", "This is doc 3." etc.

OK? So the following code running from Word:

1. declares a CONSTANT (myPath) for the path to be used

2. declares Excel variables. NOTE: this uses early binding!

3. creates the instance of Excel

4. opens an existing (blank in this case) Excel file (DumpHere.xls) and sets workbook and worksheet objects

5. uses the Dir function in Word to open EACH document in the path constant

6. pass the document contents ("This is doc 1." etc) to a string variable (strWhatever)

7. worksheet object takes the string and puts it into a cell. This starts with Cell(j, 1) - Row 1, Column 1 as j = 1

8. code closes the current Word doc

9. increments the counter used for the row in Excel (j = j + 1)

10. loops to the next Word doc and repeats 6 to 9.

That is, the first Word doc string goes into Cell(1,1) in Excel, the next Word doc goes into Cell(2,1), the next into Cell(3,1).

The Excel users and gurus will probably have a much better way to use Excel - it is not my forte. But the code below will end up with:

This is doc 1
This is doc 2
This is doc 3
This is doc 4

going down the rows in the Excel file.

Hopefully this can give you a start. But you need to define precisely what you want to do.

Option Explicit

Sub DumpToExcel()
Const myPath As String = "c:\zzz\NewDoc\"

Dim xlApp As Excel.Application
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet

Set xlApp = CreateObject("Excel.Application")
Set wb = xlApp.Workbooks.Open(FileName:=myPath & _
"DumpHere.xls")
Set ws = wb.Worksheets(1)

Dim file
Dim strWhatever As String
Dim j As Long
j = 1
file = Dir(myPath & "*.doc")
Do While file <> ""
Documents.Open FileName:=myPath & file
strWhatever = ActiveDocument.Range.Text
ws.Cells(j, 1) = strWhatever
ActiveDocument.Close wdDoNotSaveChanges
j = j + 1
file = Dir
Loop
' just to be able to see the results in Excel
xlApp.Visible = True
Set ws = Nothing
Set wb = Nothing
xlApp.Quit
Set xlApp = Nothing
End Sub

Do NOT forget to explicitly Quit and destroy (set to Nothing) your Excel instance!

Again, if you are going to run this from Excel, things would be coded differently. In that case, you would need an instance of Word, and you would access the files a little differently. You may end up using FileSystemObject, rather than Dir.

Unfortunately - although you do not mention versions (BTW: a good idea please) - if you are using 2007, you can NOT use Application.FileSearch. FileSearch could be very handy if you need to do some logic filtering of your 1000s of Word docs!