PDA

View Full Version : [SOLVED:] Transfer text from a textbox in Word Header to Excel



lentilbake
10-07-2013, 11:49 AM
Hi everyone, I have a little task that I'd like some help with (a lot really!). I have a load of similarly formatted Word files which all have a single text box in the Header on the first page. I'd like to be able to open each of the files, read the text from the text box and write it to the spreadsheet row by row. It would also be nice if I could search for some specific text in each document, read a paragraph or two from this point and copy that to the spreadsheet too (in the next column). I've found code which reads between bookmarks (fumei-very good) but bookmarking the Word documents will take an age!

Any help very much appreciated.

macropod
10-10-2013, 02:51 AM
You could try something like the following Excel macro:

Sub GetDocumentHeaders()
'Note: A Reference to the Word Object Library is required.
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, xlSht As Worksheet, lRow As Long
Dim wdApp As New Word.Application
Dim wdDoc As Word.Document
strFolder = GetFolder
If strFolder = "" Then Exit Sub
Set xlSht = ActiveSheet
lRow = xlSht.Cells.SpecialCells(xlCellTypeLastCell).Row + 1
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, _
AddToRecentFiles:=False, Visible:=False)
With wdDoc
xlSht.Range("A" & lRow).Value = .Name
With .Sections.First.Headers(wdHeaderFooterPrimary)
If .Shapes.Count <> 0 Then
If Not .Shapes(1).TextFrame Is Nothing Then
xlSht.Range("B" & lRow).Value = .Shapes(1).TextFrame.TextRange.Text
End If
End If
End With
lRow = lRow + 1
.Close SaveChanges:=False
End With
strFile = Dir()
Wend
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing
Application.ScreenUpdating = True
End Sub
'
Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function

Filenames are output in column A, & header textbox contents in column B

lentilbake
10-13-2013, 09:27 AM
Many thanks for your time and trouble in replying, I'll try this out as soon as I can, it seems to be exactly what I want.
If I could prevail even further - how could one read a paragraph or line from the doc into a string?

Thanks again.

macropod
10-13-2013, 01:56 PM
For that you might use something like:

With .Sections.First.Range.Paragraphs.First
xlSht.Range("C" & lRow).Value = .Text
End With
before:
lRow = lRow + 1
Of course, if you don't know exactly where is the document the text is, you'll need some way of finding it ...

lentilbake
10-14-2013, 08:58 AM
Thanks again Paul for your help on this.

When I run the code I get the first file name read into A3 and then an error from VB:

Run-time error '5917'
This object does not support attached text

at this line:
xlSht.Range("B" & lRow).Value = .Shapes(1).TextFrame.TextRange.Text

I'm using Excel / Word 2007, the Word files are standard .doc

Sorry, I'm just not clever enough know how to fix this!

I think it's not a textbox but is an autoshape (sorry to mislead you), will try and sort this myself but any help appreciated!

Tried to attach a file for you to look at but I'm not allowed!

lentilbake
10-14-2013, 11:42 AM
Obviously something funny with the Doc files, I created a couple of samples from scratch with autoshapes in the header and your code works a treat.

I just copied the text box to the header in a new document, saved it as a Doc and again the code works fine!

It s beyond me!

macropod
10-14-2013, 02:16 PM
If you upload a sample problem document to one of the free file-hosting sites (eg http://www.4shared.com/) and paste a link here, I could take a look at it.

lentilbake
10-14-2013, 11:01 PM
Ok Paul, links in the following post.

lentilbake
10-14-2013, 11:02 PM
Thanks Paul, try these:
https://dl.dropboxusercontent.com/u/65709283/b1_g_01_technician.doc
https://dl.dropboxusercontent.com/u/65709283/b1_g_02_technician.doc

They all look like this.

macropod
10-15-2013, 12:09 AM
OK, there are a couple of problems.

First, the documents you posted have multiple shapes in the page header (eg 10 shapes in the 01 file).
Second, the If test I used in the code to validate the presence of some text doesn't work.

To deal with both issues, add:
Dim wdShp As Word.Shape
after:
Dim wdDoc As Word.Document
and change:

With .Sections.First.Headers(wdHeaderFooterPrimary)
If .Shapes.Count <> 0 Then
If Not .Shapes(1).TextFrame Is Nothing Then
xlSht.Range("B" & lRow).Value = .Shapes(1).TextFrame.TextRange.Text
End If
End If
End With
to:

With .Sections.First.Headers(wdHeaderFooterPrimary)
For Each wdShp In .Shapes
With wdShp.TextFrame
If .HasText = True Then
xlSht.Range("B" & lRow).Value = .TextRange.Text
Exit For
End If
End With
Next
End With
The above changes will cause the code to return the text from the first shape in the header that has any. With so many shapes, that may or may not be the one you want.

You may want to check out what's in some of the problem files with:

Sub GetHeaderShapes()
Dim Shp As Shape, StrTmp As String
With ActiveDocument.Sections.First.Headers(wdHeaderFooterPrimary)
For Each Shp In .Shapes
With Shp
StrTmp = "Name: " & .Name & vbCr & _
"Top: " & .Top & vbCr & _
"Left: " & .Left & vbCr & _
"Height: " & .Height & vbCr & _
"Width: " & .Width & vbCr
If .TextFrame.HasText = True Then
StrTmp = StrTmp & "Text: " & .TextFrame.TextRange.Text
End If
MsgBox StrTmp
End With
Next
End With
End Sub

lentilbake
10-15-2013, 03:05 AM
Thankyou!
Your amended code worked perfectly and I'm hoping the files are all similar enough for it to keep on working. Still can't work out how there can be so many shapes in the header, can't see them in Word? I will have a go with your GetHeaderShapes sub to check some of them out. If I can just work out how to read the required text in the main body I'll be quids in (I think this might be too tricky as the there are lots of variations in the files so might not bother with this).

Again, Many thanks Paul, really appreciate it.

lentilbake
10-15-2013, 04:00 AM
Tried your addition to read text and I get a compile error 'Method or data member not found' at this line:

xlSht.Range("C" & lRow).Value = .Text

(with the .Text highlighted)

looks ok to me?

macropod
10-15-2013, 03:16 PM
Try:

With .Sections.First.Range.Paragraphs.First.Range
xlSht.Range("C" & lRow).Value = .Text
End With

lentilbake
10-16-2013, 11:04 AM
Thanks, that worked!

I worked out (all on my own!) that you can use .paragraph(index) to read subsequent paragraphs so I just need to work out the text I need to grab I'm done!

Thanks again for your time on this, I used to program many years ago in Fortran and Cobol etc but I'm finding all the methods, properties, objects etc a lot to get my head round!