PDA

View Full Version : [SOLVED:] VBA - find text in a word doc headers and return all page numbers



jalal
04-03-2021, 07:52 AM
Hi All,

I joined today at VBA Express. I am very optimistic for the solution from here.

I have a word doc which i want to open and find specific text in headers, and return all page numbers where that text is found or extract all those found pages in a separate word doc and save that as PDF.

Below is the VBA code which work well to find the text in the body of the word doc but no luck in find the text in headers.

Sub OpenWordDoc_and_GetNumber_and_save_pdf()
Dim wordapp As Word.Application
Dim findRange As Excel.Range
Dim findCell As Excel.Range
Dim rngFound As Word.Range
Dim saving_path As String


Set wordapp = CreateObject("word.Application")
wordapp.Visible = True
wordapp.Activate
wordapp.Documents.Open "A:\CITISERVICE\Attachments\Mibor\2021\Mar\22\D-Sql_MIBOR-CDT-ADVICE-O.docx" 'filename.docx"
saving_path = "A:\CITISERVICE\Attachments\Mibor\2021\Mar\22"

Set rngFound = wordapp.ActiveDocument.Range

With rngFound.Find
.Text = "Citibank"
.Execute
End With
If rngFound.Find.Found Then
page_num = rngFound.Information(wdActiveEndPageNumber)

wordapp.ActiveDocument.ExportAsFixedFormat OutputFileName:= _
saving_path & "" & "Test" & ".pdf", ExportFormat:=wdExportFormatPDF, _
OpenAfterExport:=False, OptimizeFor:=wdExportOptimizeForPrint, Range:= _
wdExportFromTo, From:=page_num, To:=page_num, Item:=wdExportDocumentContent, _
IncludeDocProps:=False, KeepIRM:=False, CreateBookmarks:= _
wdExportCreateHeadingBookmarks, DocStructureTags:=True, _
BitmapMissingFonts:=False, UseISO19005_1:=False

Else
MsgBox ("Text Not Found")
Exit Sub
End If



'Next findCell
wordapp.Quit


Set rngFound = Nothing
Set findCell = Nothing
Set findRange = Nothing
Set wordapp = Nothing

End Sub

macropod
04-03-2021, 03:16 PM
Headers in Word are unrelated to page numbers; they are objects in their own right. Rather, Headers (and Footers) are related to Sections and each Section has three potential Header/Footer types (Primary, Even and First Page). And, if that's not enough, Headers & Footers in successive Sections can have the 'same as previous attribute' applied, so that they inherit the attributes of the previous Section rather than containing that content in their own right.

jalal
04-03-2021, 09:48 PM
So how can I solve the issue. My end goal is to extract that page wherever the search text is found.
Word doc in which I am trying to search is that which I have converted from pdf to word using macro. If you can help me in that macro so that while conversion nothing goes in the headers, even that will solve the purpose.

macropod
04-04-2021, 12:14 AM
Here's one way:

Sub Demo()
Dim Rct As Rectangle, i As Long, StrOut As String
With ActiveDocument
For i = 1 To .ComputeStatistics(wdStatisticPages)
With .ActiveWindow.Panes(1).Pages(i)
For Each Rct In .Rectangles
With Rct.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "Text to find"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute
If .Found = True Then
StrOut = StrOut & vbCr & i: Exit for
End If
End With
Next
End With
Next
End With
MsgBox "Text found on pages:" & StrOut
End Sub
The above code will potentially test every page in full, so it could be somewhat slower than a standard looped Find.

gmaxey
04-05-2021, 05:20 AM
Paul,

Just looked at this briefly and perhaps you have a reason I'm not aware of for staying in the rectangle after the page number added to string. Also, when you start introducing comments and such, errors can occur:


Sub TextFoundOn_IncludingHeaderFooters()
Dim oRect As Rectangle, lngIndex As Long, strOut As String
With ActiveDocument
For lngIndex = 1 To .ComputeStatistics(wdStatisticPages)
With .ActiveWindow.Panes(1).Pages(lngIndex)
For Each oRect In .Rectangles
If oRect.RectangleType = wdTextRectangle Then
On Error GoTo Err_Rect
With oRect.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "Text to find"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
If .Execute Then
strOut = strOut & vbCr & lngIndex
Exit For
End If
End With
End If
Err_Reenter:
Next
End With
Next
End With
MsgBox "Text found on pages:" & strOut
lbl_Exit:
Exit Sub
Err_Rect:
Resume Err_Reenter
End Sub

macropod
04-05-2021, 08:18 PM
Just looked at this briefly and perhaps you have a reason I'm not aware of for staying in the rectangle after the page number added to string. Also, when you start introducing comments and such, errors can occur
Hi Greg,

I was in a bit of a hurry when I did the coding, so I didn't address exiting the loop neatly (fixed) or test for errors with comments etc. For completeness, code would be needed to both address those errors and test the comment, etc. content. FWIW, the comment-related error only seems to occur if the comment is visible.