
View Full Version : searching through header and footer of word document



Ppap oz
03-28-2017, 05:05 PM
Hi All,

I was hoping to get some assistance in modifying something previously put together for another user.
I am new to VBA and have managed to get this to work (thanks to all the contributors!!); however, I have found that it does not search through headers and footers, which I also need to include in my search.
Basically my requirement is to:
1) Search through 4000 documents to find where particular text is mentioned
2) Have these results listed in a spreadsheet for further analysis.
3) A secondary process to go through and actually replace the located text with its "Replacement text". The list of replacement text will be about 50 strings of information, e.g.:
"12 Street St" replaced with "14 Road Rd"
"Ph: 03 9899 5455" replaced with "Ph: 03 9899 4433"

As I mentioned, the previous code works fine; however, it does not search through headers and footers.
Any help you can offer would be greatly appreciated!

P.S. I am running Office 2010 on a Windows 7 machine and the previous code runs fine.
I have attached the spreadsheet with the macro in it.
Please let me know if any further information is required

Many thanks,
Paul


gmayor
03-28-2017, 09:27 PM
http://www.gmayor.com/document_batch_processes.htm will do almost all of what you require, except that the log will be in a Word document.

Ppap oz
03-28-2017, 09:59 PM
Thanks Graham, I had a quick look; however, I think this solution falls short for 2 reasons.
1) I need to compile a database of which fields appear in which document (this does not exist yet). The code I have inserted into my post does this, except for searching through the footer of the document. This is the part I really need help with. I have figured out how to replace the text in all sections.
2) Getting approval to install non-standard software on my machine will take weeks and I don't have that much time!

Thanks so much for your suggestion but unfortunately the search continues!
Please do let me know if you can assist with the VBA code!

Cheers,
Paul

macropod
03-28-2017, 11:45 PM
Without seeing your code, it's impossible to be precise about what changes you need to make. However, you might start with:

Sub Demo()
  Application.ScreenUpdating = False
  Dim Sctn As Section, HdFt As HeaderFooter
  With ActiveDocument
    For Each Sctn In .Sections
      For Each HdFt In Sctn.Headers
        With HdFt
          If .LinkToPrevious = False Then
            With .Range
              'Do range processing here
            End With
          End If
        End With
      Next
      For Each HdFt In Sctn.Footers
        With HdFt
          If .LinkToPrevious = False Then
            With .Range
              'Do range processing here
            End With
          End If
        End With
      Next
    Next
  End With
End Sub
Of course, even that won't address content in textboxes, footnotes or endnotes...
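For the 'Do range processing here' step, a minimal sketch might simply count the matches in each header or footer range. CountHits is a hypothetical helper name (it is not part of the code above), and the sketch assumes the project has a reference to the Word object library so that the wd constants resolve.

```vb
' Hypothetical helper: counts occurrences of strFind from the start of rng
' to the end of its story (for a header/footer range, that is the whole range).
Function CountHits(ByVal rng As Word.Range, ByVal strFind As String) As Long
    Dim lngHits As Long
    If Len(strFind) = 0 Then Exit Function   ' nothing to search for
    With rng.Find
        .ClearFormatting
        .Text = strFind
        .Forward = True
        .Wrap = wdFindStop                   ' stop at the end, do not wrap around
        Do While .Execute
            lngHits = lngHits + 1
            rng.Collapse wdCollapseEnd       ' continue after the match just found
        Loop
    End With
    CountHits = lngHits
End Function
```

Inside either loop it could be called as, for example, If CountHits(HdFt.Range, "12 Street St") > 0 Then, with the reporting done in the body of that If.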

macropod
03-28-2017, 11:50 PM
Cross-posted at: http://www.msofficeforums.com/word-vba/34955-help-include-header-footer-string-search-code.html
Please read VBA Express' policy on Cross-Posting in item 3 of the rules: http://www.vbaexpress.com/forum/faq.php?faq=new_faq_item#faq_new_faq_item3

gmayor
03-29-2017, 05:11 AM
To get approval to install non standard software on my machine will take weeks and I don't have that much time!
It never fails to surprise me that companies are unwilling to allow users to employ processes created by people who know what they are doing, but are happy to let their staff, who don't, loose with VBA on their company data.

Ppap oz
03-29-2017, 03:53 PM
Hi Guys,
Apologies for the cross post; as I mentioned, I am new to this whole thing. I will be sure to update both posts with the solution once we get there!
Thanks for your post macropod. I have tried to include this in the code I have been using; however, I have only managed to break it!
I am going to keep trying but my knowledge of programming is very limited. If anyone can show me how to get this to search through the headers and footers I would be very grateful!
The code I am using is below.
Many many thanks,
Paul.


Sub Main()

  Const TARGET_FOLDER_PATH As String = "C:\TEMP\"

  Dim fso As FileSystemObject
  Dim oTargetFolder As Folder
  Dim f As File

  Dim appWD As Word.Application
  Dim docSource As Word.Document
  Dim oSearchRange As Word.Range

  Dim rngPartnumber As Range
  Dim Rw As Long, Paras As Long, Chk As Long

  Set fso = New FileSystemObject
  Set oTargetFolder = fso.GetFolder(TARGET_FOLDER_PATH)

  Set appWD = New Word.Application
  For Each rngPartnumber In Range("partnumbers")
    Rw = rngPartnumber.Row
    For Each f In oTargetFolder.Files
      If UCase(Right(f.Name, 4)) = "DOCX" Then
        Set docSource = appWD.Documents.Open(TARGET_FOLDER_PATH & f.Name)
        Set oSearchRange = docSource.wdHeaderFooter
        With oSearchRange.Find
          .ClearFormatting
          .MatchWholeWord = True
          .Text = rngPartnumber.Text
          Do
            If .Execute Then
              docSource.Range(docSource.Paragraphs(1).Range.Start, _
                oSearchRange.End).Select
              Paras = appWD.ActiveWindow.Selection.Paragraphs.Count
              Cells(Rw, 256).End(xlToLeft).Offset(0, 1).Value = f.Name & " *Page: " & _
                appWD.ActiveWindow.Selection.Information(wdActiveEndPageNumber) _
                & " *Para: " & Paras
            End If
            If Paras = Chk Then Exit Do
            Chk = Paras
          Loop
        End With
        docSource.Close False
      End If
    Next f
  Next rngPartnumber
  appWD.Quit False
End Sub

One final note - I also need the search to continue looking through the body of the document.
Basically I need to find the given search string anywhere in the document.

Ppap oz
03-29-2017, 05:57 PM
Just realised that the above is code I have been playing with and not the original code.
"Set oSearchRange = docSource.wdHeaderFooter" should actually be "Set oSearchRange = docSource.Content"
Further to this, I have been trying to get a grasp of what the code is doing and have realised that it may be much simpler to have multiple macros, with each one searching either the body, header or footer of the doc. This would still satisfy the requirements as I can merge the output data together in Excel.

Hope this makes things easier.
Once again, any assistance would be very much appreciated.

Cheers,
Paul

macropod
03-29-2017, 07:01 PM
Having looked at your code, it seems you're trying to retrieve page #s & paragraph #s for the found content; that'll never work with headers & footers, as they don't belong to a page and don't have a paragraph index that relates to the document body.

Back to square 1, methinks.

Ppap oz
03-29-2017, 09:18 PM
Ahaa, i see. Does anyone have code on how to be able to search through headers and footers for a list of key words?

macropod
03-29-2017, 10:50 PM
Ahaa, i see. Does anyone have code on how to be able to search through headers and footers for a list of key words?
That's hardly an issue. The problem is one of how you want to report the fact of anything being found there. Besides which, what about content in textboxes, footnotes or endnotes?

gmaxey
03-30-2017, 05:54 AM
You are still faced with the problem Paul identified:


Public Sub FRAnywhere()
Dim rngStory As Word.Range
Dim arrKeyWords() As String
Dim lngValidate As Long
Dim oShp As Shape
Dim lngIndex As Long
  arrKeyWords = Split("Apple,Peach,Pear", ",")
  'Touch a header range first; this helps Word expose all header/footer stories
  lngValidate = ActiveDocument.Sections(1).Headers(1).Range.StoryType
  ResetFRP Selection.Range
  For lngIndex = 0 To UBound(arrKeyWords)
    'Iterate through all story types in the current document
    For Each rngStory In ActiveDocument.StoryRanges
      'Iterate through all linked stories
      Do
        FlagInStory rngStory, arrKeyWords(lngIndex)
        On Error Resume Next
        Select Case rngStory.StoryType
          Case 6, 7, 8, 9, 10, 11
            'Header and footer stories: also search text in any shapes anchored there
            If rngStory.ShapeRange.Count > 0 Then
              For Each oShp In rngStory.ShapeRange
                If Not oShp.TextFrame.TextRange Is Nothing Then
                  FlagInStory oShp.TextFrame.TextRange, arrKeyWords(lngIndex)
                End If
              Next
            End If
          Case Else: 'Do Nothing
        End Select
        On Error GoTo 0
        'Get next linked story (if any)
        Set rngStory = rngStory.NextStoryRange
      Loop Until rngStory Is Nothing
    Next
  Next lngIndex
lbl_Exit:
  Exit Sub
End Sub

Public Sub FlagInStory(ByVal rngStory As Word.Range, ByVal strFind As String)
  'Colour every occurrence of strFind in the story blue
  With rngStory.Find
    .ClearFormatting
    .Text = strFind
    While .Execute
      rngStory.Font.ColorIndex = wdBlue
      rngStory.Collapse wdCollapseEnd
    Wend
  End With
lbl_Exit:
  Exit Sub
End Sub

Sub ResetFRP(oRng As Range)
  'Reset the Find and Replace parameters to their defaults
  With oRng.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = ""
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    .Execute
  End With
lbl_Exit:
  Exit Sub
End Sub

Ppap oz
03-30-2017, 05:56 PM
Hi All,
OK, so this is where I'm at:
Thanks for your suggestion Greg however I have over 4000 docs to go through and this only seems to cater for the current doc.
To be clear, I need to first do the analysis on the docs and populate all found information (hits on a particular string) into a workbook and then go through and update the files separately. I think I have been able to sort out code for updating (albeit a very manual code set!) but that is the least of my current problems.

I have been playing with the code I have been using to search the body of the docs and have been able to make it work for returning doc name and page numbers for hits in the footer of a doc. Unfortunately this only works when the header/footer of the doc is constant for the whole doc and does not work if there are different odd or even page headers/footers or a different first page header/footer.
I have bastardised the code to try and work around the “Paragraph number” issue that Macropod pointed out. But my confidence level on this is low (as I clearly have no idea what I am doing!)

It is definitely OK for me to run a separate macro to get the Header and Footer info (and this could even be split up further into Just Header info and Just footer info. I will be manipulating all the data in excel and merging it into one sheet is no problems).

Where I really need help at the moment is to ensure that all different types of headers or footers are searched through. All I really need to record is that the string has been matched, whether it was in the footer or header and if possible the page number.
Expected output sample: “SampleDoc1.docx *Page: 1 *Footer”
“SampleDoc1.docx *Page: 1 *Header”
“SampleDoc1.docx *Page: 2 *Header”
The other area I could do with some help on is cleaning up the code around the “paragraph” issue, as I am not 100% confident that this is working properly!

Macropod has also pointed out that textboxes, Footnotes and Endnotes are not covered with this. This is a minor concern as I don’t expect any info to be contained like this however if it can be easily included I would be very grateful!

Appreciate all the help and suggestions so far!!

Sorry for the long post, just trying to be thorough!
My Bastardised code can be found below. (please don't laugh!)
Thanks again,
Paul



Sub Main()

  Const TARGET_FOLDER_PATH As String = "C:\Temp\SearchThisFolder\"

  Dim fso As FileSystemObject
  Dim oTargetFolder As Folder
  Dim f As File

  Dim appWD As Word.Application
  Dim docSource As Word.Document
  Dim oSearchRange As Word.Range

  Dim rngPartnumber As Range
  Dim Rw As Long, Paras As Long, Chk As Long

  Set fso = New FileSystemObject
  Set oTargetFolder = fso.GetFolder(TARGET_FOLDER_PATH)

  Set appWD = New Word.Application
  For Each rngPartnumber In Range("partnumbers")
    Rw = rngPartnumber.Row
    For Each f In oTargetFolder.Files
      If UCase(Right(f.Name, 4)) = "DOCX" Then
        Set docSource = appWD.Documents.Open(TARGET_FOLDER_PATH & f.Name)
        Set oSearchRange = docSource.Sections(1).Footers(wdHeaderFooterPrimary).Range
        With oSearchRange.Find
          .ClearFormatting
          .MatchWholeWord = True
          .Text = rngPartnumber.Text
          Do
            If .Execute Then
              'docSource.Range(docSource.Paragraphs(1).Range.Start,
              'oSearchRange.End).Select
              'Paras = appWD.ActiveWindow.Selection.Paragraphs.Count
              Cells(Rw, 256).End(xlToLeft).Offset(0, 1).Value = f.Name & " *Page: " & _
                appWD.ActiveWindow.Selection.Information(wdActiveEndPageNumber) _
                & " *Para: "
            End If
            If 1 = 1 Then Exit Do
            'Chk = Paras
          Loop
        End With
        docSource.Close False
      End If
    Next f
  Next rngPartnumber
  appWD.Quit False
End Sub

macropod
03-30-2017, 09:27 PM
Where I really need help at the moment is to ensure that all different types of headers or footers are searched through. All I really need to record is that the string has been matched, whether it was in the footer or header and if possible the page number.
Expected output sample: “SampleDoc1.docx *Page: 1 *Footer”
“SampleDoc1.docx *Page: 1 *Header”
“SampleDoc1.docx *Page: 2 *Header”
You don't seem to be paying attention. HEADERS & FOOTERS DO NOT HAVE PAGE #s. Every Section in a document has THREE header & footer objects, and a document could have hundreds of Sections. Some or all of the headers & footers in those Sections might be linked to the previous Section, so the headers & footers would just be continuing what was already there; others might not be linked, in which case they'd have their own content. A single header, therefore, might span a thousand pages or more. I doubt you'd want every such page reported... Granted, each Section starts on a given page, but you could also have two or more Sections starting on the same page. A far more meaningful metric to report would be the Section # and which header/footer the content is in - but only if that header/footer isn't linked to a previous one. After all, if they're linked, there's really only one instance of the content.
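As a rough illustration of reporting by Section # and header/footer type, a minimal sketch follows. HeaderFooterHits and ScanHdFts are hypothetical names, the output format is only an assumption about what might be wanted, and the code assumes a reference to the Word object library. It skips any header/footer that is linked to the previous Section, and also any that is not enabled for the Section (via .Exists), which may or may not suit your needs.

```vb
' Sketch: returns one line per unlinked header/footer that contains strFind,
' e.g. "SampleDoc1.docx *Section: 2 *First Page Header".
Function HeaderFooterHits(ByVal doc As Word.Document, ByVal strFind As String) As String
    Dim Sctn As Word.Section, strOut As String
    For Each Sctn In doc.Sections
        ScanHdFts Sctn.Headers, Sctn.Index, doc.Name, strFind, strOut
        ScanHdFts Sctn.Footers, Sctn.Index, doc.Name, strFind, strOut
    Next Sctn
    HeaderFooterHits = strOut
End Function

Private Sub ScanHdFts(ByVal colHdFt As Word.HeadersFooters, ByVal lngSctn As Long, _
                      ByVal strDoc As String, ByVal strFind As String, _
                      ByRef strOut As String)
    Dim HdFt As Word.HeaderFooter, strKind As String
    For Each HdFt In colHdFt
        If HdFt.Exists Then
            ' Section 1 has nothing to link to, so it is always scanned
            If lngSctn = 1 Or HdFt.LinkToPrevious = False Then
                With HdFt.Range.Find
                    .ClearFormatting
                    .Text = strFind
                    .Forward = True
                    .Wrap = wdFindStop
                    If .Execute Then
                        Select Case HdFt.Index
                            Case wdHeaderFooterPrimary: strKind = "Primary"
                            Case wdHeaderFooterFirstPage: strKind = "First Page"
                            Case wdHeaderFooterEvenPages: strKind = "Even Page"
                        End Select
                        strKind = strKind & IIf(HdFt.IsHeader, " Header", " Footer")
                        strOut = strOut & strDoc & " *Section: " & lngSctn & _
                                 " *" & strKind & vbLf
                    End If
                End With
            End If
        End If
    Next HdFt
End Sub
```

From the Excel macro this could be called as strHits = HeaderFooterHits(docSource, rngPartnumber.Text) after the document is opened; it reports each header/footer at most once, however many times the string occurs in it.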

As for Greg's code, that could quite easily be integrated with your own. Simply replace:

Set oSearchRange = docSource.wdHeaderFooter
With oSearchRange.Find
  .ClearFormatting
  .MatchWholeWord = True
  .Text = rngPartnumber.Text
  Do
    If .Execute Then
      docSource.Range(docSource.Paragraphs(1).Range.Start, _
        oSearchRange.End).Select
      Paras = appWD.ActiveWindow.Selection.Paragraphs.Count
      Cells(Rw, 256).End(xlToLeft).Offset(0, 1).Value = f.Name & " *Page: " & _
        appWD.ActiveWindow.Selection.Information(wdActiveEndPageNumber) _
        & " *Para: " & Paras
    End If
    If Paras = Chk Then Exit Do
    Chk = Paras
  Loop
End With
from post #7 with:
Call FRAnywhere
However, because you haven't addressed the issues I raised about exactly what it is you want to output for each range, Greg's code doesn't actually do anything except loop through them and colour whatever's found with a blue font.

PS: When posting code, please use the code tags, indicated by the # button on the posting menu. Without them, your code loses much of whatever structure it had. I already fixed it for post #7, but I don't propose to do that for all your posts...

Ppap oz
03-30-2017, 10:28 PM
Thanks macropod, I have updated the code formatting... I thought it looked different after I had originally posted it!
I guess what I am trying to achieve is a list of instances from the headers and footers of a document where the given search strings are mentioned. Where it is linked to a previous section, it is not required to be noted as updating it in 1 section should update it in all sections (if I am understanding this correctly!).
As for the “Section”, how would a user be able to identify where this is on a word doc? I ask this because we will need to spot check instances of text that were identified/updated and confirm that they were successfully and correctly updated.
So I guess the expected output would be something along the lines of:
“SampleDoc1.docx *Section: 1 *Header”
“SampleDoc1.docx *Section: 2 *Header”
“SampleDoc1.docx *Section: 1 *Footer”
Sorry for all the back and forth, I am a novice at all this stuff!
Hope that you can assist :)
Paul

macropod
03-30-2017, 11:05 PM
As for the “Section”, how would a user be able to identify where this is on a word doc? I ask this because we will need to spot check instances of text that were identified/updated and confirm that they were successfully and correctly updated.
Right-clicking on Word's status bar gives you the option of displaying which Section you're in. You can also go to a specific Section via F5.

Doing spot checks based on what's output to Excel would be rather futile: if the strings aren't being found, they won't be reported, so you won't know they've been missed. For any that are found, you can count on a Find/Replace using the same logic replacing them in all files if it replaces them in any.

So I guess the expected output would be something along the lines of:
“SampleDoc1.docx *Section: 1 *Header”
“SampleDoc1.docx *Section: 2 *Header”
“SampleDoc1.docx *Section: 1 *Footer”
No, it might be more like:
“SampleDoc1.docx *Section: 1 *Primary Header”
“SampleDoc1.docx *Section: 1 *First Page Header”
“SampleDoc1.docx *Section: 1 *Primary Footer”
“SampleDoc1.docx *Section: 2 *Primary Header”
“SampleDoc1.docx *Section: 2 *First Page Header”
“SampleDoc1.docx *Section: 2 *Even Page Header”
depending, of course, on which headers/footers the content is found in for a given Section. And just to add a wrinkle to any spot-checking, an even-page or primary header/footer might contain content but not be visible due to a given Section having only one page (or sometimes only two pages)...
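To help with that kind of spot-checking, a small sketch like the following could list, per Section, whether the first-page and odd/even header/footer options are switched on. ListHeaderFooterSettings is a hypothetical name; the sketch is meant to be run in Word against the active document and writes to the Immediate window.

```vb
' Sketch: prints the header/footer options in force for each Section.
Sub ListHeaderFooterSettings()
    Dim Sctn As Word.Section
    For Each Sctn In ActiveDocument.Sections
        With Sctn.PageSetup
            Debug.Print "Section " & Sctn.Index & _
                ": different first page = " & CBool(.DifferentFirstPageHeaderFooter) & _
                ", different odd/even = " & CBool(.OddAndEvenPagesHeaderFooter)
        End With
    Next Sctn
End Sub
```

A Section that reports False for an option can still hold content in the corresponding header/footer; it just won't be displayed or printed while the option is off.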

Ppap oz
04-02-2017, 04:28 PM
Hi Macropod,
I hear what you are saying about the manual spot checking, but nonetheless, as part of a compliance check (which we are forced to do), I will need to eyeball some of the docs to sign off on the changes having been made.
Your sample output satisfies the requirements. :clap:
So in terms of making this code runnable, I have tried putting it all together but I can't even get anything to run.
Can anyone help me to put this all together so that I can report on the instances of each search string (in the footers and headers)?

Free beers on offer if you can assist*
*Free beers subject to availability and being in Melbourne, Victoria! Or somehow getting to a common location.


Many many thanks!
Paul

Ppap oz
04-03-2017, 10:42 PM
Hi All,
Really appreciate all the help on this so far and I feel we are at the final gate, but I just can't get a VBA script to return the information from all the headers and footers in a document. Unfortunately, if I can't get this working, it may all have been for nothing, as each file will need to be reviewed manually anyway :(
Can anyone help with this?
I would be very appreciative!
Thanks in advance,
Paul

gmaxey
04-04-2017, 04:21 AM
The code I sent you which you dismissed will find any instance of a find term in any header or footer. It is just a matter of changing it from turning that text blue to querying the section number and writing it to your report string, which I will leave to you to figure out.
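As a starting point for that change, a minimal sketch of a reporting variant of FlagInStory is shown here. ReportInStory and StoryLabel are hypothetical names, and the sketch assumes that Information(wdActiveEndSectionNumber) returns the owning Section for a range inside a header or footer story, which is worth verifying on a test document before relying on it.

```vb
' Hypothetical variant of FlagInStory: builds report lines instead of colouring text.
Public Function ReportInStory(ByVal rngStory As Word.Range, ByVal strFind As String) As String
    Dim strOut As String
    With rngStory.Find
        .ClearFormatting
        .Text = strFind
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' Assumption: the Section number is available for header/footer ranges too
            strOut = strOut & "*Section: " & _
                     rngStory.Information(wdActiveEndSectionNumber) & _
                     " *" & StoryLabel(rngStory.StoryType) & vbLf
            rngStory.Collapse wdCollapseEnd
        Loop
    End With
    ReportInStory = strOut
End Function

' Maps the header/footer story types (6 to 11) to readable labels.
Private Function StoryLabel(ByVal lngType As Long) As String
    Select Case lngType
        Case wdEvenPagesHeaderStory: StoryLabel = "Even Page Header"
        Case wdPrimaryHeaderStory: StoryLabel = "Primary Header"
        Case wdEvenPagesFooterStory: StoryLabel = "Even Page Footer"
        Case wdPrimaryFooterStory: StoryLabel = "Primary Footer"
        Case wdFirstPageHeaderStory: StoryLabel = "First Page Header"
        Case wdFirstPageFooterStory: StoryLabel = "First Page Footer"
        Case Else: StoryLabel = "Other story (type " & lngType & ")"
    End Select
End Function
```

In FRAnywhere, each FlagInStory call could then be swapped for something like strReport = strReport & ReportInStory(rngStory, arrKeyWords(lngIndex)), where strReport is a string variable you would add. Unlike a per-header check, this reports every occurrence, so a string that appears twice in one header yields two lines.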

Ppap oz
04-04-2017, 05:13 AM
Hi Greg,
I have not dismissed your code at all. I have spent the last day trying to combine the 2 sets together but this is my first script and I have no idea what I'm doing. It's just a lot of googling and trial and error. This is why I have asked for help on this forum. Is it possible for you to guide me through how I can query the section number and write it to the report string?
As I mentioned, any assistance would be greatly appreciated!


The code I sent you which you dismissed will find any instance of a find term in any header or footer. It is just a matter of changing it from turning that text blue to querying the section number and writing it to your report string, which I will leave to you to figure out.

macropod
04-04-2017, 07:20 PM
Aside from the question of adapting Greg's code, I think you need to reconsider the output specs. You say you have some 4000 documents to process. As coded, your macro tries to output each match to a new column. Yet it never looks beyond column 256, meaning any more than 255 matches will simply overwrite the data in column 256 (IV). But even if you took away the 256 column restriction, an Excel 2007 & later workbook still only has 16384 columns - to cater for your 4000 documents. I suspect you could quickly find Excel running out of space. I'd suggest having a single column per document and outputting all matches for a given 'find' string into a single cell.
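A minimal sketch of that single-cell approach might look like the following. AppendHit is a hypothetical helper name, and how you map each document to its column is left to you; the 32,767-character figure is Excel's limit on the text a single cell can hold.

```vb
' Hypothetical helper: appends one hit to a cell, one hit per line within the cell.
Sub AppendHit(ByVal rngCell As Excel.Range, ByVal strHit As String)
    Const MAX_CELL_CHARS As Long = 32767   ' Excel's per-cell text limit
    Dim strNew As String
    If Len(rngCell.Value) = 0 Then
        strNew = strHit
    Else
        strNew = rngCell.Value & vbLf & strHit
    End If
    If Len(strNew) <= MAX_CELL_CHARS Then
        rngCell.Value = strNew
    Else
        ' Cell is full: note it in the Immediate window rather than failing
        Debug.Print "Cell " & rngCell.Address & " is full; dropped: " & strHit
    End If
End Sub
```

For example, AppendHit Cells(Rw, lngDocCol), "*Section: 1 *Primary Header" would add a line to the cell for the current search string and document, where lngDocCol is a hypothetical variable holding that document's column number.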