PDA

View Full Version : find text string and return page number



samuelimtech
02-05-2014, 08:46 AM
Hi im familair with excel vba but not at all with word so if someone can get me started im sure i can do the rest.
im after a macro that will search a document for m:* and return both the text that fits the search criteria and its page number.
I would like this information arranged in a table with 3 columns(1 cell contains the text string and page no).
the table shows an example of the kind of result im after.


M:1 Page 20
M:62 Page 54
M:93 Page 61


m:2 Page 45
M:94 Page 29
M:99 Page 84


M:3 Page 150
M:98.1 Page 60
M:100 Page 100




thank you for any assistance.

macropod
02-05-2014, 11:06 PM
Try the following macro; it's a minor adaptation of one that I've produced for another purpose. As coded, it outputs the data in a slightly different format, using a two-column table. It also caters for the possibilities that (a) the document might be updated and the macro run again; and (b) the same term will occur more than once in a document. Some further mods could be made to generate your desired output format.

Sub TabulateKeyTerms()
Application.ScreenUpdating = False
Dim Doc As Document, Rng As Range, Tbl As Table
Dim StrTerms As String, strFnd As String, StrPages As String
Dim i As Long, j As Long, StrOut As String, StrBreak As String
StrTerms = vbCr
Set Doc = ActiveDocument
StrPages = ""
'Go through the document looking for defined terms.
With Doc.Content
'Check whether our table exists. If so, delete it.
If .Bookmarks.Exists(Name:="_Defined_Terms") Then
.Bookmarks("_Defined_Terms").Range.Tables(1).Delete
End If
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Wrap = wdFindStop
.MatchWholeWord = True
.MatchWildcards = True
.MatchCase = False
'Find expressions between matched pairs of double quotes,
'allowing for the fact that 'smart quotes' may not be in use.
.Text = "[Mm]:[0-9]{1,}"
.Execute
End With
Do While .Find.Found
Set Rng = .Duplicate
With Rng
'If it's not in the StrTerms list, add it.
If InStr(StrTerms, vbCr & .Text & vbCr) = 0 Then
StrTerms = StrTerms & .Text & vbCr
End If
End With
.Find.Execute
Loop
End With
'Exit if no defined terms have been found.
If StrTerms = vbCr Then
MsgBox "No defined terms found." & vbCr & "Aborting.", vbExclamation, "Defined Terms Error"
GoTo ErrExit
End If
'Sort the key terms
Set Rng = ActiveDocument.Range.Characters.Last
With Rng
.Collapse wdCollapseEnd
.InsertBefore vbCr
.InsertAfter StrTerms
.Sort ExcludeHeader:=True, FieldNumber:=1, SortFieldType:=wdSortFieldAlphanumeric, _
SortOrder:=wdSortOrderAscending
StrTerms = .Text
.Text = vbNullString
End With
'Build the page records for all terms in the StrTerms list.
For i = 0 To UBound(Split(StrTerms, vbCr)) - 1
strFnd = Trim(Split(StrTerms, vbCr)(i))
StrPages = ""
With Doc.Content
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Text = strFnd
.Wrap = wdFindStop
.MatchWholeWord = True
.MatchWildcards = False
.MatchCase = True
.Execute
End With
j = 0
Do While .Find.Found
'If we haven't already found this term on this page, add it to the list.
If j <> .Duplicate.Information(wdActiveEndAdjustedPageNumber) Then
j = .Duplicate.Information(wdActiveEndAdjustedPageNumber)
StrPages = StrPages & j & " "
End If
.Find.Execute
Loop
'Turn the pages list into a comma-separated string.
StrPages = Replace(Trim(StrPages), " ", ",")
If StrPages <> "" Then
'Add the current record to the output list (StrOut)
StrOut = StrOut & strFnd & vbTab & Replace(ParsePageRefs(StrPages, "&"), ",", ", ") & vbCr
End If
End With
Next i
'Output the found terms as a table at the end of the document.With Rng
'Calculate the number of table lines for the data.
j = Round(UBound(Split(StrOut, vbCr)) / 2)
Set Tbl = ActiveDocument.Tables.Add(Range:=Rng, NumRows:=j + 1, NumColumns:=2)
With Tbl
'Define the overall table layout.
With .Range.ParagraphFormat
.RightIndent = CentimetersToPoints(2.5)
With .TabStops
.ClearAll
.Add Position:=CentimetersToPoints(7.5), Alignment:=wdAlignTabRight, Leader:=wdTabLeaderDots
End With
End With
'Populate & format the header row.
With .Cell(1, 1).Range
.Text = "Term" & vbTab & "Pages"
.ParagraphFormat.KeepWithNext = True
End With
With .Cell(1, 2).Range
.Text = "Term" & vbTab & "Pages"
.ParagraphFormat.KeepWithNext = True
End With
'Delete the header row's tab leaders.
With .Rows.First.Range
With .ParagraphFormat.TabStops
.ClearAll
.Add Position:=CentimetersToPoints(7.5), Alignment:=wdAlignTabRight, Leader:=wdTabLeaderSpaces
End With
.Font.Bold = True
End With
'Apply the heading row attribute so that the table header repeats after a page break.
.Rows.First.HeadingFormat = True
'Populate the data rows
For i = 0 To j - 1
.Cell(i + 2, 1).Range.Text = Split(StrOut, vbCr)(i)
Next
For i = j To UBound(Split(StrOut, vbCr)) - 1
.Cell(i - j + 2, 2).Range.Text = Split(StrOut, vbCr)(i)
Next
'Bookmark the table.
ActiveDocument.Bookmarks.Add Name:="_Defined_Terms", Range:=Tbl.Range
End With
'Clean up and exit.
ErrExit:
Set Rng = Nothing: Set Tbl = Nothing: Set Doc = Nothing
Application.ScreenUpdating = True
End Sub
'
Function ParsePageRefs(StrPages As String, Optional StrEnd As String)
'This function converts multiple sequences of 3 or more consecutive numbers in a
' list to a string consisting of the first & last numbers separated by a hyphen.
Dim ArrTmp(), i As Integer, j As Integer, k As Integer
ReDim ArrTmp(UBound(Split(StrPages, ",")))
For i = 0 To UBound(Split(StrPages, ","))
ArrTmp(i) = Split(StrPages, ",")(i)
Next
For i = 0 To UBound(ArrTmp) - 1
If IsNumeric(ArrTmp(i)) Then
k = 2
For j = i + 2 To UBound(ArrTmp)
If CInt(ArrTmp(i) + k) <> CInt(ArrTmp(j)) Then Exit For
ArrTmp(j - 1) = ""
k = k + 1
Next
i = j - 2
End If
Next
StrPages = Join(ArrTmp, ",")
StrPages = Replace(Replace(Replace(StrPages, ",,", " "), ", ", " "), " ,", " ")
While InStr(StrPages, " ")
StrPages = Replace(StrPages, " ", " ")
Wend
StrPages = Replace(Replace(StrPages, " ", "-"), ",", ", ")
If StrEnd <> "" Then
i = InStrRev(StrPages, ",")
If i > 0 Then
StrPages = Left(StrPages, i - 1) & Replace(StrPages, ",", " " & Trim(StrEnd), i)
End If
End If
ParsePageRefs = StrPages
End Function

samuelimtech
02-06-2014, 02:06 AM
Hi ive tried the macro. it does a great job of returning the text string but doesnt reutnr the page number that the text string was found. ive tried to edit it to work but i fear its a little beyond me.

macropod
02-06-2014, 02:46 AM
Aside from an extra 'End With' near the end of the sub (which I've now deleted), it works for me. Did you include the ParsePageRefs function as well?

samuelimtech
02-06-2014, 03:05 AM
yes ive included it and removed the end with. Ive obviously done something wrong. if you dont mind could you have a look at what ive done. ive put docx for security reasons. just remove the space after "www".

www .dropbox.com/s/pk4c44fwi9l9y6y/RCS%20Signalling%20%281a%29%20-%20SJB%20Copy.docx

thank you for all your help

samuelimtech
02-06-2014, 03:10 AM
www .dropbox.com/s/pk4c44fwi9l9y6y/RCS%20Signalling%20%281a%29%20-%20SJB%20Copy.docx
I broke the link sorry

macropod
02-06-2014, 03:45 AM
I get "No results found for www .dropbox.com/s/pk4c44fwi9l9y6y/RCS%20Signalling%20%281a%29%20-%20SJB%20Copy.docx."

You'd probably do better to attach the document to a post here, via the paperclip symbol on the 'Go Advanced' tab.

samuelimtech
02-06-2014, 03:52 AM
https://www.dropbox.com/s/pk4c44fwi9l9y6y/RCS%20Signalling%20%281a%29%20-%20SJB%20Copy.docx

macropod
02-06-2014, 04:13 AM
Works for me. See attached. As a bonus, I've included a revised version of the macro (in Module 2) that lets you determine the number of columns in the table.

samuelimtech
02-06-2014, 04:37 AM
thank you, you probably already know this but just incase you dont, the reason it wouldnt work for me was because i was tracking changes( ithink cos i turned them off it works).

I really appreaciate your help thank you

samuelimtech
02-06-2014, 05:07 AM
I hate to be an unbelievable pain but the macro doesnt pick up text strings that contains a . e.g M:98.1 Page 60. ive tried to make suss out how to make the adjustments but put simply i have no idea how it works lol.

macropod
02-06-2014, 04:22 PM
Your specs didn't include that...

You could try changing:
.Text = "[Mm]:[0-9]{1,}"
to:
.Text = "[Mm]:[0-9.]{1,}"
but do note this is liable to pick up periods after the number as well. For that, you could insert:

While .Characters.Last.Text = "."
.End = .End - 1
Wend
after the first 'With Rng'.

PS: Here's some simpler code for populating the table's data rows. To use it, replace the entire For-Next loop below the 'Populate the data rows' comment.

'Populate the data rows
For i = 0 To UBound(Split(StrOut, vbCr)) - 1
.Range.Cells(i + lCol + 1).Range.Text = Split(StrOut, vbCr)(i)
Next
This also obviates the need for the k & l variables and the code that sets their values.

samuelimtech
02-07-2014, 01:19 AM
thank you works a treat. gunna have to get some studying done for word VBA, i wrongly assumed theyd be the same :(