PDA

View Full Version : Extracting highlighted words from multiple documents



prefect
01-13-2016, 01:49 AM
Hello,

User of MS Office 2010 here. I am currently working on an assignment where I have multiple text files and need to extract key information onto a separate file.

I have identified the necessary keywords in files by using a macro which highlights keywords in yellow in all word files within the folder.
So now I have many documents with key information highlighted in yellow in the same folder, and I want to extract those highlighted words onto a separate file.

I have a working code for extracting yellow-highlighted words from one open document onto a newly-created word file, but I can't get it to extract yellow-highlighted words from multiple documents (from same folder) onto the same newly created file.
Here is the code that works for extracting yellow-highlighted words from one document onto a separate file.

```

Sub ExtractHighlight()

Dim objWord As Word.Application
Dim doc As Word.Document
Dim oneword
Dim i As Integer
i = 0

'Create Word doc object
Set objWord = CreateObject("Word.Application")

With objWord
' Ensure the MS Word object is visible
.Visible = True

' Add a new word document and save the file prior to adding text
Set doc = .Documents.Add
doc.SaveAs "C:\test\output.docx"
' Or open an existing document
'Set doc = wrdApp.Documents.Open("C:\Foldername\Filename.doc")
End With

'Construct document
With objWord.Selection
' Set the font type
.Font.Name = "Trebuchet MS"
' Set the font size
.Font.Size = 16
' Set the format, depending on the value of i
' Add text
For Each char In ActiveDocument.Characters

If char.HighlightColorIndex = wdYellow Then
i = 1
oneword = char.Text
'oneword = oneword & "##"
.TypeText oneword
oneword = ""
Else
If i = 1 Then
.TypeText vbCrLf
i = 0
End If
End If
Next
End With
' Save the file
doc.Save
' Bring the MS Word window to the front
doc.Activate
End Sub

```

Can I change it to search multiple documents (i.e. the entire folder) and export yellow-highlighted words from them into one file? Alternatively, should I open a new empty document and then run a macro on it instead of running a macro on the doc with text?

In short, here's what I need:
1) identify yellow-highlighted words/text in all docs within a selected folder
2)export yellow-highlighted words to a separate file

Any help is much appreciated,

Thanks,
Prefect

gmayor
01-13-2016, 06:54 AM
You could use the following function with http://www.gmayor.com/document_batch_processes.htm which will handle the folders and sub folders as a custom process. The functiojn will write each instance of a yellow highlighted text to a new paragraph in the document C:\Path\Log.docx. As opening a document that is already open simply gives focus to that document, the various documents extracts are added to that document. The document must exist before you start! The name and path are unimportant as long as you have read/write access and you make any name change in the function itself


Function ExtractHiLight(oDoc As Document) As Boolean
Dim oRng As Range
Dim oNewDoc As Document
Set oNewDoc = Documents.Open(Filename:="C:\Path\Log.docx", AddToRecentFiles:=False) 'The name and path of the document to record the data.
On Error GoTo err_Handler
Set oRng = oDoc.Range
With oRng.Find
.Highlight = True
Do While .Execute
If oRng.HighlightColorIndex = wdYellow Then
oNewDoc.Range.InsertAfter oRng.Text & vbCr
oRng.Collapse 0
End If
Loop
oNewDoc.Save
End With
ExtractHiLight = True
lbl_Exit:
Exit Function
err_Handler:
ExtractHiLight = False
Resume lbl_Exit
End Function

prefect
01-13-2016, 09:09 AM
Thanks very much! I will look into it. However, would it be possible to do that just with a macro, without downloading a plug-in?

I will probably need to use it on other computers and show the complete process, so a single macro would be more transparent/helpful.

prefect
01-13-2016, 09:25 AM
For example, I use the following macro to find and highlight keywords in all files in the folder (when I run the macro a pop-up window shows up where I specify the folder). The macro has to be run from a document outside the folder, but it does not extract any keywords, just highlights them in yellow in all files in the folder. Here's the macro:


Sub HighlightMultipleFiles()
Dim intResult As Integer
Dim strPath As String
Dim arrFiles() As String
Dim i As Integer
'the dialog is displayed to the user
intResult = Application.FileDialog(msoFileDialogFolderPicker).Show
'checks if user has cancled the dialog
If intResult <> 0 Then
'dispaly message box
strPath = Application.FileDialog( _
msoFileDialogFolderPicker).SelectedItems(1)
arrFiles() = GetAllFilePaths(strPath)
For i = LBound(arrFiles) To UBound(arrFiles)
Call HighlightKeywords(arrFiles(i))
Next i
End If
End Sub

Private Sub HighlightKeywords(ByVal strPath As String)
Dim objDocument As Document
Set objDocument = Documents.Open(strPath)
objDocument.Activate
Application.ScreenUpdating = False
Dim StrFnd As String, Rng As Range, i As Long, BStrFnd As String, j As Long
Set oRng = ActiveDocument.Range
oRng.HighlightColorIndex = wdNoColor
Options.DefaultHighlightColorIndex = wdYellow
StrFnd = "\(Auto:*\),\(Moto:*\),\(Aqua:*\),\(Terra:*\)]"
For i = 0 To UBound(Split(StrFnd, ","))
Set Rng = ActiveDocument.Range
With Rng.Find
.ClearFormatting
.Text = Split(StrFnd, ",")(i)
.Replacement.ClearFormatting
.Replacement.Highlight = True
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
Next
BStrFnd = "^# January 19^#^#,^# November 19^#^#,^# December 19^#^#"
For j = 0 To UBound(Split(BStrFnd, ","))
Set Rng = ActiveDocument.Range
With Rng.Find
.ClearFormatting
.Text = Split(BStrFnd, ",")(j)
.Replacement.ClearFormatting
.Replacement.Highlight = True
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
Next
Set Rng = Nothing
Application.ScreenUpdating = True
objDocument.Close (True)
End Sub

Private Function GetAllFilePaths(ByVal strPath As String) _
As String()
Dim objFSO As Object
Dim objFolder As Object
Dim objFile As Object
Dim i As Integer
Dim arrOutput() As String
ReDim arrOutput(1 To 1)
'Create an instance of the FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the folder object
Set objFolder = objFSO.GetFolder(strPath)
i = 1
'loops through each file in the directory and
'prints their names and path
For Each objFile In objFolder.Files
ReDim Preserve arrOutput(1 To i)
'print file path
arrOutput(i) = objFile.Path
i = i + 1
Next objFile
GetAllFilePaths = arrOutput
End Function

Would it be possible to add another function to this macro so that it also extracts the yellow-highlighted keywords from all docs in the same folder (as specified in pop-up window initially) into a separate file?

Thanks,
Prefect