PDA

View Full Version : Extract specific word followed by defined number of characters from a word document



Marina
06-05-2016, 05:57 AM
Hello everyone, I'm new to VBA but very excited by its potential. Here is my problem: I'm trying to extract the word "PMID" as well as the 9 characters following that word from a word document into an excel spreadsheet. The word PMID can come up as much as 500 times in each word doc I'll be processing. I'll be processing roughly 200 separate word documents. I don't mind processing each word document one at a time if needed. I found the following code in a thread. It's helpful, but I don't know how to change the code to meet my specific need. Can Anyone help?

My understanding is, the following code will extract a sentence after the word "shall"
Option Explicit

Sub FindWordCopySentence()
Dim appExcel As Object
Dim objSheet As Object
Dim aRange As Range
Dim intRowCount As Integer
intRowCount = 1
Set aRange = ActiveDocument.Range
With aRange.Find
Do
.Text = "shall" ' the word I am looking for
.Execute
If .Found Then
aRange.Expand Unit:=wdSentence
aRange.Copy
aRange.Collapse wdCollapseEnd
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
'Change the file path to match the location of your test.xls
Set objSheet = appExcel.workbooks.Open("C:\temp\test.xls").Sheets("Sheet1")
intRowCount = 1
End If
objSheet.Cells(intRowCount, 1).Select
objSheet.Paste
intRowCount = intRowCount + 1
End If
Loop While .Found
End With
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
Set aRange = Nothing
End Sub

gmayor
06-05-2016, 06:37 AM
The following will do what you ask, and will prompt for the folder containing (only) the documents to be processed.
The named workbook must exist and have at least a header row in column A

Option Explicit

Sub FindWordCopySentence()
Dim xlApp As Object
Dim xlBook As Object
Dim xlSheet As Object
Dim oRng As Range
Dim NextRow As Integer
Dim strPath As String
Dim strFile As String
Dim oDoc As Document
Dim fDialog As FileDialog
Const strWorkbook As String = "C:\Path\test.xlsx" 'Change the file path to match the location of your EXISTING workbook

Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
.Title = "Select folder to process and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = fDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
On Error Resume Next
Set xlApp = GetObject(, "Excel.Application")
If Err <> 0 Then
Set xlApp = CreateObject("Excel.Application")
End If
On Error GoTo 0
xlApp.Visible = True
Set xlBook = xlApp.workbooks.Open(strWorkbook)
Set xlSheet = xlBook.Sheets("Sheet1")
strFile = Dir$(strPath & "*.docx")
While strFile <> ""
Set oDoc = Documents.Open(strPath & strFile)
Set oRng = oDoc.Range
With oRng.Find
Do While .Execute(FindText:="PMID")
oRng.End = oRng.End + 9
NextRow = xlSheet.Range("A" & xlSheet.Rows.Count).End(-4162).Row + 1
xlSheet.Cells(NextRow, 1) = oRng.Text
oRng.Collapse 0
Loop
End With
xlBook.Save
oDoc.Close 0
DoEvents
strFile = Dir$()
Wend
'xlBook.Close True
lbl_Exit:
Set oDoc = Nothing
Set xlSheet = Nothing
Set xlBook = Nothing
Set xlApp = Nothing
Set oRng = Nothing
Exit Sub
End Sub

Marina
06-05-2016, 08:52 AM
Thank you so much, it worked! I'm thrilled :friends:

gmaxey
06-05-2016, 08:57 AM
Great. Graham has shown you one method and probably the most traditional method. Since you are new to VBA, I thought I would show you another just so you can see that there is usually more than one way to "skin a cat" so to speak.


Option Explicit
Const strWorkbook As String = "D:\Book1.xlsx" 'Change the file path to match the location of your EXISTING workbook
Sub ExtractPartsToExcel()
Dim strPath As String, strFile As String, strText As String
Dim arrText() As String, arrParts() As String
Dim lngIndex As Long, lngPart As Long
Dim oDoc As Document
Dim oDialog As FileDialog

'Get folder containing files to process.
Set oDialog = Application.FileDialog(msoFileDialogFolderPicker)
With oDialog
.Title = "Select folder to process and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = oDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> Application.PathSeparator Then strPath = strPath + Application.PathSeparator
End With
'Process .doc, .docx or .docm files only.
strFile = Dir$(strPath & "*.doc*")
lngPart = 0
While strFile <> ""
Set oDoc = Documents.Open(strPath & strFile, , , False)
strText = oDoc.Range.Text
'Split the text into parts delimted by the text "PMID"
arrText = Split(strText, "PMID")
'Build an array of parts consisting of the the first 9 characters of the individual parts.
For lngIndex = 1 To UBound(arrText)
ReDim Preserve arrParts(lngPart)
arrParts(lngPart) = "PMID" & Left(arrText(lngIndex), 9)
lngPart = lngPart + 1
Next lngIndex
oDoc.Close wdDoNotSaveChanges
DoEvents
strFile = Dir$()
Wend
'Write the parts to Excel
WriteToExcel arrParts
lbl_Exit:
Set oDoc = Nothing
Exit Sub
End Sub
Sub WriteToExcel(ByRef arrPassed As Variant)
Dim oApp As Object, oBook As Object, oSheet As Object
Dim lngNextRow As Long
If Not IsArray(arrPassed) Then Exit Sub
On Error Resume Next
Set oApp = GetObject(, "Excel.Application")
If Err <> 0 Then Set oApp = CreateObject("Excel.Application")
On Error GoTo 0
With oApp
.Visible = True
Set oBook = .Workbooks.Open(strWorkbook)
End With
Set oSheet = oBook.Sheets("Sheet1")
lngNextRow = oSheet.Range("A" & oSheet.Rows.Count).End(-4162).Row + 1
oBook.Sheets(1).Range("A" & lngNextRow).Resize(UBound(arrPassed) - LBound(arrPassed) + 1).Value = oApp.Transpose(arrPassed)
lbl_Exit:
Set oApp = Nothing: Set oBook = Nothing: Set oSheet = Nothing
Exit Sub
End Sub

gentle2005
10-18-2016, 12:49 AM
How to add file name to the first column/second column in excel along with finding specific text and 9 characters?

gmaxey
10-18-2016, 04:30 AM
You want someone to write or adapt an existing macro for you? Perhaps you could spend a little more effort explaining what you really want.

gentle2005
10-18-2016, 08:22 AM
Existing macro works good. so I want to utilize the same macro but in the excel sheet I want another column to include the file name from which this text+9 characters come.

Option Explicit
Const strWorkbook As String = "D:\Book1.xlsx" 'Change the file path to match the location of your EXISTING workbook
Sub ExtractPartsToExcel()
Dim strPath As String, strFile As String, strText As String
Dim arrText() As String, arrParts() As String
Dim lngIndex As Long, lngPart As Long
Dim oDoc As Document
Dim oDialog As FileDialog

'Get folder containing files to process.
Set oDialog = Application.FileDialog(msoFileDialogFolderPicker)
With oDialog
.Title = "Select folder to process and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = oDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> Application.PathSeparator Then strPath = strPath + Application.PathSeparator
End With
'Process .doc, .docx or .docm files only.
strFile = Dir$(strPath & "*.doc*")
lngPart = 0
While strFile <> ""
Set oDoc = Documents.Open(strPath & strFile, , , False)
strText = oDoc.Range.Text
'Split the text into parts delimted by the text "PMID"
arrText = Split(strText, "PMID")
'Build an array of parts consisting of the the first 9 characters of the individual parts.
For lngIndex = 1 To UBound(arrText)
ReDim Preserve arrParts(lngPart)
arrParts(lngPart) = "PMID" & Left(arrText(lngIndex), 9)
lngPart = lngPart + 1
Next lngIndex
oDoc.Close wdDoNotSaveChanges
DoEvents
strFile = Dir$()
Wend
'Write the parts to Excel
WriteToExcel arrParts
lbl_Exit:
Set oDoc = Nothing
Exit Sub
End Sub
Sub WriteToExcel(ByRef arrPassed As Variant)
Dim oApp As Object, oBook As Object, oSheet As Object
Dim lngNextRow As Long
If Not IsArray(arrPassed) Then Exit Sub
On Error Resume Next
Set oApp = GetObject(, "Excel.Application")
If Err <> 0 Then Set oApp = CreateObject("Excel.Application")
On Error GoTo 0
With oApp
.Visible = True
Set oBook = .Workbooks.Open(strWorkbook)
End With
Set oSheet = oBook.Sheets("Sheet1")
lngNextRow = oSheet.Range("A" & oSheet.Rows.Count).End(-4162).Row + 1
oBook.Sheets(1).Range("A" & lngNextRow).Resize(UBound(arrPassed) - LBound(arrPassed) + 1).Value = oApp.Transpose(arrPassed)
lbl_Exit:
Set oApp = Nothing: Set oBook = Nothing: Set oSheet = Nothing
Exit Sub
End Sub










You want someone to write or adapt an existing macro for you? Perhaps you could spend a little more effort explaining what you really want.

gentle2005
10-19-2016, 08:55 AM
can anybody help me as this will solve my days of work? Thanks in advance.

gmaxey
10-20-2016, 05:09 AM
While strFile <> ""
Set oDoc = Documents.Open(strPath & strFile)
NextRow = xlSheet.Range("B" & xlSheet.Rows.Count).End(-4162).Row + 1
xlSheet.Cells(NextRow, 1) = oDoc.FullName
Set oRng = oDoc.Range
With oRng.Find
Do While .Execute(FindText:="PMID")
oRng.End = oRng.End + 9
NextRow = xlSheet.Range("B" & xlSheet.Rows.Count).End(-4162).Row + 1
xlSheet.Cells(NextRow, 2) = oRng.Text
oRng.Collapse 0
Loop
End With
xlBook.Save
oDoc.Close 0
DoEvents
strFile = Dir$()
Wend

gentle2005
10-20-2016, 06:48 AM
Thank you. How and where to put these code? I am sorry I know only small tweaks. Please guide me.

gmaxey
10-20-2016, 10:22 AM
In Mr. Mayor's original code, you replace the While Wend segment with the While Wend provided above.

gentle2005
10-21-2016, 10:13 PM
Thank you gmaxey. But gmaxey code works perfectly with me instead of gmayor. The only addition reqd is to have another column in excel to print file name from which text+9 char are taken. I have nearly 250 word file and each word file contain 3-4 instances of text+9char.

FOr Example:

Column A column B

PMID+9char c:\red
PMID+9char C:\red
PMID+9char C:blue
PMID+9char c:\green
PMID+9char C:\green
PMID+9char C:\green

Thanks

gmaxey
10-22-2016, 05:59 AM
My method doesn't work any better than Mr. Mayor's. This is not a free code writing service. If you want help (from me) in the future then you are going to have to at least show some effort.


Const strWorkbook As String = "D:\Book1.xlsx" 'Change the file path to match the location of your EXISTING workbook
Sub ExtractPartsToExcel()
Dim strPath As String, strFile As String, strText As String
Dim arrText() As String, arrParts() As String
Dim lngIndex As Long, lngPart As Long
Dim oDoc As Document
Dim oDialog As FileDialog

'Get folder containing files to process.
Set oDialog = Application.FileDialog(msoFileDialogFolderPicker)
With oDialog
.Title = "Select folder to process and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
MsgBox "Cancelled By User", , "List Folder Contents"
Exit Sub
End If
strPath = oDialog.SelectedItems.Item(1)
If Right(strPath, 1) <> Application.PathSeparator Then strPath = strPath + Application.PathSeparator
End With
'Process .doc, .docx or .docm files only.
strFile = Dir$(strPath & "*.doc*")
lngPart = 0
While strFile <> ""
Set oDoc = ActiveDocument 'Documents.Open(strPath & strFile, , , False)
strText = oDoc.Range.Text
'Split the text into parts delimted by the text "PMID"
arrText = Split(strText, "PMID")
'Build an array of parts consisting of the the first 9 characters of the individual parts.
For lngIndex = 1 To UBound(arrText)
ReDim Preserve arrParts(1, lngPart)
arrParts(0, lngPart) = "PMID" & Left(arrText(lngIndex), 9)
arrParts(1, lngPart) = oDoc.FullName '"PMID" & Left(arrText(lngIndex), 9)
lngPart = lngPart + 1
Next lngIndex
oDoc.Close wdDoNotSaveChanges
DoEvents
strFile = Dir$()
Wend
'Write the parts to Excel
WriteToExcel arrParts
lbl_Exit:
Set oDoc = Nothing
Exit Sub
End Sub
Sub WriteToExcel(ByRef arrPassed As Variant)
Dim oApp As Object, oBook As Object, oSheet As Object
Dim lngNextRow As Long
If Not IsArray(arrPassed) Then Exit Sub
On Error Resume Next
Set oApp = GetObject(, "Excel.Application")
If Err <> 0 Then Set oApp = CreateObject("Excel.Application")
On Error GoTo 0
With oApp
.Visible = True
Set oBook = .Workbooks.Open(strWorkbook)
End With
Set oSheet = oBook.Sheets("Sheet1")
lngNextRow = oSheet.Range("A" & oSheet.Rows.Count).End(-4162).Row + 1
oBook.Sheets(1).Range("A" & lngNextRow & ":B" & lngNextRow).Resize(UBound(arrPassed) - LBound(arrPassed) + 1).Value = oApp.Transpose(arrPassed)
lbl_Exit:
Set oApp = Nothing: Set oBook = Nothing: Set oSheet = Nothing
Exit Sub
End Sub

gentle2005
10-27-2016, 02:54 AM
Mr.gmaxey , I am new to VBA but I have followed what you have said,"I put little effort to try and understand Mr.gmayor code" got it done. Thanks for the great help. can you give me your advice to learn VBA for excel becoz I used excel day in and day out?

h2whoa
10-28-2016, 01:42 AM
I'm trying to extract the word "PMID" as well as the 9 characters following

Off topic, but I just wanted to say hi to a fellow PubMed user, I guess!