PDA

View Full Version : VBA field extraction from a Word Document



Charlie_Fla
09-12-2011, 11:18 AM
I need some help in writing some VBA code to extact some information from a Word Document. I can manipulate the data once I get it in a string field, but am having difficulty scaning a Word Document to locate some text and then stuffing them into string variables.

Specifically, I would like to do the following:

1. Find the first line of text and set that to string variable sTitle. It will usually be the first line of text (after the page header), but might have some blank lines before it.

2. Scan the document for "Patient Name:" and then set string variable sName to whatever follows. Would probably like to set sNameLast to the last word of that string, and sNameFirst to whatever is before sNameLast.

3. Similarly, scan the document for "Patient Number:" and then set string variable sNumber to the character field that follows (formatted includes a hypen, and should be 10 characters long).

What happens next is to insert a signature image (which I have done), and "print" to a PDF file, developing the filename from the extracted string variables (which I believe I can do once the string fields get set).

Thanks for your help!

Frosty
09-12-2011, 02:19 PM
This is proof-of-concept code... but there are many many ways to approach this. The following code makes some serious assumptions, since you weren't terribly specific in your "I'd like to scan the document" direction.

Assumption #1 your Patient Name: and Patient Number: are in the same paragraph, and that there is no other text in those paragraphs apart from the info you really want
Assumption #2 none of your patient's are named "Mr. John Smith, Jr." (or any derivative of additional designations... "John Jacob Smith" will sort of work, in that "John" will end up being the first name, and "Jacob Smith" will end up being treated as the last name).

In general, recording macros to play with the Find object can be really helpful... as you may be able to add in additional criteria to make sure you don't have erroneous results.

Try the following code out on one of your documents, and see how it goes. Obviously, you don't need to use msgbox stuff once you start getting close to what you want.

Sub DemoFindStuff()
Dim oPara As Paragraph
Dim sTitle As String
Dim sName As String
Dim sNameFirst As String
Dim sNameLast As String
Dim sPatientNumber As String
Dim rngFound As Range

For Each oPara In ActiveDocument.Paragraphs
If oPara.Range.text <> vbCr Then
sTitle = oPara.Range.text
'want to get rid of the paragraph mark?
sTitle = Replace(sTitle, vbCr, "")
MsgBox sTitle
Exit For
End If
Next

Set rngFound = FindSomething("Patient Name:")
If Not rngFound Is Nothing Then
'if we found it, let's manipulate it here--this is a big assumption to assume it's the paragraph
sName = rngFound.Paragraphs(1).Range.text
'get rid of our search term, and any trailing/leading blankspaces
sName = Trim(Replace(sName, "Patient Name:", ""))
'some more manipulation -- requires some heavier coding, or assuming that you don't have any
'patients named "Mr. John Smith, Jr."
sName = Replace(sName, vbCr, "")
'with our above assumptions, this takes care of simple first name-space-last name scenarios
sNameFirst = Left(sName, InStr(sName, " ") - 1)
sNameLast = Right(sName, Len(sName) - InStr(sName, " "))

MsgBox sNameFirst & vbCr & sNameLast
Set rngFound = Nothing
End If

'patient number might be easier
Set rngFound = FindSomething("Patient Number:")
If Not rngFound Is Nothing Then
sPatientNumber = rngFound.Paragraphs(1).Range.text
sPatientNumber = Trim(Replace(sPatientNumber, "Patient Number:", ""))
sPatientNumber = Replace(sPatientNumber, vbCr, "")
MsgBox sPatientNumber
End If
End Sub
Function FindSomething(sWhat As String) As Range
Dim rngSearch As Range
Set rngSearch = ActiveDocument.Content
'find patient name
With rngSearch.Find
.text = sWhat
.Execute
If .Found Then
Set FindSomething = rngSearch
End If
End With
End Function

Talis
09-13-2011, 01:25 PM
Frosty writes:
"John Jacob Smith" will sort of work, in that "John" will end up being the first name, and "Jacob Smith" will end up being treated as the last name).

To avoid this, replace the lines:

sNameFirst = Left(sName, InStr(sName, " ") - 1)
sNameLast = Right(sName, Len(sName) - InStr(sName, " "))

with

sNameLast = LastName(sName)
sNameFirst = Trim(Replace(sName, sNameLast, ""))

and add the following function

Public Function LastName(ByVal strName As String) As String
While InStr(strName, " ") > 0
strName = Right(strName, Len(strName) - InStr(strName, " "))
Wend
LastName = strName
End Function

Frosty
09-13-2011, 01:41 PM
Talis,

You can try InStrRev instead of a While...Wend loop using InStr. For example, your loop can be replaced with the following line:

LastName = Right(strName, Len(strName) = InStrRev(strName, " "))

Of course, there are literally hundreds of ways to approach string manipulation. But neither of the approaches solve the "John Smith, Jr." problem. And to be "safe" with this kind of input, the liberal use of Trim is also suggested, since "John Smith " will cause an empty "last name"

Hope that helps!

Talis
09-15-2011, 11:32 AM
Thanks Frosty,
I was unaware of the built-in function InStrRev.

For the benefit of the OP here's the code as it now stands:

Sub DemoFindStuff()
Dim oPara As Paragraph
Dim sTitle As String
Dim sName As String
Dim sNameFirst As String
Dim sNameLast As String
Dim sPatientNumber As String
Dim rngFound As Range

For Each oPara In ActiveDocument.Paragraphs
If oPara.Range.text <> vbCr Then
sTitle = oPara.Range.text
'want to get rid of the paragraph mark?
sTitle = Replace(sTitle, vbCr, "")
MsgBox sTitle
Exit For
End If
Next

Set rngFound = FindSomething("Patient Name:")
If Not rngFound Is Nothing Then
'if we found it, let's manipulate it here--this is a big assumption to assume it's the paragraph
sName = rngFound.Paragraphs(1).Range.text
'get rid of our search term, and any trailing/leading blankspaces
sName = Trim(Replace(sName, "Patient Name:", ""))
'some more manipulation -- requires some heavier coding, or assuming that you don't have any
'patients named "Mr. John Smith, Jr."
sName = Replace(sName, vbCr, "")
'with our above assumptions, this takes care of simple first name-space-last name scenarios

sNameLast = Trim(Right(sName, Len(sName) - InStrRev(sName, " ")))
sNameFirst = Trim(Replace(sName, sNameLast, ""))

MsgBox sNameFirst & vbCr & sNameLast
Set rngFound = Nothing
End If

'patient number might be easier
Set rngFound = FindSomething("Patient Number:")
If Not rngFound Is Nothing Then
sPatientNumber = rngFound.Paragraphs(1).Range.text
sPatientNumber = Trim(Replace(sPatientNumber, "Patient Number:", ""))
sPatientNumber = Replace(sPatientNumber, vbCr, "")
MsgBox sPatientNumber
End If
End Sub
Function FindSomething(sWhat As String) As Range
Dim rngSearch As Range
Set rngSearch = ActiveDocument.Content
'find patient name
With rngSearch.Find
.text = sWhat
.Execute
If .Found Then
Set FindSomething = rngSearch
End If
End With
End Function

The onus of solving the , Jnr., etc problem will have to reside with the data input process.