Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 43

Thread: Import text from word document into excel

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Exclamation Import text from word document into excel

    Hi... Dear all

    I am struggling with finding code to help me extract text from word documents into an excel sheet...


    Basically: I have 300 word documents that contain the same data for different individuals.. I need to come up with a code that would access the word documents one at a time, search the word document for certain words (e.g. "NAME" or "DOB") then extract what lies next to that name back into a field in excel.

    I really am lost.. and was hoping to get some quick pointers!!

    any help much appreciated..
    what goes around comes around

    thanks

  2. #2
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Do you want to run this from Word or Excel (I'd suggest Excel)?

    In the Word documents, are the required data in specified locations (eg formfields, particular cells in a table, etc), could there be other text, etc in the same paragraphs?
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  3. #3
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Red face Thanks Macropod

    Thanks macropod for replying back..

    Basically:
    I have around 300 radiology reports on 300 different patients. The data in each report is pretty standard in all, but I inherited all these reports in word documents which were not formated into fields or delimited or had coma identifiers extra..
    Just as an example:

    Name: XXXXX
    DOB: XXXXXX
    Address: XXXXXX

    LVEDD=XXXX
    LVESS=KKKK

    TECHNIQUE:
    1. YYYYYY
    2. HHHHHHh

    So what I basically want to do, is try and parse/extract the data for the different patients and insert it in a database for later processing.

    It will take me ages to input manually and I was hoping to come up with some way to automate the whole process...

    Many thanks for all the help...
    PS: am still a medical student, and am a little bit challeneged when it comes to automating processes... but all the help I get from u guyz much appreciated!!!

  4. #4
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Hi mayseed,

    It would be helpful if you could post a sample document (no sensitive data) so that I could see the complete document structure.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  5. #5
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Red face

    Thanks for all your help!! Just knowing ur keen to help made my day.. I attached a sample report.. basically all reports are structured similarly. What I want to extract is the technique, different numbers.. the measurements are always preceded by the same word (i.e. ESD = xxx) and so on...

    thanks once again!
    Attached Files Attached Files

  6. #6
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

    As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketted 'normal' ranges, more progress can be made.
    Attached Files Attached Files
    Last edited by macropod; 07-08-2012 at 06:05 PM.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  7. #7
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Talking

    I am so grateful for u macropod.. i wish i can repay u for this.. this is brilliant.. so its working nicely... i am going to go through the code and see if i can adjust it..
    when it comes to the normal ranges i dont really need them.. all i need is the actual number..

    thats all..
    ill keep u posted with progress...

    U MADE MY DAY!! I want to cry!!
    thanks!

  8. #8
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Red face

    Hi again macropod!

    I tried dissecting the code and was trying to add some more fields but just couldnt figure out how to make it identify the different numbers in the document..

    i only need to extract the numbers (can ignore the normal range), some numbers are preceeded by "=" others are preceded by "-".

    Am not quite sure what to do

    thanks for ur help
    ur awesome

  9. #9
    Thanks !!

  10. #10
    VBAX Newbie
    Joined
    Nov 2013
    Posts
    1
    Location
    Dear macropod, the macro you have posted is exactly something I am going for. However, the 'fields' in my document are not as clearly labelled and I haven't had much luck altering the code. Would you be kind enough to help me get just the first field sorted and I will give the rest a go myself?

    attach.jpg

    Would it be possible for it to just extract the bits in the rectangles (which aren't fields so seems quite difficult to extract from them)
    Last edited by iliauk; 11-08-2013 at 06:12 AM.

  11. #11
    VBAX Newbie
    Joined
    Oct 2014
    Posts
    1
    Location
    Quote Originally Posted by macropod View Post
    See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

    As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketted 'normal' ranges, more progress can be made.
    Thanks Macropod. I have a need to read word document and posting this to be able to download the attached code with the hope that I can get information I need to update my macro.

  12. #12
    Quote Originally Posted by macropod View Post
    See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

    As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketted 'normal' ranges, more progress can be made.
    Hello macropod. Thank you very much for the head start on moving things between Word and Excel within VBA. I'm fairly good with VBA in Excel but am new to interacting with Word. I can't figure out how one step of your code is working. Its on line 87 and states, "StrTxt = .Duplicate.Text" . When I set a breakpoint around this area and work through it while keeping a watch on both "StrTxt" and "wdDoc.Range.Duplicate.Text" they turn out to be very different strings both before and after the assignment to StrTxt. It appears "wdDoc.Range.Duplicate.Text" shows the unfiltered or unfound text (the whole document contents), but when it is assigned to StrTxt, it shows the properly found value. MAGIC! Am I misunderstanding some With statements you are using and not watching the proper variable or what?

    All of the other sample code and documentation I've found shows how to find/replace some string but I haven't found anything that actually returns the found string until this. Why does line 87 work the way it does?

    A follow up, that may be answered from the first, is why do you use the Duplicate property? What advantage does this have over using just the Range.Text? Both seem to work the magic described above.

  13. #13
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    In Word, the .Duplicate property allows you to vary a copy of a range without disturbing the original range (in this case, the range found by the Find - not the whole document). Since nothing to do with how StrTxt is populated involves manipulating the found range, .Duplicate could probably have been omitted in this case.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  14. #14
    macropod,

    Thanks for your reply. Understood on the .Duplicate part, but I'm still confounded with what appears to be misunderstanding the With statement structure...unless I'm completely off the mark. I've uploaded a screenshot of what I'm encountering. All With statements above the screenshot appear to be ended up to this point, so I would think the following would be true (but they appear not to be true according to the the Watches List).

    1.) The wdDoc.Range.Find.Found = True since we are now within the If statement where that was the condition. (The watches list tells me this is false though)
    2.) StrTxt = wdDoc.Range.Duplicate.Text Though you can also see in the watches list, that this is not the case

    Capture.jpg

    Am I making a noob mistake and misunderstanding the With statement structure or what? Hoping for a poke in the right direction.

    I've commented out quite a few things from the original code, as they are not applicable to what I'm trying to do, but I don't see any reason they would affect the "magic" going on here. But just in case, I've pasted it below.

    Sub UpdateData()
    Application.ScreenUpdating = False
    Dim StrFolder As String, StrFile As String, StrFnd As String, StrTxt As String
    Dim wdApp As Object, wdDoc As Object, bStrt As Boolean
    Dim WkSht As Worksheet, LRow As Long, LCol As Long, i As Long
    Const wdFindContinue As Long = 1, wdReplaceAll As Long = 2
    StrFolder = "F:\Test\"
    If StrFolder = "" Then Exit Sub
    Set WkSht = ThisWorkbook.Sheets("Sheet1")
    LRow = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Row
    LCol = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Column
    ' Test whether Word is already running.
    'On Error Resume Next
    bStrt = False ' Flag to record if we start Word, so we can close it later.
    Set wdApp = GetObject(, "Word.Application")
    'Start Excel if it isn't running
    If wdApp Is Nothing Then
      Set wdApp = CreateObject("Word.Application")
      If wdApp Is Nothing Then
        MsgBox "Can't start Word.", vbExclamation
        Exit Sub
      End If
      ' Record that we've started Excel.
      bStrt = True
    End If
    On Error GoTo 0
    StrFile = Dir(StrFolder & "\*.doc", vbNormal)
    While StrFile <> ""
      LRow = LRow + 1
      Set wdDoc = wdApp.Documents.Open(Filename:=StrFolder & "\" & StrFile, AddToRecentFiles:=False, Visible:=False, ReadOnly:=True)
      'Do some pre-processing cleanup
      With wdDoc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        'Replace all tabs with single spaces
        '.Execute Replace:=wdReplaceAll
        '.Text = "[^t]{1,}"
        '.Replacement.Text = " "
        'Replace all double spaces with single spaces
        '.Execute Replace:=wdReplaceAll
        '.Text = "[ ]{2,}"
        '.Replacement.Text = " "
        'Clear out spaces before/after paragraph breaks
        '.Text = " [^13]{1,}"
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        '.Text = "[^13]{1,} "
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        'Limit paragraph breaks and manual line breaks to one 'real' paragraph per set
        '.Text = "[^13^11]{1,}"
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        'Insert extra paragraph breaks before paragraphs. This is to facilitate data extraction
        '.Text = "^13[!^13]{1,}"
        '.Font.Bold = True
        '.Replacement.Text = "^p^&"
        '.Execute Replace:=wdReplaceAll
        '.Text = ""
        '.MatchWildcards = False
        '.Execute Replace:=wdReplaceAll
      End With
      'Get the data for each defined Excel column
      For i = 1 To LCol
        'StrFnd = WkSht.Cells(1, i).Value
        With wdDoc.Range
          With .Find
            .ClearFormatting
            .Text = "P[0-9]{5,6}"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute
          End With
          If .Find.Found = True Then
            'Parse the data
            StrTxt = .Duplicate.Text
            If InStr(StrTxt, ":") > 0 Then
              StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, ":") + 1, Len(StrTxt)))
            ElseIf InStr(StrTxt, "=") > 0 Then
              StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, "=") + 1, Len(StrTxt)))
            End If
            'Update Excel
            WkSht.Cells(LRow, i).Value = StrTxt
          End If
        End With
      Next
      wdDoc.Close SaveChanges:=False
      StrFile = Dir()
    Wend
    If bStrt = True Then wdApp.Quit
    Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
    Application.ScreenUpdating = True
    End Sub
    
    Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
    End Function

  15. #15
    VBAX Newbie
    Joined
    Jun 2017
    Posts
    1
    Location

    Quote Originally Posted by macropod View Post
    See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

    As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketted 'normal' ranges, more progress can be made.

  16. #16
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    What to do depends on what you want to achieve. The code in the workbook I provided is already equipped to handle data preceded by ':' and '='. You just need to supply the prefixes. If you look where the code processes these, I'm sure you'll quickly figure out how to add the test for '-'.

    As you haven't answered the questions I asked in my last post, I can't really provide more specific advice.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  17. #17
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location
    Thank you macropod...

    I tried to use the code to find the EDV, ESV values these use "=" symbol.. i did that by adding fields to the table containing the exact strings as spelled in the word document.. but it still wouldn't work..

    As for the data in the multiple paragraphs:
    for example.. the technique field: then the statements (4 in the sample file) need to be inserted into 4 different fields in the table.

    Again when I add "technique" to the table and then run the code, I only end up with the statement that follows the word technique.. the 4 other statements which occur on different lines get missed.. Any ideas on how to fix that?

    THANK U MACROPOD... I am almost there with getting the code to work.. all thanks to u...

  18. #18
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Hi mayseed,

    Try the attached. Do note there is a limit to what can be achieved with what you're working with. If your documents differ significantly from what you've posted, much of the code I've written could be invalidated - and coding around the difficulties might not be practical/possible.

    One thing to note is that I've assumed the doctor's name & title at the end (which I assume you don't want) always span two paragraphs. They get deleted during processing (the changes aren't saved) so that they don't get included with the extracted data.

    Quote Originally Posted by mayseed
    As for the data in the multiple paragraphs:
    for example.. the technique field: then the statements (4 in the sample file) need to be inserted into 4 different fields in the table.
    I don't think it's practical to do that, as you'll never know how many columns might be involved. For now, the code just puts them all in the one cell.
    Attached Files Attached Files
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  19. #19
    VBAX Regular
    Joined
    Jul 2012
    Posts
    9
    Location

    Exclamation THANK YOU SO MUCH!!

    THIS WORKS PERFECTLY.. I WANT TO CRY..

    THANK YOU SO MUCH MACROPOD THIS IS AMAZING.

  20. #20
    VBAX Newbie
    Joined
    Jun 2013
    Posts
    4
    Location

    Hi, sorry to jump on the end of this thread, but it seems to be almost whatI am looking form. Is it possible to use this to extract information from thecell to the right of the found word (from a table in word)? Rather than thenext tab?

    Any help would be appreciated

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •