Consulting

Results 1 to 14 of 14

Thread: Import Words and Phrases from MS Word to Excel

  1. #1
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location

    Smile Import Words and Phrases from MS Word to Excel

    Hi All,

    I am someone who can only copy and modify a VBA but can not build it from scratch. I have more than 3000 MSWord docs stored different sub-folders. All of them are in same format.

    I have been trying to look for a VBA that would extract data from MsWord to Excel using the keywords before and after the target data - Something to what this youtube video is showing https://www.youtube.com/watch?v=1x-Vk4Qmpz0&t= only that I don't have it in a Word table.

    I realize that there may be a lot of similar threads like this but nothing seems to work for my case since my documents are not in a table format.

    I was hoping someone might be able to walk me through this one. I have attached a sample doc and the target excel file. The data that needed to be exported are highlighted in yellow.

    Thank you and stay safe everyone!

    sample doc.docxsample xls.xlsx

    Last edited by yarixen; 05-12-2020 at 07:04 AM.

  2. #2
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    Your sample document appears to have had some content deleted - it even refers to an 'attached two pages' - and has also had numerous strings of nothing more than spaces inserted. The document cannot be processed without knowing what other content it has that might affect the data extraction.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  3. #3
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Thanks Paul, for looking into this. I apologize that I removed most part of the document in my poor attempt to keep from being public. There are 3 (or sometimes 4 pages) in each document. The string of spaces are consistently present in the documents - Please see if it can be ignored. I was wondering if there was a way text can be extracted based on words prior and subsequent to it. Here I attached a new xls and doc.
    Attached Files Attached Files

  4. #4
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    Try the following Excel macro. Provided your documents all have a layout consistent with the one you attached, the data should be extracted from all documents in the folder you select.
    Sub GetPermitData()
    'Note: this code requires a reference to the Word object model. See under the VBE's Tools|References.
    Application.ScreenUpdating = False
    Dim strFolder As String, strFile As String, strData As String
    Dim WkSht As Worksheet, r As Long, c As Long
    strFolder = GetFolder: If strFolder = "" Then Exit Sub
    Dim wdApp As New Word.Application, wdDoc As Word.Document
    Set WkSht = ActiveSheet
    r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
    'Disable any auto macros in the documents being processed
    wdApp.WordBasic.DisableAutoMacros
    strFile = Dir(strFolder & "\*.doc", vbNormal)
    While strFile <> ""
      r = r + 1
      Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
      On Error GoTo ErrExit
      With wdDoc
        Do While .InlineShapes.Count > 0
          .InlineShapes(1).Delete
        Loop
        Do While .Shapes.Count > 0
          .Shapes(1).Delete
        Loop
        With .Range
          .Paragraphs.First.Range.Text = vbCr
          With .Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Forward = True
            .Format = False
            .Wrap = wdFindContinue
            .MatchWildcards = True
            .Text = "^13*Permit No.:(*)[ ]@Date Issued:(*)^13*issued to[^13]{1,}(*)^13*Located at (*)^13*:[ ^13]{1,}"
            .Replacement.Text = "^t\1^t\2^t\3^t\4^t"
            .Execute Replace:=wdReplaceAll
            .Text = "[^13 ]@[!^13]@valid until[ ]@<(*).^13*\(PCO\)(*) shall be*Regional Director*[ ^13]{1,}"
            .Replacement.Text = "^t\1^t\2"
            .Execute Replace:=wdReplaceAll
            .Text = "^13[^13 ]{1,}"
            .Replacement.Text = Chr(182)
            .Execute Replace:=wdReplaceAll
            .Text = "([^t" & Chr(182) & "])[ ]{1,}"
            .Replacement.Text = "\1"
            .Execute Replace:=wdReplaceAll
            .Text = "[ ]{1,}([^t" & Chr(182) & "])"
            .Replacement.Text = "\1"
            .Execute Replace:=wdReplaceAll
            .Text = "(^t*)(^t*)(^t*)(^t*)(^t*)(^t*)(^t*)"
            .Replacement.Text = "\3\4\5\1\2\6\7"
            .Execute Replace:=wdReplaceAll
          End With
        End With
        strData = Split(.Range.Text, vbCr)(0)
        .Close SaveChanges:=False
      End With
      For c = 1 To UBound(Split(strData, vbTab))
        WkSht.Cells(r, c).Value = Split(strData, vbTab)(c)
      Next
      strFile = Dir()
    Wend
    GoTo NoErr
    ErrExit:
    
    MsgBox "Cannot process:" & vbcr & strFile & vbcr & "Exiting", vbCritical
    NoErr: wdApp.Quit WkSht.UsedRange.Replace What:=Chr(182), Replacement:=Chr(10), LookAt:=xlPart, SearchOrder:=xlByRows Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing Application.ScreenUpdating = True End Sub ' Function GetFolder() As String Dim oFolder As Object GetFolder = "" Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0) If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path Set oFolder = Nothing End Function


    Last edited by macropod; 05-16-2020 at 01:15 AM.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  5. #5
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Thank you Sir. I will get to check it on actual docs when I get back to office. Please let me get back to you if I need help again. I really appreciate it. Stay safe.

  6. #6
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Hi Paul, can you help me check this error:
    ERR1.PNG

    pressing Debug goes here:

         .Paragraphs.First.Range.Text = vbCr

  7. #7
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    Don't know why you'd be getting that error; there's nothing wrong with the code line. Try restarting Windows.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  8. #8
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Thanks for the reply. Did the restart but still encountering it in a lot of folders. I checked what seems to be different with these folders - My suspicion is, some of the folders contains older docs that have this textbox/autoshape in their header like this :

    textbox in header.PNG

    When it encounters one of these old files (with textbox ) it stops checking the rest of the other docs and gives the error earlier posted.

    Do you think this is stopping the run? Is there a way to ignore these textboxes? I am attaching another doc sample with this text box. Kindly see.

    Thank you for all the help.
    Attached Files Attached Files

  9. #9
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    I have revised the code in post #4 to eliminate such objects.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  10. #10
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Hi Paul, thank you. Your revision works. But I still get errors and it stops running - my documents are the problem. I think there are other files with variations that is hard for me to check one by one as of the moment. Do you think it is better doing extraction from a .txt notepad? I can batch convert all of them to .txt to keep the text only.

  11. #11
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    I very much doubt changing the file format would make any difference. The crucial thing is the file content, not its format. If you have non-conforming documents - as seems to be the case - you can either:
    • move those documents out of the folder being processed; or
    • correct those documents.
    I have further edited the code in post #4 to report the file that causes the macro to stop.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  12. #12
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    Hi Paul, I just want to say thanks. Some had to be done manually, but its done now.

    Thanks to your code it saved me a lot of time. I just wish I knew how to edit those things (wildcards, special characters, ascii?, I don't know what they are called) . If you can point me to a link where I can learn more about "these", I'd appreciate it.

    As always, stay safe.

  13. #13
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,121
    Location
    They're wildcard expressions. See: https://wordmvp.com/FAQs/General/UsingWildcards.htm
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  14. #14
    VBAX Regular
    Joined
    May 2020
    Posts
    8
    Location
    I really appreciate this!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •