Consulting

Page 2 of 3 FirstFirst 1 2 3 LastLast
Results 21 to 40 of 43

Thread: Import text from word document into excel

  1. #21
    VBAX Newbie
    Joined
    Sep 2015
    Posts
    1
    Location
    Is there a way to import content from only certain 'styles' of data?

    I have a document that utilizes a table of contents and then has 'Req#' under each section of the document. I would like to pull out the 'Req#' and the following content and organize them likewise into a spreadsheet.

    I have looked over the code presented, but it does not increment through out the selected document.

    Thanks

  2. #22
    Thanks!

  3. #23
    Quote Originally Posted by macropod View Post
    See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

    As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketted 'normal' ranges, more progress can be made.
    Hello macropod. Thank you very much for the head start on moving things between Word and Excel within VBA. I'm fairly good with VBA in Excel but am new to interacting with Word. I can't figure out how one step of your code is working. Its on line 87 and states, "StrTxt = .Duplicate.Text" . When I set a breakpoint around this area and work through it while keeping a watch on both "StrTxt" and "wdDoc.Range.Duplicate.Text" they turn out to be very different strings both before and after the assignment to StrTxt. It appears "wdDoc.Range.Duplicate.Text" shows the unfiltered or unfound text (the whole document contents), but when it is assigned to StrTxt, it shows the properly found value. MAGIC! Am I misunderstanding some With statements you are using and not watching the proper variable or what?

    All of the other sample code and documentation I've found shows how to find/replace some string but I haven't found anything that actually returns the found string until this. Why does line 87 work the way it does?

    A follow up, that may be answered from the first, is why do you use the Duplicate property? What advantage does this have over using just the Range.Text? Both seem to work the magic described above.

  4. #24
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    In Word, the .Duplicate property allows you to vary a copy of a range without disturbing the original range (in this case, the range found by the Find - not the whole document). Since nothing to do with how StrTxt is populated involves manipulating the found range, .Duplicate could probably have been omitted in this case.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  5. #25
    macropod,

    Thanks for your reply. Understood on the .Duplicate part, but I'm still confounded with what appears to be misunderstanding the With statement structure...unless I'm completely off the mark. I've uploaded a screenshot of what I'm encountering. All With statements above the screenshot appear to be ended up to this point, so I would think the following would be true (but they appear not to be true according to the the Watches List).

    1.) The wdDoc.Range.Find.Found = True since we are now within the If statement where that was the condition. (The watches list tells me this is false though)
    2.) StrTxt = wdDoc.Range.Duplicate.Text Though you can also see in the watches list, that this is not the case

    Capture.jpg

    Am I making a noob mistake and misunderstanding the With statement structure or what? Hoping for a poke in the right direction.

    I've commented out quite a few things from the original code, as they are not applicable to what I'm trying to do, but I don't see any reason they would affect the "magic" going on here. But just in case, I've pasted it below.

    Sub UpdateData()
    Application.ScreenUpdating = False
    Dim StrFolder As String, StrFile As String, StrFnd As String, StrTxt As String
    Dim wdApp As Object, wdDoc As Object, bStrt As Boolean
    Dim WkSht As Worksheet, LRow As Long, LCol As Long, i As Long
    Const wdFindContinue As Long = 1, wdReplaceAll As Long = 2
    StrFolder = "F:\Test\"
    If StrFolder = "" Then Exit Sub
    Set WkSht = ThisWorkbook.Sheets("Sheet1")
    LRow = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Row
    LCol = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Column
    ' Test whether Word is already running.
    'On Error Resume Next
    bStrt = False ' Flag to record if we start Word, so we can close it later.
    Set wdApp = GetObject(, "Word.Application")
    'Start Excel if it isn't running
    If wdApp Is Nothing Then
      Set wdApp = CreateObject("Word.Application")
      If wdApp Is Nothing Then
        MsgBox "Can't start Word.", vbExclamation
        Exit Sub
      End If
      ' Record that we've started Excel.
      bStrt = True
    End If
    On Error GoTo 0
    StrFile = Dir(StrFolder & "\*.doc", vbNormal)
    While StrFile <> ""
      LRow = LRow + 1
      Set wdDoc = wdApp.Documents.Open(Filename:=StrFolder & "\" & StrFile, AddToRecentFiles:=False, Visible:=False, ReadOnly:=True)
      'Do some pre-processing cleanup
      With wdDoc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        'Replace all tabs with single spaces
        '.Execute Replace:=wdReplaceAll
        '.Text = "[^t]{1,}"
        '.Replacement.Text = " "
        'Replace all double spaces with single spaces
        '.Execute Replace:=wdReplaceAll
        '.Text = "[ ]{2,}"
        '.Replacement.Text = " "
        'Clear out spaces before/after paragraph breaks
        '.Text = " [^13]{1,}"
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        '.Text = "[^13]{1,} "
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        'Limit paragraph breaks and manual line breaks to one 'real' paragraph per set
        '.Text = "[^13^11]{1,}"
        '.Replacement.Text = "^p"
        '.Execute Replace:=wdReplaceAll
        'Insert extra paragraph breaks before paragraphs. This is to facilitate data extraction
        '.Text = "^13[!^13]{1,}"
        '.Font.Bold = True
        '.Replacement.Text = "^p^&"
        '.Execute Replace:=wdReplaceAll
        '.Text = ""
        '.MatchWildcards = False
        '.Execute Replace:=wdReplaceAll
      End With
      'Get the data for each defined Excel column
      For i = 1 To LCol
        'StrFnd = WkSht.Cells(1, i).Value
        With wdDoc.Range
          With .Find
            .ClearFormatting
            .Text = "P[0-9]{5,6}"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute
          End With
          If .Find.Found = True Then
            'Parse the data
            StrTxt = .Duplicate.Text
            If InStr(StrTxt, ":") > 0 Then
              StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, ":") + 1, Len(StrTxt)))
            ElseIf InStr(StrTxt, "=") > 0 Then
              StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, "=") + 1, Len(StrTxt)))
            End If
            'Update Excel
            WkSht.Cells(LRow, i).Value = StrTxt
          End If
        End With
      Next
      wdDoc.Close SaveChanges:=False
      StrFile = Dir()
    Wend
    If bStrt = True Then wdApp.Quit
    Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
    Application.ScreenUpdating = True
    End Sub
    
    Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
    End Function

  6. #26
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    What you need to understand is that .Find.Execute changes the range from the document as a whole to the .Find.Found range.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  7. #27
    Thank you for your continued support macropod.

    The range you are talking about is wdDoc.Range right? That would make sense to me, but it still doesn't appear to show up like that in the Watches List, which is what confuses me. Take the same screenshot for example; I just stepped through the "StrTxt = .Duplicate.Text" line of code, so I would assume the Watches on both StrTxt and wdDoc.Range.Duplicate.Text would be the same right? But they are not. Is .Duplicate.Text referring to something other than wdDoc.Range.Duplicate.Text at this point? Or am I watching the wrong full-context variable/parameter?

  8. #28
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Quote Originally Posted by topher217 View Post
    The range you are talking about is wdDoc.Range right?
    I discussed two ranges - the first being what you're referring to as wdDoc.Range, the second being the range derived from wdDoc.Range.Find.Execute. It is that second range to which the (unnecessary) .Duplicate refers and from which StrTxt is populated.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  9. #29
    I discussed two ranges - the first being what you're referring to as wdDoc.Range, the second being the range derived from wdDoc.Range.Find.Execute. It is that second range to which the (unnecessary) .Duplicate refers and from which StrTxt is populated.
    Is there a way to Watch the fully qualified expression for the range derived from wdDoc.Range.Find.Execute? (wdDoc.Range.Find.Execute only evaluates to a Boolean) .

    What you need to understand is that .Find.Execute changes the range
    <----- "the range" refers to which range?

    According to my watches list wdDoc.Range.Text does not change from the document to found range as I understood from your previous explanation. Therefore I'm still confused as where this new range is located (i.e what is its fully qualified name?)

    My specific confusion is highlighted by the comment after the StrTxt assignment in the code below.

        With wdDoc.Range 'Range#1
          With .Find
            .ClearFormatting
            .Text = "P[0-9]{5,6}"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute
          End With
          If .Find.Found = True Then
            StrTxt = .Duplicate.Text  'What is the fully qualified expression for this .Duplicate.Text ? Based on my watches list, it appears this is not wdDoc.Range.Duplicate.Text as I would expect from the "With wdDoc.Range" statement from above.

  10. #30
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Quote Originally Posted by topher217 View Post
    Is there a way to Watch the fully qualified expression for the range derived from wdDoc.Range.Find.Execute? (wdDoc.Range.Find.Execute only evaluates to a Boolean) .
    I don't know and it's not something I've ever found a need to bother with. Sure, wdDoc.Range.Find.Execute only evaluates to a Boolean, but that's not the point. The point is that, if it evaluates to True, the current range changes to the found range.
    Quote Originally Posted by topher217 View Post
    <----- "the range" refers to which range?
    I'd have thought it pretty obvious that, since the range the .Find is working on is wdDoc.Range, that's the range that I was referring to.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  11. #31
    Quote Originally Posted by macropod View Post
    I don't know and it's not something I've ever found a need to bother with. Sure, wdDoc.Range.Find.Execute only evaluates to a Boolean, but that's not the point. The point is that, if it evaluates to True, the current range changes to the found range.

    I'd have thought it pretty obvious that, since the range the .Find is working on is wdDoc.Range, that's the range that I was referring to.
    Yes, I thought that to be the case, but it doesn't match with what I am observing, so I was questioning everything that I could be misunderstanding.

    Ok, so maybe a summary of the overall problem here will resolve things.

    Before reading your sample code, my overall question was "How do you assign the .Find.Execute range to some variable that I can parse?" In your code you do this by running the .Find.Execute method and then assigning .Duplicate.Text to a string variable called StrTxt. This method works, and does what I am hoping for, but I am unable to understand WHY it works. You claim that the wdDoc.Range starts as the whole document range, but after the .Find.Execute method runs, this range changes to the found range only right? In this logic, I would think that the fully qualified statement assigning the StrTxt to read "StrTxt = wdDoc.Range.Duplicate.Text" but this doesn't match what I observe while stepping through the code with various Watches.

    In order to check if what you claim is actually happening or not I run through the following steps (See attached image showing this process as well):

    1.) I set watches on StrTxt, wdDoc.Range, wdDoc.Range.Duplicate.Text, wdDoc.Range.Find.Execute, and wdDoc.Range.Find.Found. ... I set a breakpoint and run until the .Find.Execute line of code (I observe all of my watches)
    2.) I step into the next lines of code and find that none of my watches change at all. (Based on your explanation, I would think they should change at this point)
    3.) I notice that before I enter the If statement, the wdDoc.Range.Find.Found evaluates to FALSE in the watches window. Despite this fact, the code enters the If statement as if it were true. (This is one hint that .Find.Found is NOT the same as wdDoc.Range.Find.Found even though this is what the With statement structure would imply.)
    4.) I notice that wdDoc.Range.Duplicate.Text is still the entire document text, and step through the line of code assigning .Duplicate.Text to StrTxt. Now I see that StrTxt is assigned the properly found text of P1003 even though wdDoc.Range.Duplicate.Text still evaluates to the full document text. (Another hint that the With structure is not behaving as I thought).

    Attachment 18425

    Based on the above observations, the assignment to StrTxt cannot be based on wdDoc.Range.Duplicate.Text as we have concluded up to this point. In order to work with this process more in the future I need to understand how this assignment works. What you say matches what I can find in the msdn documentation.

    So even though the end results appear to be working as you say, the Watches window in this case seems to be useless or wrong? Seems like a nightmare to debug such code if you can't follow the variables you are using. Even if you don't know the answer to how this works, I appreciate your sample code since I never would have came to the conclusion of using the same range like you did, so Thank You!

  12. #32
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Quote Originally Posted by topher217 View Post
    You claim that the wdDoc.Range starts as the whole document range, but after the .Find.Execute method runs, this range changes to the found range only right? In this logic, I would think that the fully qualified statement assigning the StrTxt to read "StrTxt = wdDoc.Range.Duplicate.Text" but this doesn't match what I observe while stepping through the code with various Watches.
    Well, if you point StrTxt to wdDoc.Range.Duplicate.Text (or wdDoc.Range.Text), no amount of subsequent wdDoc.Range.Find.Execute usage is going to change what StrTxt contains. StrTxt will only ever reference the found string if you populate it after using wdDoc.Range.Find.Execute.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  13. #33
    Yep, that is what I plan on doing. I appreciate the effort to figure this out, but I suppose it will remain a mystery. One possibility that I thought of, but don't know how to test is that when I watch wdDoc.Range.Duplicate.Text, my Watches window is restricted to the Context of with Module1, Sheet1, or ThisWorkbook (in the add/edit watch window). I'm guessing the assignment of .Duplicate.Text to StrTxt comes from some lower level Context defined by the .Find object (and not the local variable/parameter wdDoc.Range.Duplicate.Text). So whenever I try to watch wdDoc.Range.Duplicate.Text, I am only seeing the local definition of this rather than the lower level definition. Something like a private/global variable c.o.n.f.l.i.c.t perhaps?

    I'm still curious how you knew to use the assignment like that? I haven't found anything like that in the msdn documentation.

    Side note: Any idea why is the word c.o.n.f.l.i.c.t (minus all the periods) a forbidden word? My post was denied until I found it was caused by that single word. "Post denied. New posts are limited by number of URLs it may contain and checked if it doesn't contain forbidden words." Couldn't find anything in the FAQ about this.

  14. #34
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Quote Originally Posted by topher217 View Post
    I'm still curious how you knew to use the assignment like that? I haven't found anything like that in the msdn documentation.
    According to https://msdn.microsoft.com/en-us/lib...r=-2147217396:
    If you've gotten to the Find object from the Selection object, the selection is changed when text matching the find criteria is found.
    ...
    If you've gotten to the Find object from the Range object, the selection isn't changed when text matching the find criteria is found, but the Range object is redefined.
    See also:
    https://msdn.microsoft.com/en-us/lib...ffice.11).aspx

    As for:
    Quote Originally Posted by topher217 View Post
    Any idea why is the word c.o.n.f.l.i.c.t (minus all the periods) a forbidden word? My post was denied until I found it was caused by that single word. "Post denied. New posts are limited by number of URLs it may contain and checked if it doesn't contain forbidden words."
    I have no idea why a word like 'conflict' would cause problems.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  15. #35
    Hi all,

    I am trying to achieve a similar solution to this original request in that I have a document that is part table data and part text under a specific header. I have a large folder of these documents, all formatted the same, though some text may be longer than others. I am newer to VBA and am trying to figure out how to adjust the code that was originally provided to point the macro to specific headers (of table and text) in the word document to copy into excel. Below is an example of the document. Any help with writing the macro would be much appreciated.

    TAYARI LESSON OBSERVATION BRIEF

    County:
    Zone/Cluster:
    School:
    CSO:
    Teacher:
    Officer:
    Date Activity Week Day Lesson
    Duration
    Pupils
    Present
    Girls Boys Take-up Rating
    17/OCT/2016 Language Activities 25 1 50 34 18 16 Prepared
    Qualitative Background Information

    Barakeiwo is about 1 hour and half drive from Eldoret office. The area is particularly bad when it rains and we had to negotiate through the mud with a 4x4 office vehicle. They have 2 levels of ECDE, with PP1 being mixed with baby and PP2.
    WHAT WENT WELL WHAT DID NOT GO WELL FEEDBACK FROM CSO/DICECE FEEDBACK TO CSO/DICECE
    Day and date done. Learner of the day activity done and learners guided to identify letter sounds in the name. Teacher used some of vocabulary to share news with learners, however the news was too long. In letter recognition teacher guided learners in identifying sounds and reinforcing them with actions. Teacher did not give learners an opportunity to share news. Teacher needs to ensure that all learners are saying the sounds and doing the actions. Teacher to work on proper articulation of sounds. Steps in the teacher read aloud mixed up, pre reading activities being done after the reading activities. Teacher advised to minimize time wastage. Teacher to allow learners an opportunity to share news. Follow the steps in the teacher’s guide in teaching read aloud. Give formative feedback to teachers during the lesson. Encourage peer mentoring between the teacher and the PP2 teacher.
    Use papaya to guide teachers in the correct articulation of sounds.

    Overall Observation and Recommendations

    Teacher has a good grasp of tayari core strategies. Teacher to be encouraged to manage time well and minimize time wastage.

  16. #36
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    As was mentioned earlier in this thread, the code to extract data from tables is quite different. Furthermore, it's by no means apparent from your post what data you want to extract - aside from the fact some is in the table and some is not. I suggest you start a new thread setting out exactly what your requirements are, as well as attaching a document with some representative sample data.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  17. #37
    Hi Macropod,

    I do not have permissions to post new threads yet since I recently registered. My need is to pull all text for each header or section. The document is indeed a combination of free text and table text and so 1) I am curious if it is possible to write a macro that can combine both of these, and 2) how I would go about writing the macro to pull the text to the right or within a header location within the document. Any help would be much appreciated with the limited permissions I have at this point.

    Thanks,
    Kyle

  18. #38
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Quote Originally Posted by kayfreed89 View Post
    I do not have permissions to post new threads yet since I recently registered.
    That just isn't so. There would be no point in allowing people to register but not then start a thread describing a problem they need help with. As I said in my previous reply, you should start such a thread, setting out the full details of what you're trying to achieve - and include a sample document with some representative data.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  19. #39
    Hello Macropod,

    Recently i have seen the macro you have written for import text from word to excel. Based on your macro, i have copied more than 100 word docs in to sample file and run the marco (demo file).

    The macro is picking up only the first word sheet and rest not picked by macro.

    I cant attach the files for your reference. Can you please help me in getting the details for all word docs.


    Regards,
    Naveen
    Attached Files Attached Files

  20. #40
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    It's hardly surprising that a macro I wrote for an entirely different scenario doesn't work for yours. This kind of document parsing is very particular. In this case, the macro was only ever coded to extract one set of data per document, not multiple sets. You're lucky you got anything meaningful... The problem in this case is of your own making:
    i have copied more than 100 word docs in to sample file
    You should NOT have done that!
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •