[SOLVED:] Import text from Word document into Excel



mayseed
07-06-2012, 03:45 AM
Hi... Dear all

I am struggling with finding code to help me extract text from word documents into an excel sheet...


Basically: I have 300 Word documents that contain the same data for different individuals. I need to come up with code that opens the Word documents one at a time, searches each document for certain words (e.g. "NAME" or "DOB"), then extracts whatever lies next to that word into a field in Excel.

I really am lost.. and was hoping to get some quick pointers!!

any help much appreciated..
what goes around comes around

thanks

macropod
07-07-2012, 01:51 AM
Do you want to run this from Word or Excel (I'd suggest Excel)?

In the Word documents, are the required data in specified locations (eg formfields, particular cells in a table, etc), could there be other text, etc in the same paragraphs?

mayseed
07-07-2012, 04:23 AM
Thanks macropod for replying back..

Basically:
I have around 300 radiology reports on 300 different patients. The data in each report is pretty standard, but I inherited all these reports as Word documents which were not formatted into fields, delimited, or given comma identifiers etc.
Just as an example:

Name: XXXXX
DOB: XXXXXX
Address: XXXXXX

LVEDD=XXXX
LVESS=KKKK

TECHNIQUE:
1. YYYYYY
2. HHHHHHh

So what I basically want to do, is try and parse/extract the data for the different patients and insert it in a database for later processing.

It will take me ages to input manually and I was hoping to come up with some way to automate the whole process...

Many thanks for all the help...
PS: am still a medical student, and am a little bit challenged when it comes to automating processes... but all the help I get from u guyz is much appreciated!!!
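For label/value lines such as "Name: XXXXX" or "LVEDD=XXXX", the extraction step can be pictured with a small string helper. This is only a minimal sketch: the function name ValueAfterLabel is hypothetical, and it assumes each value sits on the same line as its label, separated by ":" or "=".

```vba
' Hypothetical helper: returns the text that follows a label on one line,
' e.g. the "XXXXX" in "Name: XXXXX" or the "52" in "LVEDD=52".
Function ValueAfterLabel(ByVal sLine As String, ByVal sLabel As String) As String
    Dim pos As Long, sRest As String
    ValueAfterLabel = ""
    ' Case-insensitive search for the label
    pos = InStr(1, sLine, sLabel, vbTextCompare)
    If pos = 0 Then Exit Function
    ' Everything after the label
    sRest = Trim$(Mid$(sLine, pos + Len(sLabel)))
    ' Drop a leading ":" or "=" separator, if present
    If Left$(sRest, 1) = ":" Or Left$(sRest, 1) = "=" Then
        sRest = Trim$(Mid$(sRest, 2))
    End If
    ValueAfterLabel = sRest
End Function
```

Called as ValueAfterLabel("LVEDD=52", "LVEDD"), it would return "52"; a label that is not present returns an empty string.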

macropod
07-08-2012, 04:08 AM
Hi mayseed,

It would be helpful if you could post a sample document (no sensitive data) so that I could see the complete document structure.

mayseed
07-08-2012, 04:41 AM
Thanks for all your help!! Just knowing ur keen to help made my day.. I attached a sample report.. basically all reports are structured similarly. What I want to extract is the technique, different numbers.. the measurements are always preceded by the same word (i.e. ESD = xxx) and so on...

thanks once again!

macropod
07-08-2012, 05:54 PM
See attached workbook. It has a macro named 'UpdateData' that will populate the columns for which names exist. Simply point the macro's browser to the folder containing the files you want to process.

As I'm not sure which bits of the remaining data you're after, I haven't finished the code yet. If you could clarify what's supposed to happen where multiple paragraphs are involved, which other fields you want and what you want to do about the bracketed 'normal' ranges, more progress can be made.
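The folder-processing part of such a macro can be sketched roughly as follows. This is not the code from the attached workbook; the sub name and the folder path are assumptions, and Word is late-bound so no reference to the Word object library is needed.

```vba
' Sketch only: open every Word file in a folder, read-only and hidden,
' and print its name and paragraph count to the Immediate window.
Sub ListWordFiles()
    Dim StrFolder As String, StrFile As String
    Dim wdApp As Object, wdDoc As Object
    StrFolder = "C:\Reports\"   ' assumed path, with trailing backslash
    Set wdApp = CreateObject("Word.Application")
    ' On Windows, "*.doc" typically also matches .docx and .docm files
    StrFile = Dir(StrFolder & "*.doc", vbNormal)
    While StrFile <> ""
        Set wdDoc = wdApp.Documents.Open(Filename:=StrFolder & StrFile, _
            AddToRecentFiles:=False, Visible:=False, ReadOnly:=True)
        Debug.Print StrFile, wdDoc.Paragraphs.Count
        wdDoc.Close SaveChanges:=False
        StrFile = Dir()         ' next matching file, "" when none remain
    Wend
    wdApp.Quit
    Set wdDoc = Nothing: Set wdApp = Nothing
End Sub
```

The extraction logic would go where the Debug.Print line is, writing one worksheet row per document.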

mayseed
07-09-2012, 12:26 AM
I am so grateful for u macropod.. i wish i can repay u for this.. this is brilliant.. so its working nicely... i am going to go through the code and see if i can adjust it..
when it comes to the normal ranges i dont really need them.. all i need is the actual number..

thats all..
ill keep u posted with progress...

U MADE MY DAY!! I want to cry!!
thanks!
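If only the number is wanted and the bracketed normal range can be ignored, one option is to cut the string at the first opening bracket before converting it. A minimal sketch, assuming the range always follows the value in round brackets, e.g. "52 (35-56)"; the function name is hypothetical.

```vba
' Hypothetical helper: removes a trailing bracketed range such as "(35-56)".
Function StripBracketed(ByVal s As String) As String
    Dim p As Long
    p = InStr(s, "(")
    ' Keep only the text before the first "("
    If p > 0 Then s = Left$(s, p - 1)
    StripBracketed = Trim$(s)
End Function
```

Val(StripBracketed("52 (35-56)")) would then give the number 52, since Val reads the leading numeric part of a string.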

mayseed
07-09-2012, 02:43 AM
Hi again macropod!

I tried dissecting the code and was trying to add some more fields but just couldnt figure out how to make it identify the different numbers in the document..

i only need to extract the numbers (can ignore the normal range); some numbers are preceded by "=", others are preceded by "-".

Am not quite sure what to do :(

thanks for ur help
ur awesome

macropod
07-09-2012, 04:45 AM
What to do depends on what you want to achieve. The code in the workbook I provided is already equipped to handle data preceded by ':' and '='. You just need to supply the prefixes. If you look where the code processes these, I'm sure you'll quickly figure out how to add the test for '-'.

As you haven't answered the questions I asked in my last post, I can't really provide more specific advice.
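A prefix test of the kind described above might look like the sketch below. It is an assumption about the general shape of the parsing step, not the workbook's actual code, and ParseValue is a hypothetical name. The "-" test is tried last, because a hyphen can also appear inside a label or in front of a negative number.

```vba
' Sketch: return the text after the first ":", "=" or "-" found, in that order of preference.
Function ParseValue(ByVal StrTxt As String) As String
    Dim pos As Long
    pos = InStr(StrTxt, ":")
    If pos = 0 Then pos = InStr(StrTxt, "=")
    If pos = 0 Then pos = InStr(StrTxt, "-")   ' the added test
    ' If no prefix is present, the string is returned unchanged (trimmed)
    If pos > 0 Then StrTxt = Mid$(StrTxt, pos + 1)
    ParseValue = Trim$(StrTxt)
End Function
```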

mayseed
07-10-2012, 12:46 AM
Thank you macropod...

I tried to use the code to find the EDV, ESV values these use "=" symbol.. i did that by adding fields to the table containing the exact strings as spelled in the word document.. but it still wouldn't work..

As for the data in the multiple paragraphs:
for example.. the technique field: then the statements (4 in the sample file) need to be inserted into 4 different fields in the table.

Again when I add "technique" to the table and then run the code, I only end up with the statement that follows the word technique.. the 4 other statements which occur on different lines get missed.. Any ideas on how to fix that?

THANK U MACROPOD... I am almost there with getting the code to work.. all thanks to u...

macropod
07-10-2012, 03:15 AM
Hi mayseed,

Try the attached. Do note there is a limit to what can be achieved with what you're working with. If your documents differ significantly from what you've posted, much of the code I've written could be invalidated - and coding around the difficulties might not be practical/possible.

One thing to note is that I've assumed the doctor's name & title at the end (which I assume you don't want) always span two paragraphs. They get deleted during processing (the changes aren't saved) so that they don't get included with the extracted data.


"As for the data in the multiple paragraphs:
for example.. the technique field: then the statements (4 in the sample file) need to be inserted into 4 different fields in the table."

I don't think it's practical to do that, as you'll never know how many columns might be involved. For now, the code just puts them all in the one cell.
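Putting several paragraphs into one cell could be approached along these lines. This is a sketch under assumptions, not the attached code: it expects the whole document text (Word separates paragraphs with vbCr), a heading that starts its own paragraph, and an empty paragraph marking the end of the block. The function name is hypothetical.

```vba
' Sketch: collect the paragraphs that follow a heading, up to the next empty
' paragraph, joined with line feeds so they display as lines within one cell.
Function BlockAfterHeading(ByVal sDocText As String, ByVal sHeading As String) As String
    Dim vParas As Variant, i As Long, bIn As Boolean, sOut As String
    vParas = Split(sDocText, vbCr)
    For i = LBound(vParas) To UBound(vParas)
        If bIn Then
            ' An empty paragraph ends the block
            If Trim$(vParas(i)) = "" Then Exit For
            If sOut <> "" Then sOut = sOut & vbLf
            sOut = sOut & Trim$(vParas(i))
        ElseIf InStr(1, vParas(i), sHeading, vbTextCompare) = 1 Then
            ' Heading found at the start of a paragraph; collect what follows.
            ' Text on the heading line itself is not collected in this sketch.
            bIn = True
        End If
    Next i
    BlockAfterHeading = sOut
End Function
```

It could be fed with wdDoc.Range.Text and a heading such as "TECHNIQUE", with the result written to a single cell.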

mayseed
07-20-2012, 05:09 PM
THIS WORKS PERFECTLY.. I WANT TO CRY..

THANK YOU SO MUCH MACROPOD THIS IS AMAZING.:beerchug:

Ichor
06-05-2013, 01:37 AM
Hi, sorry to jump on the end of this thread, but it seems to be almost what I am looking for. Is it possible to use this to extract information from the cell to the right of the found word (from a table in Word)? Rather than the next tab?

Any help would be appreciated

macropod
06-05-2013, 03:19 AM
Probably not without a major re-write. Working with tables requires a significantly different approach to the data matching.

Ichor
06-05-2013, 03:24 AM
Thank you macropod, shame
I have found a way of pulling a table from multiple Word documents, I am still trying to find a way to pull multiple tables from multiple Word documents, but the format is almost useless.
I was hoping that the method above of searching for a specific word (column heading) in the documents and pulling out associated data would work better than what I have at the moment.
Is that possible?

macropod
06-05-2013, 05:28 AM
Yes, it's possible. The code in this thread has all you need for looping through a series of documents; other threads here (in the Word Forum, probably) are likely to have code for looping through and finding content in tables.
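For the table case, a rough sketch of finding a label cell and reading the cell to its right is shown below. It is not taken from this thread's workbook; the function name is hypothetical, the document is passed in as a late-bound Word Document object, and the label is assumed to fill its cell exactly.

```vba
' Sketch: return the text of the cell to the right of the first cell whose text equals sLabel.
Function CellRightOf(ByVal wdDoc As Object, ByVal sLabel As String) As String
    Dim tbl As Object, cel As Object, celNext As Object, sTxt As String
    For Each tbl In wdDoc.Tables
        For Each cel In tbl.Range.Cells
            sTxt = cel.Range.Text
            ' Cell text ends with a two-character end-of-cell marker; strip it
            sTxt = Trim$(Left$(sTxt, Len(sTxt) - 2))
            If StrComp(sTxt, sLabel, vbTextCompare) = 0 Then
                Set celNext = cel.Next
                If Not celNext Is Nothing Then
                    ' Only accept the next cell if it is in the same row
                    If celNext.RowIndex = cel.RowIndex Then
                        sTxt = celNext.Range.Text
                        CellRightOf = Trim$(Left$(sTxt, Len(sTxt) - 2))
                    End If
                End If
                Exit Function
            End If
        Next cel
    Next tbl
End Function
```

Iterating tbl.Range.Cells rather than rows and columns is a deliberate choice here, since it also copes with tables that contain merged cells.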

amgupta1981
07-17-2013, 05:09 AM
Thanks !!

iliauk
11-08-2013, 05:52 AM
Dear macropod, the macro you have posted is exactly something I am going for. However, the 'fields' in my document are not as clearly labelled and I haven't had much luck altering the code. Would you be kind enough to help me get just the first field sorted and I will give the rest a go myself?

[Attachment 10798]

Would it be possible for it to just extract the bits in the rectangles (which aren't fields so seems quite difficult to extract from them)

macropod
11-08-2013, 04:38 PM
iliauk: This is now the third forum you've posted at concerning this topic. And at none of them have you had the courtesy of following the cross-posting etiquette, per: http://www.excelguru.ca/content.php?184, despite being reminded of this at the other two forums. I have no intention of discussing this topic with you here.

Topic cross-posted at:
http://www.msofficeforums.com/word/18553-how-extract-key-data-word.html
http://www.excelforum.com/word-programming-vba-macros/966774-extracting-certain-data-from-word-into-excel-spreadsheet.html#post3468375

dippan77
10-16-2014, 07:30 PM
Thanks Macropod. I have a need to read word document and posting this to be able to download the attached code with the hope that I can get information I need to update my macro.

cmrini
09-17-2015, 11:41 AM
Is there a way to import content from only certain 'styles' of data?

I have a document that utilizes a table of contents and then has 'Req#' under each section of the document. I would like to pull out the 'Req#' and the following content and organize them likewise into a spreadsheet.

I have looked over the code presented, but it does not increment through out the selected document.

Thanks

tanjunyen
10-28-2015, 09:11 AM
Thanks!

topher217
02-17-2017, 02:35 AM
Hello macropod. Thank you very much for the head start on moving things between Word and Excel within VBA. I'm fairly good with VBA in Excel but am new to interacting with Word. I can't figure out how one step of your code is working. It's on line 87 and states, "StrTxt = .Duplicate.Text". When I set a breakpoint around this area and step through it while keeping a watch on both "StrTxt" and "wdDoc.Range.Duplicate.Text", they turn out to be very different strings both before and after the assignment to StrTxt. It appears "wdDoc.Range.Duplicate.Text" shows the unfiltered or unfound text (the whole document contents), but when it is assigned to StrTxt, it shows the properly found value. MAGIC! Am I misunderstanding some With statements you are using and not watching the proper variable or what?

All of the other sample code and documentation I've found shows how to find/replace some string but I haven't found anything that actually returns the found string until this. Why does line 87 work the way it does?

A follow up, that may be answered from the first, is why do you use the Duplicate property? What advantage does this have over using just the Range.Text? Both seem to work the magic described above.

macropod
02-17-2017, 04:09 AM
In Word, the .Duplicate property allows you to vary a copy of a range without disturbing the original range (in this case, the range found by the Find - not the whole document). Since nothing to do with how StrTxt is populated involves manipulating the found range, .Duplicate could probably have been omitted in this case.
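The effect of .Duplicate can be illustrated with a short sketch. The sub name is hypothetical, the search pattern is borrowed from the code posted above, and the document is passed in as a late-bound Word Document object.

```vba
' Sketch: extend a copy of the found range without disturbing the found range itself.
Sub DuplicateDemo(ByVal wdDoc As Object)
    Const wdFindStop As Long = 0, wdCharacter As Long = 1
    Dim rngFound As Object, rngCopy As Object, bFound As Boolean
    Set rngFound = wdDoc.Range
    With rngFound.Find
        .ClearFormatting
        .Text = "P[0-9]{5,6}"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        bFound = .Execute       ' on success rngFound now covers only the match
    End With
    If bFound Then
        Set rngCopy = rngFound.Duplicate
        rngCopy.MoveEnd wdCharacter, 10   ' grow the copy by up to 10 characters
        Debug.Print "Found range: "; rngFound.Text
        Debug.Print "Extended copy: "; rngCopy.Text
    End If
End Sub
```

Only the copy grows; rngFound still covers just the matched text, which is the point of working on a duplicate.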

topher217
02-17-2017, 05:04 AM
macropod,

Thanks for your reply. Understood on the .Duplicate part, but I'm still confounded by what appears to be a misunderstanding of the With statement structure...unless I'm completely off the mark. I've uploaded a screenshot of what I'm encountering. All With statements above the screenshot appear to be ended up to this point, so I would think the following would be true (but they appear not to be true according to the Watches List).

1.) The wdDoc.Range.Find.Found = True since we are now within the If statement where that was the condition. (The watches list tells me this is false though)
2.) StrTxt = wdDoc.Range.Duplicate.Text Though you can also see in the watches list, that this is not the case

[Attachment 18404: screenshot]

Am I making a noob mistake and misunderstanding the With statement structure or what? Hoping for a poke in the right direction.

I've commented out quite a few things from the original code, as they are not applicable to what I'm trying to do, but I don't see any reason they would affect the "magic" going on here. But just in case, I've pasted it below.



Sub UpdateData()
    Application.ScreenUpdating = False
    Dim StrFolder As String, StrFile As String, StrFnd As String, StrTxt As String
    Dim wdApp As Object, wdDoc As Object, bStrt As Boolean
    Dim WkSht As Worksheet, LRow As Long, LCol As Long, i As Long
    Const wdFindContinue As Long = 1, wdReplaceAll As Long = 2
    StrFolder = "F:\Test\"
    If StrFolder = "" Then Exit Sub
    Set WkSht = ThisWorkbook.Sheets("Sheet1")
    LRow = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Row
    LCol = WkSht.Cells.SpecialCells(xlCellTypeLastCell).Column
    ' Test whether Word is already running.
    ' GetObject raises an error if it isn't, so errors are suppressed until On Error GoTo 0.
    On Error Resume Next
    bStrt = False ' Flag to record if we start Word, so we can close it later.
    Set wdApp = GetObject(, "Word.Application")
    'Start Word if it isn't running
    If wdApp Is Nothing Then
        Set wdApp = CreateObject("Word.Application")
        If wdApp Is Nothing Then
            MsgBox "Can't start Word.", vbExclamation
            Exit Sub
        End If
        ' Record that we've started Word.
        bStrt = True
    End If
    On Error GoTo 0
    StrFile = Dir(StrFolder & "\*.doc", vbNormal)
    While StrFile <> ""
        LRow = LRow + 1
        Set wdDoc = wdApp.Documents.Open(Filename:=StrFolder & "\" & StrFile, AddToRecentFiles:=False, Visible:=False, ReadOnly:=True)
        'Do some pre-processing cleanup
        With wdDoc.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchAllWordForms = False
            .MatchSoundsLike = False
            .MatchWildcards = True
            'Replace all tabs with single spaces
            '.Execute Replace:=wdReplaceAll
            '.Text = "[^t]{1,}"
            '.Replacement.Text = " "
            'Replace all double spaces with single spaces
            '.Execute Replace:=wdReplaceAll
            '.Text = "[ ]{2,}"
            '.Replacement.Text = " "
            'Clear out spaces before/after paragraph breaks
            '.Text = " [^13]{1,}"
            '.Replacement.Text = "^p"
            '.Execute Replace:=wdReplaceAll
            '.Text = "[^13]{1,} "
            '.Replacement.Text = "^p"
            '.Execute Replace:=wdReplaceAll
            'Limit paragraph breaks and manual line breaks to one 'real' paragraph per set
            '.Text = "[^13^11]{1,}"
            '.Replacement.Text = "^p"
            '.Execute Replace:=wdReplaceAll
            'Insert extra paragraph breaks before paragraphs. This is to facilitate data extraction
            '.Text = "^13[!^13]{1,}"
            '.Font.Bold = True
            '.Replacement.Text = "^p^&"
            '.Execute Replace:=wdReplaceAll
            '.Text = ""
            '.MatchWildcards = False
            '.Execute Replace:=wdReplaceAll
        End With
        'Get the data for each defined Excel column
        For i = 1 To LCol
            'StrFnd = WkSht.Cells(1, i).Value
            With wdDoc.Range
                With .Find
                    .ClearFormatting
                    .Text = "P[0-9]{5,6}"
                    .Replacement.Text = ""
                    .Forward = True
                    .Wrap = wdFindContinue
                    .Format = False
                    .MatchWildcards = True
                    .Execute
                End With
                If .Find.Found = True Then
                    'Parse the data
                    StrTxt = .Duplicate.Text
                    If InStr(StrTxt, ":") > 0 Then
                        StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, ":") + 1, Len(StrTxt)))
                    ElseIf InStr(StrTxt, "=") > 0 Then
                        StrTxt = Trim(Mid(StrTxt, InStr(StrTxt, "=") + 1, Len(StrTxt)))
                    End If
                    'Update Excel
                    WkSht.Cells(LRow, i).Value = StrTxt
                End If
            End With
        Next
        wdDoc.Close SaveChanges:=False
        StrFile = Dir()
    Wend
    If bStrt = True Then wdApp.Quit
    Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
    Application.ScreenUpdating = True
End Sub

Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    ' Show the Windows folder picker; oFolder stays Nothing if the user cancels
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
End Function

macropod
02-17-2017, 02:33 PM
What you need to understand is that .Find.Execute changes the range from the document as a whole to the .Find.Found range.
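One way to see this redefinition directly is to hold the range in a variable and print its boundaries before and after the search. A minimal sketch, assuming a late-bound Word Document object is passed in; the sub name is hypothetical and the pattern is the one from the code above.

```vba
' Sketch: show that a successful Find.Execute shrinks the range to the match.
Sub ShowRangeRedefinition(ByVal wdDoc As Object)
    Const wdFindStop As Long = 0
    Dim rng As Object
    Set rng = wdDoc.Range                         ' starts as the whole document
    Debug.Print "Before:", rng.Start, rng.End
    With rng.Find
        .ClearFormatting
        .Text = "P[0-9]{5,6}"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        If .Execute Then
            ' rng itself has been redefined to the found text
            Debug.Print "After:", rng.Start, rng.End, rng.Text
        End If
    End With
End Sub
```

Because rng is a named variable, it can also be added to the Watches list and followed while stepping through the code.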

topher217
02-18-2017, 04:32 AM
Thank you for your continued support macropod.

The range you are talking about is wdDoc.Range right? That would make sense to me, but it still doesn't appear to show up like that in the Watches List, which is what confuses me. Take the same screenshot for example; I just stepped through the "StrTxt = .Duplicate.Text" line of code, so I would assume the Watches on both StrTxt and wdDoc.Range.Duplicate.Text would be the same right? But they are not. Is .Duplicate.Text referring to something other than wdDoc.Range.Duplicate.Text at this point? Or am I watching the wrong full-context variable/parameter?

macropod
02-18-2017, 05:10 AM
"The range you are talking about is wdDoc.Range right?"
I discussed two ranges - the first being what you're referring to as wdDoc.Range, the second being the range derived from wdDoc.Range.Find.Execute. It is that second range to which the (unnecessary) .Duplicate refers and from which StrTxt is populated.

topher217
02-18-2017, 05:51 AM
"I discussed two ranges - the first being what you're referring to as wdDoc.Range, the second being the range derived from wdDoc.Range.Find.Execute. It is that second range to which the (unnecessary) .Duplicate refers and from which StrTxt is populated."

Is there a way to Watch the fully qualified expression for the range derived from wdDoc.Range.Find.Execute? (wdDoc.Range.Find.Execute only evaluates to a Boolean) .


What you need to understand is that .Find.Execute changes the range
<----- "the range" refers to which range?

According to my watches list wdDoc.Range.Text does not change from the document to the found range as I understood from your previous explanation. Therefore I'm still confused as to where this new range is located (i.e. what is its fully qualified name?)

My specific confusion is highlighted by the comment after the StrTxt assignment in the code below.



With wdDoc.Range 'Range#1
    With .Find
        .ClearFormatting
        .Text = "P[0-9]{5,6}"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = True
        .Execute
    End With
    If .Find.Found = True Then
        'What is the fully qualified expression for this .Duplicate.Text ?
        'Based on my watches list, it appears this is not wdDoc.Range.Duplicate.Text
        'as I would expect from the "With wdDoc.Range" statement from above.
        StrTxt = .Duplicate.Text
    End If
End With

macropod
02-18-2017, 06:19 AM
"Is there a way to Watch the fully qualified expression for the range derived from wdDoc.Range.Find.Execute? (wdDoc.Range.Find.Execute only evaluates to a Boolean)."
I don't know and it's not something I've ever found a need to bother with. Sure, wdDoc.Range.Find.Execute only evaluates to a Boolean, but that's not the point. The point is that, if it evaluates to True, the current range changes to the found range.


<----- "the range" refers to which range?
I'd have thought it pretty obvious that, since the range the .Find is working on is wdDoc.Range, that's the range that I was referring to.

topher217
02-18-2017, 08:02 PM
"I don't know and it's not something I've ever found a need to bother with. Sure, wdDoc.Range.Find.Execute only evaluates to a Boolean, but that's not the point. The point is that, if it evaluates to True, the current range changes to the found range."

"I'd have thought it pretty obvious that, since the range the .Find is working on is wdDoc.Range, that's the range that I was referring to."

Yes, I thought that to be the case, but it doesn't match with what I am observing, so I was questioning everything that I could be misunderstanding.

Ok, so maybe a summary of the overall problem here will resolve things.

Before reading your sample code, my overall question was "How do you assign the .Find.Execute range to some variable that I can parse?" In your code you do this by running the .Find.Execute method and then assigning .Duplicate.Text to a string variable called StrTxt. This method works, and does what I am hoping for, but I am unable to understand WHY it works. You claim that the wdDoc.Range starts as the whole document range, but after the .Find.Execute method runs, this range changes to the found range only, right? By this logic, I would think that the fully qualified statement assigning StrTxt would read "StrTxt = wdDoc.Range.Duplicate.Text", but this doesn't match what I observe while stepping through the code with various Watches.

In order to check if what you claim is actually happening or not I run through the following steps (See attached image showing this process as well):

1.) I set watches on StrTxt, wdDoc.Range, wdDoc.Range.Duplicate.Text, wdDoc.Range.Find.Execute, and wdDoc.Range.Find.Found. ... I set a breakpoint and run until the .Find.Execute line of code (I observe all of my watches)
2.) I step into the next lines of code and find that none of my watches change at all. (Based on your explanation, I would think they should change at this point)
3.) I notice that before I enter the If statement, the wdDoc.Range.Find.Found evaluates to FALSE in the watches window. Despite this fact, the code enters the If statement as if it were true. (This is one hint that .Find.Found is NOT the same as wdDoc.Range.Find.Found even though this is what the With statement structure would imply.)
4.) I notice that wdDoc.Range.Duplicate.Text is still the entire document text, and step through the line of code assigning .Duplicate.Text to StrTxt. Now I see that StrTxt is assigned the properly found text of P1003 even though wdDoc.Range.Duplicate.Text still evaluates to the full document text. (Another hint that the With structure is not behaving as I thought).

[Attachment 18425: image]

Based on the above observations, the assignment to StrTxt cannot be based on wdDoc.Range.Duplicate.Text as we have concluded up to this point. In order to work with this process more in the future I need to understand how this assignment works. What you say matches what I can find in the msdn documentation.

So even though the end results appear to be working as you say, the Watches window in this case seems to be useless or wrong? Seems like a nightmare to debug such code if you can't follow the variables you are using. Even if you don't know the answer to how this works, I appreciate your sample code since I never would have come to the conclusion of using the same range like you did, so Thank You!

macropod
02-19-2017, 02:18 AM
"You claim that the wdDoc.Range starts as the whole document range, but after the .Find.Execute method runs, this range changes to the found range only right? In this logic, I would think that the fully qualified statement assigning the StrTxt to read 'StrTxt = wdDoc.Range.Duplicate.Text' but this doesn't match what I observe while stepping through the code with various Watches."
Well, if you point StrTxt to wdDoc.Range.Duplicate.Text (or wdDoc.Range.Text), no amount of subsequent wdDoc.Range.Find.Execute usage is going to change what StrTxt contains. StrTxt will only ever reference the found string if you populate it after using wdDoc.Range.Find.Execute.
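One likely explanation for the Watch window behaviour, offered as a sketch rather than a confirmed diagnosis: a With statement evaluates its expression once and keeps a reference to that single Range object, whereas every separate evaluation of wdDoc.Range (including each Watch expression) returns a new Range covering the whole document. The sub name below is hypothetical and the document is passed in late-bound.

```vba
' Sketch: compare the range held in a variable with a freshly evaluated wdDoc.Range.
Sub CompareWatchedRanges(ByVal wdDoc As Object)
    Const wdFindStop As Long = 0
    Dim rngHeld As Object
    Set rngHeld = wdDoc.Range            ' one specific Range object
    With rngHeld.Find
        .ClearFormatting
        .Text = "P[0-9]{5,6}"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute
    End With
    If rngHeld.Find.Found Then
        ' The held object was redefined by the search
        Debug.Print "Held range text: "; rngHeld.Text
        ' Each wdDoc.Range below is a new whole-document range with its own Find
        Debug.Print "Fresh range length: "; Len(wdDoc.Range.Text)
        Debug.Print "Fresh range Found: "; wdDoc.Range.Find.Found
    End If
End Sub
```

Under that reading, watching rngHeld rather than wdDoc.Range would be expected to show the found text, because the watch then refers to the same object the search redefined.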

topher217
02-19-2017, 09:58 PM
Yep, that is what I plan on doing. I appreciate the effort to figure this out, but I suppose it will remain a mystery. One possibility that I thought of, but don't know how to test, is that when I watch wdDoc.Range.Duplicate.Text, my Watches window is restricted to the Context of Module1, Sheet1, or ThisWorkbook (in the add/edit watch window). I'm guessing the assignment of .Duplicate.Text to StrTxt comes from some lower level Context defined by the .Find object (and not the local variable/parameter wdDoc.Range.Duplicate.Text). So whenever I try to watch wdDoc.Range.Duplicate.Text, I am only seeing the local definition of this rather than the lower level definition. Something like a private/global variable c.o.n.f.l.i.c.t perhaps?

I'm still curious how you knew to use the assignment like that? I haven't found anything like that in the msdn documentation.

Side note: Any idea why is the word c.o.n.f.l.i.c.t (minus all the periods) a forbidden word? My post was denied until I found it was caused by that single word. "Post denied. New posts are limited by number of URLs it may contain and checked if it doesn't contain forbidden words." Couldn't find anything in the FAQ about this.

macropod
02-19-2017, 10:22 PM
"I'm still curious how you knew to use the assignment like that? I haven't found anything like that in the msdn documentation."
According to https://msdn.microsoft.com/en-us/library/office/ff839118%28v=office.14%29.aspx?f=255&MSPPError=-2147217396:

"If you've gotten to the Find object from the Selection object, the selection is changed when text matching the find criteria is found.
...
If you've gotten to the Find object from the Range object, the selection isn't changed when text matching the find criteria is found, but the Range object is redefined."

See also:
https://msdn.microsoft.com/en-us/library/office/aa211953(v=office.11).aspx
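Because the Range object is redefined on each successful search, the same range variable can be reused to walk through every match in a document. A minimal sketch under assumptions: the sub name is hypothetical, the pattern is the one used earlier in the thread, and the end-of-document guard is a precaution against repeating the last match.

```vba
' Sketch: list every match of a wildcard pattern using one Range variable.
Sub ListAllMatches(ByVal wdDoc As Object)
    Const wdFindStop As Long = 0, wdCollapseEnd As Long = 0
    Dim rng As Object, lDocEnd As Long
    lDocEnd = wdDoc.Range.End
    Set rng = wdDoc.Range
    With rng.Find
        .ClearFormatting
        .Text = "P[0-9]{5,6}"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
    End With
    Do While rng.Find.Execute
        ' rng now covers the current match only
        Debug.Print rng.Start, rng.Text
        ' Stop if the match reaches the end of the document
        If rng.End >= lDocEnd - 1 Then Exit Do
        ' Continue searching from just after the current match
        rng.Collapse wdCollapseEnd
    Loop
End Sub
```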

As for:

"Any idea why is the word c.o.n.f.l.i.c.t (minus all the periods) a forbidden word? My post was denied until I found it was caused by that single word. 'Post denied. New posts are limited by number of URLs it may contain and checked if it doesn't contain forbidden words.'"
I have no idea why a word like 'conflict' would cause problems.

kayfreed89
03-06-2017, 11:46 PM
Hi all,

I am trying to achieve a similar solution to this original request in that I have a document that is part table data and part text under a specific header. I have a large folder of these documents, all formatted the same, though some text may be longer than others. I am newer to VBA and am trying to figure out how to adjust the code that was originally provided to point the macro to specific headers (of table and text) in the word document to copy into excel. Below is an example of the document. Any help with writing the macro would be much appreciated.

TAYARI LESSON OBSERVATION BRIEF

County:
Zone/Cluster:
School:
CSO:
Teacher:
Officer:


Date | Activity | Week | Day | Lesson Duration | Pupils Present | Girls | Boys | Take-up Rating
17/OCT/2016 | Language Activities | 25 | 1 | 50 | 34 | 18 | 16 | Prepared


Qualitative Background Information

Barakeiwo is about 1 hour and half drive from Eldoret office. The area is particularly bad when it rains and we had to negotiate through the mud with a 4x4 office vehicle. They have 2 levels of ECDE, with PP1 being mixed with baby and PP2.


WHAT WENT WELL:
Day and date done. Learner of the day activity done and learners guided to identify letter sounds in the name. Teacher used some of vocabulary to share news with learners, however the news was too long. In letter recognition teacher guided learners in identifying sounds and reinforcing them with actions.

WHAT DID NOT GO WELL:
Teacher did not give learners an opportunity to share news. Teacher needs to ensure that all learners are saying the sounds and doing the actions. Teacher to work on proper articulation of sounds. Steps in the teacher read aloud mixed up, pre reading activities being done after the reading activities.

FEEDBACK FROM CSO/DICECE:
Teacher advised to minimize time wastage. Teacher to allow learners an opportunity to share news. Follow the steps in the teacher’s guide in teaching read aloud.

FEEDBACK TO CSO/DICECE:
Give formative feedback to teachers during the lesson. Encourage peer mentoring between the teacher and the PP2 teacher.
Use papaya to guide teachers in the correct articulation of sounds.



Overall Observation and Recommendations

Teacher has a good grasp of tayari core strategies. Teacher to be encouraged to manage time well and minimize time wastage.

macropod
03-07-2017, 04:19 AM
As was mentioned earlier in this thread, the code to extract data from tables is quite different. Furthermore, it's by no means apparent from your post what data you want to extract - aside from the fact some is in the table and some is not. I suggest you start a new thread setting out exactly what your requirements are, as well as attaching a document with some representative sample data.

kayfreed89
03-07-2017, 04:45 AM
Hi Macropod,

I do not have permissions to post new threads yet since I recently registered. My need is to pull all text for each header or section. The document is indeed a combination of free text and table text and so 1) I am curious if it is possible to write a macro that can combine both of these, and 2) how I would go about writing the macro to pull the text to the right or within a header location within the document. Any help would be much appreciated with the limited permissions I have at this point.

Thanks,
Kyle

macropod
03-07-2017, 01:59 PM
"I do not have permissions to post new threads yet since I recently registered."
That just isn't so. There would be no point in allowing people to register but not then start a thread describing a problem they need help with. As I said in my previous reply, you should start such a thread, setting out the full details of what you're trying to achieve - and include a sample document with some representative data.

callmenaveen
05-09-2017, 10:35 PM
Hello Macropod,

Recently I saw the macro you wrote for importing text from Word to Excel. Based on your macro, I copied more than 100 Word docs into the sample file and ran the macro (demo file).

The macro is picking up only the first Word document; the rest are not picked up.

I can't attach the files for your reference. Can you please help me get the details for all the Word docs?


Regards,
Naveen

macropod
05-10-2017, 04:47 AM
It's hardly surprising that a macro I wrote for an entirely different scenario doesn't work for yours. This kind of document parsing is very particular. In this case, the macro was only ever coded to extract one set of data per document, not multiple sets. You're lucky you got anything meaningful... The problem in this case is of your own making:

"i have copied more than 100 word docs in to sample file"
You should NOT have done that!

callmenaveen
05-11-2017, 01:04 AM
Hello Paul,

Thanks for your reply. The macro is working absolutely fine. Instead of copying all resumes into one Word document, I saved each resume as a separate file, named samplereport 1, 2, 3 etc.

All the resumes data got imported to excel based on the format updated in demo file.

Many thanks for your macro.. it has saved me lot of time and work pressure.

Regards
Naveen Rao

shaz99
06-19-2017, 08:25 PM
:thumb


macropod
06-27-2017, 04:55 PM
Thread closed. Any further discussion on issues related to the discussion in this thread should be started in a new thread - referencing this one if appropriate.