Log in

View Full Version : extracting specific strings from a word document.



tefuzz
09-13-2017, 08:42 AM
Hey all, I've been away from programming for many years, and offered to help my job with something; I have a basic understanding of how to do what I need to do but as I said, it's been many years since I've coded anything.

What we have is a word document which will contain [potentially] thousands of lines of data. What I need to do is open the document using a macro (got that part down already) search for a particular string; in this case "GS1", and extract the string contained within the ( ) on each line where "GS1" is present and load it into excel, each string on it's own row in a column.

Here's an example of what the lines of data will look like:
(I've modified the actual data a little to remove some identifying info but the concept remains the same)



1) Scanned: 09-11-2017 15:19:35 Barcodes: 2 AT: 01 (Mode: AAU-AMS)
1) GS1 : (420284019261299989841400107372)
1) USS-128: (AX97493639589)
1) Route: (TRP)
> TRP REPLY: 235 Msec Msg: (0,N;2,C37;3,NOIBI;4,28401;6,28401706744;7,)

2) Scanned: 09-11-2017 15:20:06 Barcodes: 2 AT: 01 (Mode: AAU-AMS)
2) GS1 : ( )
2) USS-128: (A00R4217957212)
2) Route: (TRP)
> TRP REPLY: 116 Msec Msg: (0,N;2,C26;3,NOIBI;4,28401;6,28401445507;7,)

3) Scanned: 09-11-2017 15:20:35 Barcodes: 2 AT: 01 (Mode: AAU-AMS)
3) GS1 : (42028401926129999361001089635275)
3) USS-128: (4Y18210025864)
3) Route: (TRP)
> TRP REPLY: 60 Msec Msg: (0,N;2,C24;3,NOIBI;4,28401;6,28401690620;7,)


So for the example above, I would need my macro to extract

420284019261299989841400107372
42028401926129999361001089635275

and place them into the column, on separate rows. Not every number will be the same length, and as seen in the example above there will be some that have no numbers between the ( ) at all.

Not looking for someone to code the entire thing for me as I want to understand the mechanics and make it work on my own. Looking for some guidance and tips to help me along the way.

Thanks in advance.

snb
09-13-2017, 08:49 AM
Sub M_snb()
with getobject("G:\OF\example.docx")
sn=split(replace(join(filter(filter(split(.content,")"),"GS1"),"( ",0),"|")," GS1 : (","'"),"|")
.close 0
end with

thisworkbook.sheets(1).cells(1).resize(ubound(sn)+1)=application.transpose( sn)
end sub

tefuzz
09-13-2017, 09:32 AM
Wow, fast response, when I add the code to my button, the output comes out as such into column A, even though the column has been formatted as plain text. Also, it's adding a sequence to it as well. I need strictly the text string that was extracted.



1


4.20284E+33


3


4.20284E+33


4


4.20284E+33


5


4.20284E+33


6


4.20284E+33


7


4.20284E+33


8


4.20284E+33


9

snb
09-13-2017, 09:39 AM
I amended the code in #2 once. (exactly to write the numbers as Text). Did you use the latest version ?

tefuzz
09-13-2017, 10:28 AM
Perfect. Wow, thank you so much. I would have had wayyyy more lines of code to do that.

snb
09-13-2017, 11:45 AM
I think most of your fellow coders would have had....

FYI:


Sub M_snb()
With getobject("G:\OF\example.docx")
c00=.content
.close 0
End With

sn=split(c00,")")
sn=filter(sn,"GS1")
sn=filter(sn,"( ",false)
c00=join(sn,"|")
c00=replace(c00, GS1 : (","'")
sn=split(c00,"|")

thisworkbook.sheets(1).cells(1).resize(ubound(sn)+1)=application.transpose( sn)
End Sub

tefuzz
09-13-2017, 12:05 PM
quick question...If i were to need a header in Row 1, how would I adjust that code to make it start filling at row 2, instead of row 1?

tefuzz
09-13-2017, 12:08 PM
quick question...If i were to need a header in Row 1, how would I adjust that code to make it start filling at row 2, instead of row 1?

got it.

snb
09-13-2017, 12:46 PM
Sub M_snb()
With getobject("G:\OF\example.docx")
sn=split(replace(join(filter(filter(split(.content,")"),"GS1"),"( ",0),"|")," GS1 : (","'"),"|")
.close 0
End With

thisworkbook.sheets(1).cells(2,1).resize(ubound(sn)+1)=application.transpos e(sn)
End Sub