PDA

View Full Version : Solved: Ugly Text File to Convert from Word to Excel



vaguirre
08-08-2005, 07:36 AM
Hi,

This is my first post. I'm using Office 2000. I've worked with other delimited (tab, quote, space) text files before but this is the ugliest one I've run across.

I was asked to convert pieces and parts from each record to Excel so that it can be sorted etc. I cleaned up as much of the file as I could using macros and global search and replaces to remove the headings, etc. That part was fairly simple, after I read thru some of the posts on this website. Thank you very much!

Now I've come to the experts for help with the next part because the number of lines for each record vary and I've tried so many things that I'm :bug:. Since the text file has 2,790 pages I'd rather not go through it manually.

I've attached a sample file. I've highlighted the different fields that I need to extract. Yellow=Acct. No., Green=Name/Address, Blue=Description, and Gray=Property Addr.

Thanks for any help you can give me!
Vicki

TonyJollans
08-08-2005, 05:03 PM
Hi Vicki,

Welcome to VBAX!

It's a bit rough and ready but I think this'll do what you ask. I've made a couple of assumptions about the format and tested it in 2003 (but it should work in 2000). It creates an Excel file with the same name (except it's .xls instead of .doc) and in the same place as the Word document. Let us know if it does the trick.

Sub BreakUp()

Dim appExcel As Object
Dim objWorkSheet As Object

Dim numRow As Long

Dim ndx As Long
Dim Para As Paragraph

Dim AccountNo As String
Dim NameAndAddress As String
Dim Description As String
Dim PropertyAddress As String

Dim FullLine As String

ndx = 1

Set appExcel = CreateObject("Excel.Application")
Set objWorkSheet = appExcel.workbooks.Add.Sheets(1)

Do

Set Para = ActiveDocument.Paragraphs(ndx)

If Para.Range.Text = "@@#@@" & vbCr Then

numRow = numRow + 1

If numRow = 1 Then
objWorkSheet.Cells(numRow, 1) = "Account No"
objWorkSheet.Cells(numRow, 2) = "Name and Address"
objWorkSheet.Cells(numRow, 3) = "Description"
objWorkSheet.Cells(numRow, 4) = "Property Address"
Else
objWorkSheet.Cells(numRow, 1) = AccountNo
objWorkSheet.Cells(numRow, 2) = NameAndAddress
objWorkSheet.Cells(numRow, 3) = Description
objWorkSheet.Cells(numRow, 4) = PropertyAddress
End If

AccountNo = ""
NameAndAddress = ""
Description = ""
PropertyAddress = ""

Else

FullLine = Left$(Replace(Para.Range.Text, vbCr, "") & Space(133), 133)
' pad line with spaces

If Mid$(FullLine, 29, 14) = "PROPERTY ADDR:" Then
PropertyAddress = Trim(Mid$(FullLine, 29, 60)) ' How Long is this?

Else

If Trim(Mid$(FullLine, 27, 32)) <> "" Then _
NameAndAddress = NameAndAddress & IIf(NameAndAddress = "", "", vbLf) & _
Trim(Mid$(FullLine, 27, 32))
If Trim(Mid$(FullLine, 59, 32)) <> "" Then _
Description = Description & IIf(Description = "", "", vbLf) & _
Trim(Mid$(FullLine, 59, 32))

If Trim(Mid$(FullLine, 2, 4)) = "001-" Then _
AccountNo = Trim(Mid$(FullLine, 11, 15))

End If
End If

ndx = ndx + 1

Loop While ndx <= ActiveDocument.Paragraphs.Count

With objWorkSheet.Columns("A:D")
.WrapText = False
.AutoFit
.WrapText = True
.AutoFit
.VerticalAlignment = -4160 ' xlTop
End With

objWorkSheet.Parent.SaveAs Replace(ActiveDocument.FullName, ".doc", ".xls")
objWorkSheet.Parent.Saved = True

appExcel.Quit

Set objWorkSheet = Nothing
Set appExcel = Nothing

End Sub

vaguirre
08-09-2005, 11:11 AM
Hi Tony,

Thank you for responding. I've been reviewing the code to make sure that I understand what it's doing...I know enough to be dangerous! I made one small tweak and then I'm going to run it against my file.

I think this code will do what I want except I forgot to say what sorts were needed and that a mail merge will be done later on. I'll do better next time. One of the lines for the Description field (Column 59) will have a legal description, such as S5 T13N R4E which I'd like to sort on.

Should I add code within your code to copy the legal description to another field called Legal Description as it's breaking things up? Or is there a way to do it in Excel once the information is there? I'm more familiar with Word.

If you could steer me in the right direction or give some hints, I'd like to try that on my own.

Thanks again,
Vicki

TonyJollans
08-09-2005, 01:11 PM
Hi Vicki,

I might know more than you, but it doesn't stop me being dangerous :D

I would split out the extra field while processing the Word Document and create an extra column in the Worksheet. I can't see anything like it in your sample document so don't know how you will identify it, or isolate it from the rest of the description, but I would imagine that you need to add an extra bit of code before or after the line that begins:

If Trim(Mid$(FullLine, 59, 32)) <> "" Then _

If you're happy to change the code yourself then, of course, go ahead. If you want any further help, or if you don't follow any of the code then do, please, post back.

vaguirre
08-09-2005, 02:56 PM
Hi Tony,

I split my document (2,790 pgs) into five docs. When I run the code (I haven't changed it for the Legal Description yet) against the last file of 389 pages it takes a long time to run. I let it run for 10 minutes before I checked task manager. At that point Word quit responding so I told it to end task. When I opened Excel my display looked weird. I closed Excel and it asked if I wanted to save Book1, I said yes. When I opened Book1 I compared it to my file and saw that only 100 pages were done. Do I need to change something at my end to speed this up?

Or should I use a similiar technique and do it in Word? I could then either open the file in Excel or import it in.

I will be here for another hour and then gone for the next two days.

Thanks again for your help,
Vicki

TonyJollans
08-09-2005, 04:29 PM
Just got back to my PC and I guess I've missed you now.

That sounds very slow but I will have a look at the performance of it (I didn't give it a thought when I wrote the code) and see if there's any reason or anything I can do.

TonyJollans
08-10-2005, 03:53 PM
Hi Vicki,

I've made a few changes and done some timings and this should be quite a bit faster - there may be a bit more can be squeezed out of it but not a lot I suspect. I created a 500-page document by repeating the sample lots of times and it took about a minute to run.

Sub BreakUp2()

Dim appExcel As Object
Dim objWorkSheet As Object

Dim numRow As Long

Dim Para As Paragraph

Dim AccountNo As String
Dim NameAndAddress As String
Dim Description As String
Dim PropertyAddress As String

Dim FullLine As String

Dim txtAddr As String
Dim txtDesc As String

Set appExcel = CreateObject("Excel.Application")
Set objWorkSheet = appExcel.workbooks.Add.Sheets(1)

For Each Para In ActiveDocument.Paragraphs

If Para.Range.Text = "@@#@@" & vbCr Then

numRow = numRow + 1

If numRow = 1 Then
objWorkSheet.Cells(numRow, 1) = "Account No"
objWorkSheet.Cells(numRow, 2) = "Name and Address"
objWorkSheet.Cells(numRow, 3) = "Description"
objWorkSheet.Cells(numRow, 4) = "Property Address"
Else
objWorkSheet.Cells(numRow, 1) = AccountNo
objWorkSheet.Cells(numRow, 2) = Mid$(NameAndAddress, 2)
objWorkSheet.Cells(numRow, 3) = Mid$(Description, 2)
objWorkSheet.Cells(numRow, 4) = PropertyAddress
End If

AccountNo = ""
NameAndAddress = ""
Description = ""
PropertyAddress = ""

Else

FullLine = Left$(Para.Range.Text, Len(Para.Range.Text) - 1) & Space(133)
' pad line with spaces

If Mid$(FullLine, 29, 14) = "PROPERTY ADDR:" Then
PropertyAddress = Trim(Mid$(FullLine, 29, 60)) ' How Long is this?

Else

txtAddr = Trim(Mid$(FullLine, 27, 32))
txtDesc = Trim(Mid$(FullLine, 59, 32))

If txtAddr <> "" Then NameAndAddress = NameAndAddress & vbLf & txtAddr
If txtDesc <> "" Then Description = Description & vbLf & txtDesc

If Mid$(FullLine, 2, 4) = "001-" Then AccountNo = Trim(Mid$(FullLine, 11, 15))

End If

End If

Next

With objWorkSheet.Columns("A:D")
.WrapText = False
.AutoFit
.WrapText = True
.AutoFit
.VerticalAlignment = -4160 ' xlTop
End With

objWorkSheet.Parent.SaveAs Replace(ActiveDocument.FullName, ".doc", ".xls")
objWorkSheet.Parent.Saved = True

appExcel.Quit

Set objWorkSheet = Nothing
Set appExcel = Nothing

End Sub

vaguirre
08-16-2005, 09:51 AM
Hi Tony,

Sorry for the delay in getting back to you. I've been in/out since Friday. I'll give this a whirl later today and let you know how it goes.

Thanks again for your help,
Vicki

MOS MASTER
08-16-2005, 10:29 AM
Hi Tony, :yes

I love the way you deal with those Word content processing questions! :whistle:

TonyJollans
08-16-2005, 10:48 AM
Hi Joost,

Thanks! I like doing them.

They're (usually) the sort of things that never ought to be needed. This one is processing a (file of a) printed report from an IBM mainframe (at least I think it is) and it would be a much easier job to write out a csv from the mainframe (even in Cobol but there are better tools) but that would (probably) involve a lot of paperwork and take forever to get scheduled.

It's almost always easier to take control and use the power on your desktop - if you know how to, and that's where VBAX comes in.

MOS MASTER
08-16-2005, 10:51 AM
Hi Joost,

Thanks! I like doing them.

They're (usually) the sort of things that never ought to be needed.

You're welcome and it shows! :yes

I take those questions once in a while cause they take more time and you have to leave other questions for it.

And I do agree proper design of the sollution your after saves you more time than Coding can fix!

Later Sir Tony. :whistle:

vaguirre
08-16-2005, 04:05 PM
Hi Tony,

You're great! I made one tweak, ran your new code thru the entire document and approx. 2 minutes later it was done.

The only way I could see to get the Legal Description was to do a global search/replace using wildcards in Word. For example, find=([S][0-9]{1,2} )([T][0-9]{1,2}[NS] ) and replace=LEGDESCR: \1\2. Now I'll add the extra column and code and see what happens. I'll let you know how it goes tomorrow.

Have a good one!
Vicki

vaguirre
08-17-2005, 07:44 AM
Hi Tony,

I tweaked the code for the Legal Description and it worked for the majority of the file! Here's the final code:


Sub BreakUp2()

Dim appExcel As Object
Dim objWorkSheet As Object

Dim numRow As Long

Dim Para As Paragraph

Dim AccountNo As String
Dim NameAndAddress As String
Dim Description As String
Dim LegalDescription As String
Dim PropertyAddress As String

Dim FullLine As String

Dim txtAddr As String
Dim txtDesc As String

Set appExcel = CreateObject("Excel.Application")
Set objWorkSheet = appExcel.workbooks.Add.Sheets(1)

For Each Para In ActiveDocument.Paragraphs

If Para.Range.Text = "@@#@@" & vbCr Then

numRow = numRow + 1

If numRow = 1 Then
objWorkSheet.Cells(numRow, 1) = "Account No"
objWorkSheet.Cells(numRow, 2) = "Name and Address"
objWorkSheet.Cells(numRow, 3) = "Description"
objWorkSheet.Cells(numRow, 4) = "Legal Description"
objWorkSheet.Cells(numRow, 5) = "Property Address"
Else
objWorkSheet.Cells(numRow, 1) = AccountNo
objWorkSheet.Cells(numRow, 2) = Mid$(NameAndAddress, 2)
objWorkSheet.Cells(numRow, 3) = Mid$(Description, 2)
objWorkSheet.Cells(numRow, 4) = LegalDescription
objWorkSheet.Cells(numRow, 5) = PropertyAddress
End If

AccountNo = ""
NameAndAddress = ""
Description = ""
LegalDescription = ""
PropertyAddress = ""

Else

FullLine = Left$(Para.Range.Text, Len(Para.Range.Text) - 1) & Space(133)
' pad line with spaces

If Mid$(FullLine, 29, 14) = "PROPERTY ADDR:" Then
PropertyAddress = Trim(Mid$(FullLine, 48, 42)) ' How Long is this?

Else

txtAddr = Trim(Mid$(FullLine, 27, 32))
txtDesc = Trim(Mid$(FullLine, 59, 32))

If txtAddr <> "" Then NameAndAddress = NameAndAddress & vbLf & txtAddr
If txtDesc <> "" Then Description = Description & vbLf & txtDesc

If Mid$(FullLine, 59, 10) = "LEGDESCR: " Then LegalDescription = Trim(Mid$(FullLine, 68, 14))
If Mid$(FullLine, 2, 1) = "0" Then AccountNo = Trim(Mid$(FullLine, 11, 15))

End If

End If

Next

With objWorkSheet.Columns("A:E")
.WrapText = False
.AutoFit
.WrapText = True
.AutoFit
.VerticalAlignment = -4160 ' xlTop
End With

objWorkSheet.Parent.SaveAs Replace(ActiveDocument.FullName, ".doc", ".xls")
objWorkSheet.Parent.Saved = True

appExcel.Quit

Set objWorkSheet = Nothing
Set appExcel = Nothing

End Sub


Thank you very much. :clap:
Now I'm more dangerous then ever!
Vicki

TonyJollans
08-17-2005, 08:11 AM
Thanks for posting back, Vicki.

Happy to have helped.

For your info, the biggest performance gain was made by replacing ..

Do
Set Para = ActiveDocument.Paragraphs(ndx)
:
:
ndx = ndx + 1
Loop

with ...

For Each Para In ActiveDocument.Paragraphs
:
:
Next

I wrote it the original way for a reason - I forget exactly why now but I was going to increment the paragraph index a variable number of times inside the loop, but then didn't end up wanting or needing to. The original code required Word to count through its paragraphs every time which took longer and longer the further through the document you got - in the amended version Word keeps track of where it is making it, as you saw, much quicker.

Stay dangerous!