PDA

View Full Version : Solved: Code testing appreciated



mvidas
02-13-2006, 09:33 AM
Hi Everyone,

This code is by no means finished (just try and follow it!), but I'm curious to see how this works for other people.

The only thing you need for this? A fixed-width text file. This was originally created for someone who had a somewhat-changing fixed width file they wanted to automate for import. I just want people to try it, and tell me if/why it would not work for them. I understand it will not work for files that look like:
AAAAABBCCCCCCDDDDEEEFFFFFFGGGGHHHIIII
(no spaces possible between entries)
But I'm still interested in seeing how it works for others. In my testing it almost always worked, but I was testing it with specific-format files.

Just copy/paste the following into a standard module, and run the ImportFixedWidthTest sub. It will ask you for the filename to import:Sub ImportFixedWidthTest()
Dim FileCont() As String, Cnt As Long, vFF As Long, vFile As String, tStr As String
Dim tempArr() As Long, numItems() As Long, i As Long, itmCnt As Long, itmCount As Long
Dim RegEx As Object, RegC As Object, MaxFields As Long, MaxFields2 As Long
Dim FieldCnt As Long, FieldWidths() As Long, AllFields(), LastFld As Long
Dim AllFieldWidths() As Long, j As Long, StartPos() As Long
Set RegEx = CreateObject("vbscript.regexp")
RegEx.Global = True
vFile = Application.GetOpenFilename("Text Files,*.txt,AllFiles,*.*")
If LCase(vFile) = "false" Then Exit Sub
vFF = FreeFile
Open vFile For Input As #vFF
Do Until EOF(vFF)
Line Input #vFF, tStr
ReDim Preserve FileCont(Cnt)
FileCont(Cnt) = tStr
Cnt = Cnt + 1
Loop
Close #7
ReDim AllFields(Cnt - 1)
RegEx.Pattern = "\S\s|\S$"
itmCnt = 0
ReDim numItems(1, itmCnt)
For Cnt = 0 To UBound(FileCont)
Set RegC = RegEx.Execute(FileCont(Cnt))
itmCount = RegC.Count
ReDim tempArr(itmCount - 1)
LastFld = 0
For i = 0 To itmCount - 1
tempArr(i) = RegC(i).FirstIndex + 1 - LastFld
LastFld = RegC(i).FirstIndex + 1
Next
AllFields(Cnt) = tempArr
For i = 0 To itmCnt - 1
If numItems(0, i) = itmCount Then Exit For
Next
If i = itmCnt Then
ReDim Preserve numItems(1, itmCnt)
numItems(0, itmCnt) = itmCount
numItems(1, itmCnt) = 1
itmCnt = itmCnt + 1
Else
numItems(1, i) = numItems(1, i) + 1
End If
Next
MaxFields = 0
MaxFields2 = -1
For i = 0 To itmCnt - 1
If numItems(1, i) > MaxFields Then
MaxFields = numItems(1, i)
MaxFields2 = 0
FieldCnt = numItems(0, i)
ElseIf numItems(1, i) = MaxFields Then
MaxFields2 = MaxFields
End If
Next
If MaxFields2 <> 0 Then
MsgBox "Unable to determine number of fields."
Exit Sub
End If
ReDim AllFieldWidths(FieldCnt - 1, 0)
Cnt = 0
For j = 0 To UBound(AllFields)
If UBound(AllFields(j)) = FieldCnt - 1 Then
ReDim Preserve AllFieldWidths(FieldCnt - 1, Cnt)
For i = 0 To FieldCnt - 1
AllFieldWidths(i, Cnt) = AllFields(j)(i)
Next
Cnt = Cnt + 1
End If
Next
ReDim FieldWidths(FieldCnt - 1)
ReDim StartPos(FieldCnt - 1)
StartPos(0) = 1
For i = 0 To FieldCnt - 1
ReDim tempArr(UBound(FileCont))
For j = 0 To Cnt - 1
tempArr(j) = AllFieldWidths(i, j)
Next
FieldWidths(i) = lMode(tempArr)
If FieldWidths(i) = 0 Then
MsgBox "Unable to determine field widths"
Exit Sub
End If
If i > 0 Then
StartPos(i) = StartPos(i - 1) + FieldWidths(i - 1)
End If
Next
Application.ScreenUpdating = False
i = Application.SheetsInNewWorkbook
Application.SheetsInNewWorkbook = 1
Workbooks.Add
Application.SheetsInNewWorkbook = i
For i = 0 To UBound(FileCont)
For j = 0 To FieldCnt - 1
Cells(i + 1, j + 1) = Mid(FileCont(i), StartPos(j), IIf(j = FieldCnt - 1, _
Len(FileCont(i)), FieldWidths(j)))
Next
Next
Application.ScreenUpdating = True
Set RegC = Nothing
Set RegEx = Nothing
End Sub
Function lMode(vValues() As Long)
Dim MaxCt As Long, MaxCt2 As Long, tMode As Long, ctValues() As Long, iCnt As Long
Dim i As Long, j As Long
iCnt = 0
ReDim ctValues(1, iCnt)
For i = 0 To UBound(vValues)
For j = 0 To iCnt - 1
If ctValues(0, j) = vValues(i) Then Exit For
Next
If j = iCnt Then
ReDim Preserve ctValues(1, iCnt)
ctValues(0, iCnt) = vValues(i)
ctValues(1, iCnt) = 1
iCnt = iCnt + 1
Else
ctValues(1, j) = ctValues(1, j) + 1
End If
Next
MaxCt = 0
MaxCt2 = -1
tMode = 0
For i = 0 To iCnt - 1
If ctValues(1, i) > MaxCt Then
MaxCt = ctValues(1, i)
MaxCt2 = 0
tMode = ctValues(0, i)
ElseIf ctValues(1, i) = MaxCt Then
MaxCt2 = MaxCt
End If
Next
If MaxCt2 <> 0 Then tMode = 0
lMode = tMode
End FunctionAgain, any feedback is much appreciated
Matt

Zack Barresse
02-13-2006, 11:57 AM
Hi Matt, I'll test when I can. Do you have a sample text file to use here? Or do you want us to create our own?

mvidas
02-13-2006, 12:44 PM
I can test it on my own files :) I'd put a sample file up, but I know it would work with it. I will though, if it could help.
Sample attached

Ken Puls
02-13-2006, 04:02 PM
Okay, now this is too funny... I browse to this post, and look at the google ad that pops up! My first thought: Dave Brett has started a new site???

:rotlaugh:

mvidas
02-14-2006, 05:57 AM
:) I think Dave's site would be "RegexPosse", as he was given the nickname of the "leader of the regexp posse" by matthewspatrick.

So Ken, howd the client end up liking it?

Ken Puls
02-14-2006, 08:05 AM
So Ken, howd the client end up liking it?

Need to follow up again, but primary indications are good. :thumb

Steiner
02-15-2006, 02:33 AM
Hi Matt,

quite cool (well at least as far as I can understand it :bug:)

Only one thing did not work for me right away (I guess it has something to do with my Excel / VBScript version):

tempArr(i) = RegC(i).FirstIndex + 1 - LastFld
did not work, because VBA would not let me access a match this way, instead I had to use:
tempArr(i) = RegC.Item(i).FirstIndex + 1 - LastFld

Same for the next line, but then it worked. And quite fast too: it took 10 seconds to import 1 meg, I've seen that much slower.
Even multiple empty columns in a row were identified correctly.

Daniel

mvidas
02-15-2006, 06:23 AM
Thanks Daniel,

I suppose I should clean it up a bit too, 21 variables seems a bit much (I dont even remember if I use all of them anymore!). What it does is find the ending point of each string in each line, determines from that how many fields the file must have based on most common field count (ie if 250 lines have 8 fields, 100 lines have 7 fields, and 30 lines have 9 fields, it assumes the 8 field count is correct). Then from that it determines which setup of 8 fields is right based on the most common string ending points, and then parses the file based on that.

I suppose it is only good for certain file types, I should probably also check to see when it changes from character to digit and vice versa, as that would probably be a end-of-field too.

I had just spent a while on Friday working on this, more or less just curious to see how it worked on other peoples' fixed width files :) I dont think there is enough demand for something like this, especially when it only works some of the time.

I'm gonna mark this as solved, but if you (or anyone) can think of any improvements or test it on more files, I'd love to hear it!

Matt

Ken Puls
02-15-2006, 09:23 AM
... 21 variables seems a bit much (I dont even remember if I use all of them anymore!).

Matt, do you have MZ Tools installed? If you do, go to the MZ toolbar (or menu item), choose Other Utilities|Review Source Code. It picks up all unused variables and procedures in the (actually all) projects. I run this on all my code before I release it, just to clean up. :thumb

mvidas
02-15-2006, 09:46 AM
I don't have it (after looking for it I found it at http://www.mztools.com/v3/download.htm - you should add this to the Cool Tools section of the portal), though I can't use it unfortunately due to non-admin rights on the computer and the non-VBA job I hold :( I'll install it at home though and check it out!

Ken Puls
02-15-2006, 09:58 AM
Hey Matt,

It's on the Resources page, but you're right. Should probably be on the Cool Tools page. :)

Sorry to hear that you can't install it at work. I hate trying to code without it now... the VBE just looks wrong without that toolbar! LOL!