PDA

View Full Version : Parsing CSV files where data is split over lines



colinb
11-06-2007, 01:06 PM
Hi

Am having trouble using VBA to load data into Excel from a CSV file, where the data in the file is mal-formed, thus:

A,B,C,D,E,F 'line 1
A,B 'start of line 2
C '....
D,E,F 'end of line 2
A,B,C,D,E,F 'line 3
So far, I have written a loop for 6 items on a line that reads A..F okay (i.e. line1, line 3) but the loop does not detect when the data is split over several lines.

The code below:

Dim readStr As String

For item = 1 to 6 step 1
Input #1, readStr '1 is open file
Next item
gets to the end of the line (i.e. A,B) and then gives me 4 nulls or empty strings (i.e. A,B,"","","","").

Have tried to test

if readStr = "" or readStr = chr(13) or readStr = chr(10) then
end if
to detect an end-of-line but none of these tests succeed. It's frustrating because I can make an equivalent test succeed using C and Ada but not, as yet, in VBA.

Any help would be gratefully received

Best wishes

Colin the Browncoat!

unmarkedhelicopter
11-06-2007, 02:23 PM
Seems you have a firefly in the ointment.
What you can do is read the text file in line by line.
Do you always have 6 fields per line (if the file were not mal-formed) ?
If so, then when you read in a line, test it and it has less then add the next line, until you have six, this 'may' lead to problems though if the 6th field is supposed to be blank.

colinb
11-06-2007, 03:59 PM
Hi, thanks for the speedy reply.

As you guessed, the problem is actually a bit more complex than the example might suggest. I have 144 comma-separated fields of various types (string, integer, boolean etc) per line, the last field of which may be a blank string. The data is mal-formed because the files are exported from a GUI-based tool which allows people to put carriage-returns into text fields to 'prettify' their output. The fei-oo CSV export function in the gorram tool puts a new line into the export file whenever it encounters a carriage-return and just carries on thereafter. Consequently, I can have item C spread over 4 or five lines (if you remember the original post) and D and E as well....

Up to now I have been editing the files manually in WordPad (or TextEdit depending on the platform) but I have just been given some monster files of several thousand data lines (tens of thousands of lines of export) and this is just not a reasonable proposition to be doing too often.

Seems to me that your suggestion of re-constructing the data using Line Input #, InStr() and Mid() is the way to go. It's what I'd with strtok and fgets in C, for example. I'll have a go and let you know how it goes.

But I'm surprised that I can't detect an end-of-line character from checking against what Input # produces though. Is there a code snippet that you're aware of somewhere (anywhere) that does this?

Best wishes

Colin the Browncoat!

----- "My other vehicle is a mid-bulk transport: 03-K64, class: Firefly"

unmarkedhelicopter
11-07-2007, 02:23 AM
But I'm surprised that I can't detect an end-of-line character from checking against what Input # produces though. Is there a code snippet that you're aware of somewhere (anywhere) that does this ?Er ! I'm not sure what you mean (I'm having a bad week understanding people) from what input # ? :whistle:

As an alternative (depending on structure) you could read the whole file into a string, remove ALL the CR's and LF's then start at the begining of the file putting them in fields, when you get to 142 (or so) start looking at the datatypes, hopefully the ones at the end are distinct from those at the begining and thus you should be able to spot the end of line.
Reset the counter and start on the next line etc. etc. :think:

Paul_Hossler
11-07-2007, 07:15 AM
Use


Line Input #1, ReadStr


to bring the entire line in up to the CR, and Split(ReadStr, ",") to pull each line apart

Example:


Option Explicit
Sub Demo()
Dim MyString As String
Dim vPieces As Variant
Dim i As Long
Open "C:\TEST.TXT" For Input As #1

Do While Not EOF(1)
Line Input #1, MyString

vPieces = Split(MyString, ",")

For i = LBound(vPieces) To UBound(vPieces)
Debug.Print i, vPieces(i)
Next i

Loop
Close #1
End Sub



You have to test for trailing commas (makes an empty vPieces), and join the pieces to make a single string if necessary, or if your data has the same number of fields in each logical record, you can just plug in N vPieces, go to the next record, do N more, etc.


Paul

figment
11-07-2007, 07:19 AM
But I'm surprised that I can't detect an end-of-line character from checking against what Input # produces though. Is there a code snippet that you're aware of somewhere (anywhere) that does this?

i am not sure if you can check strait from the input comand but VBA uses the constant vbCrLf to represent its end of line charactor. its also represented by a combination of two character desigantions, which i can't recall right now. But if you know how to write the code you want in C why not use that to fix the CSV file befor trying to bring it into excel.

colinb
11-07-2007, 11:36 AM
Use


Line Input #1, ReadStr


to bring the entire line in up to the CR, and Split(ReadStr, ",") to pull each line apart

Example:


Option Explicit
Sub Demo()
Dim MyString As String
Dim vPieces As Variant
Dim i As Long
Open "C:\TEST.TXT" For Input As #1

Do While Not EOF(1)
Line Input #1, MyString

vPieces = Split(MyString, ",")

For i = LBound(vPieces) To UBound(vPieces)
Debug.Print i, vPieces(i)
Next i

Loop
Close #1
End Sub



You have to test for trailing commas (makes an empty vPieces), and join the pieces to make a single string if necessary, or if your data has the same number of fields in each logical record, you can just plug in N vPieces, go to the next record, do N more, etc.


Paul
Hi, thanks for the reply. I had a go with something like this code snippet (from Ron's Excel tips?) and realised that the text strings in some of my data fields have commas embedded in them. You can guess what happened: the code divided up strings when it shouldn't have. But at least it ignored the end-of-line characters and got me closer to where I want to go.

Best wishes,

Colin the Browncoat!

colinb
11-07-2007, 12:14 PM
i am not sure if you can check strait from the input comand but VBA uses the constant vbCrLf to represent its end of line charactor. its also represented by a combination of two character desigantions, which i can't recall right now. But if you know how to write the code you want in C why not use that to fix the CSV file befor trying to bring it into excel.
Hi, thanks for the reply. I tried your suggestion of testing against
vbCrLf as well as Chr(10) and Chr(13) either together or separately. But to no avail.

You're right: I could write a tool in C, but I'd rather not. Reason I want to accomplish this in VBA-Excel is because I don't want my users to have to export the data out of one tool, use another tool to pre-process the data, before they can start looking at the data in Excel with my VBA routines. I would much rather do the integrity checking as part of the import routines and make life easier for my users.

Actually, I would rather get the tool vendor to fix their gorram broken CSV export routine, but that's never going to happen....

Best wishes

Colin the Browncoat!

unmarkedhelicopter
11-07-2007, 03:23 PM
... realised that the text strings in some of my data fields have commas embedded in them ... Well I guess you are well and truely b%&**?d then.

As a matter of interest what did you do to solve it in C, maybe we can work something back from that.

figment
11-07-2007, 03:29 PM
is it posible to call a C function/program from VBA?

unmarkedhelicopter
11-08-2007, 02:02 AM
is it posible to call a C function/program from VBA?Yes, just use a shell command.

colinb
11-15-2007, 01:21 PM
Hi Everyone

:hi:

Thanks to everyone who contributed to this thread. I 'solved' the problem by looking at different export options from the offending tool and found one that was amenable to being read by VBA's line input # mechanism. So I just abandoned the CSV route and have successfully implemented the desired import routine in VBA.

Hurrah!

For info, my 'C' programme scanned for CR-LF at the end of lines of data and checked for 'open' sets of inverted commas (delimiting a string). It ignored any commas in the string until it found a 'close' set of inverted commas on the same or any subsequent line of data. Worked a treat but required an external command which wasn't very elegant (to my way of thinking).

Anyway, thanks once again to everyone. You're all stars!

Best wishes

Colin the Browncoat

--- "Can I make a suggestion that doesn't involve violence, or is this the wrong crowd for that? "