PDA

View Full Version : How to quickly count number of lines in a CSV or Text File?



zaphod2003
02-08-2008, 05:51 AM
:help

I have a macro which compiles CSV data received from an AS400.

Because the number of records can be huge (400,000 + lines). I used a progress bar to display how long the process is likely to take.

To find out the number of records I use this routine:


Open sFileName For Input As #1
Do While Not EOF(1)
Line Input #1, sLineOfText
recordCounter = recordCounter + 1
Loop
Close


Which is fine for small files, but causes a considerable delay in processing when the number of records reach huge amounts.

Is there a way I could find out the approximate or exact number of records inside the CSV without having to literally count the number of lines first?

xld
02-08-2008, 06:07 AM
Dim wb As Workbook
Dim RowCount As Long

Set wb = Workbooks.Open(sFilename)
With wb.Worksheets(1)

RowCount = .Cells(.Rows.Count, "A").End(xlUp).Row
End With

T-J
02-08-2008, 04:05 PM
Set wb = Workbooks.Open(sFilename) Doesn't this just fill a sheet up to the maximum rows (65536 for Excel 2003), and ignore anything over that?

I would look at using ADO and treating the csv file as a database.

The code below reports the number of records. Remember to set a reference to Microsoft ActiveX Data Objects Library (Tools>References...)

Sub ADOReadCSVFile()
Const CSV_FILE As String = "TESTFILE.txt"
Dim strPathtoTextFile As String
'set reference to Microsoft ActiveX Data Objects Library (Tools>References...)
Dim objConnection As ADODB.Connection
Dim objRecordset As ADODB.Recordset

Set objConnection = New ADODB.Connection
Set objRecordset = New ADODB.Recordset
strPathtoTextFile = ThisWorkbook.Path & "\"

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & strPathtoTextFile & ";" & _
"Extended Properties=""text;HDR=No;FMT=CSVDelimited"""

objRecordset.Open "SELECT * FROM " & CSV_FILE, objConnection, adOpenStatic, adLockOptimistic, adCmdText

If Not objRecordset.EOF Then MsgBox objRecordset.RecordCount

objRecordset.Close 'Close ADO objects
objConnection.Close

Set objRecordset = Nothing
Set objConnection = Nothing
End Sub Notice how the path to the csv file is specified in the connection string and the name of the csv file itself is part of the SQL statement.
You can do a lot more than this with ADO, e.g filtering records "SELECT * FROM CSV_FILE WHERE Length = '0'" and other queries on the data.

CopyFromRecordset can handle copying recordsets on to multiple sheets.

xld
02-08-2008, 05:09 PM
No, it just opens the file then quickly counts the number of lines.

You should close it immediately afterwards.

T-J
02-09-2008, 02:42 AM
I tried your code on a text file with 220,000 lines and it returns a RowCount of 1.

My code returns the correct number.

xld
02-09-2008, 03:34 AM
Well I tried it with a file of 396,000 rows and it returned 396,000. Note that it looks for data in column A, if it is some other then adjust the code.