PDA

View Full Version : Speeding up search though a csv file



grazzt
08-16-2006, 10:26 AM
I am new here and look to get some help to speed up a search through a .csv data file. My code is below. Basically I have an excel file that I search through cells 1 through n for a name in no particular order. I then match it to corresponding row in the .csv file and pull part of the discription out of that line. But I want to speed it up some. I have 30,000+ lines of data in the spreadsheet that I have to go through.
Some questions:
I shouldnt have to go to then of file everytime, how would I end it faster
I should be able to jump through the file, but is set up as sequential access, not random, how can I do this?

Sub subGet_Table_Types()
Dim strTable As String
Dim sngBig As Variant
Dim strGame As String
Dim intSeat As Integer
Dim strType As String
Dim strV1 As String
Dim strV2 As String
Dim strTable2 As String
Dim strNotes As String
Dim i As Integer
i = 0
Do
i = i + 1
strTable2 = Cells(i, 8)
strNotes = Cells(i, 1)
Open "E:\my_files\transactions\tables.csv" For Input As #1 ' Open file for input.
Do While Not EOF(1) ' Loop until end of file.
Input #1, strTable, sngBig, strGame, intSeat, strType, strV1, strV2 ' Read data into two variables.
If strTable = strTable2 Then
Cells(i, 9) = strGame
End If
Loop
Close #1 ' Close file.
Loop Until strNotes = ""
End Sub

mdmackillop
08-16-2006, 10:59 AM
Hi Grazzt
Welcome to VBAX
I don't see in your code where you're searching Excel and the result.
Can you post some sample data and your whole code? Use Manage Attachments in the Go Advanced section
Regards
MD

mdmackillop
08-16-2006, 11:59 AM
I tried this searching 30K lines of csv to return the line containing the search value. About 2.5 seconds

Option Explicit
Sub GetData()
Const ForReading = 1
Const TristateUseDefault = -2
Dim MyString As String
Dim fs, f, ts
Dim s As String
Dim i As Long, k As Long, x As Long
MyString = "Data299915"
Set fs = CreateObject("Scripting.FileSystemObject")
Set f = fs.GetFile("C:\AAA\MyData.csv")
Set ts = f.OpenAsTextStream(ForReading, TristateUseDefault)
s = ts.ReadAll
i = InStr(1, s, MyString)
For x = 1 To Len(s)
If Mid(s, x, 1) = vbCr Then
k = k + 1
If InStr(x, s, vbCr) > i Then
Exit For
End If
End If
Next x

MsgBox Split(s, vbCr)(k - 1)
ts.Close
End Sub

mdmackillop
08-16-2006, 12:18 PM
A bit quicker; about 0.5 sec

s = ts.ReadAll
i = InStr(1, s, MyString)
j = 0
For x = 1 To UBound(Split(s, vbCr))
j = InStr(j + 1, s, vbCr)
If j > i Then
Exit For
End If
k = k + 1
Next x
MsgBox Split(s, vbCr)(k)
ts.Close

grazzt
08-16-2006, 12:55 PM
Thanks for your help I will try and implement the code


As for the data it looks like this

Column A
Jon
Mary
Harry
.
.
.
line 35,000

I will search each cell in column A and then run the other loop through the .csv to find the name and then diplay a value next to the name in the spreadsheet to say something like column 2 in .csv. It takes a long time on my machine. The original setup was the data in .csv file was in a separate sheet in the excel file. By putting in the .csv, in decreased the time by ~10 fold. But a greedy to reduce the time further.


.csv setup
Jon, 35, Short, ...
Harry, 23, Med, ...
Marry, 89, Tall, ...
.
.
.
Line 1000

mdmackillop
08-16-2006, 01:10 PM
You can use the Split function with a comma separator on a returned line to give the Column item

grazzt
08-16-2006, 01:15 PM
How do you hand multiple columns of data in the .csv file in your example?

mdmackillop
08-16-2006, 02:10 PM
Here's an example

grazzt
08-16-2006, 04:37 PM
I ran the 2 codes.

Mine 25 sec
Yours 20 sec

PC: Celeron 2.8 GHz, 196 meg ram

I have attached the table and file

mdmackillop
08-17-2006, 12:08 AM
What does your code do, and how should I run it?

grazzt
08-17-2006, 05:30 AM
Run the get subGetData (yours) and subGet_Table_Types (mine) and place the Table.csv in an appropriate folder on your dirve and change the path in the code. Basicaly the code will run through Column H and match it to the same name in the .csv file and then place data in a Column I and J for each of the above SUBs.

mdmackillop
08-17-2006, 06:10 AM
Having seen your data, I would approach it differently.

Sub NewGetData()
Cells(2, 11) = Now
'Turn off screen refreshing
Application.ScreenUpdating = False
'Open the CSV file
Workbooks.Open Filename:="E:\my_files\transactions\tables.csv"
'Return to the original workbook
Windows("vba_project.xls").Activate
'Write a vlookup formula into all cells of the range described
Range("I3:I6000").FormulaR1C1 = "=VLOOKUP(RC[-1],tables.csv!C1:C3,3,FALSE)"
'Copy and PasteSpecial to remove formulae
With Columns("I:I")
.Copy
.PasteSpecial Paste:=xlPasteValues
'Remove N/A values from cells where there was no corresponging value found
.SpecialCells(xlCellTypeConstants, 16).ClearContents
End With
'Close the CSV file
Workbooks("Tables.csv").Close
'Turn on screen refreshing
Application.ScreenUpdating = True
Cells(2, 12) = Now
[I1].Select
End Sub

grazzt
08-17-2006, 09:45 AM
That is very impressive. On a different machine the times were 19 sec and 5 sec (for the new code). This might be asking a lot, but could you put some comments within the code for what the code is doing. I understand most but not all. I think I will use a similar approach to other items in the code as well. 5000 line is pretty easy go through but 40,000 now is a lot more difficult.

mdmackillop
08-17-2006, 11:11 AM
Comments added.
For 40k rows, just change the range. I can't thing of a quicker way to achieve this, (although otherrs may!)