Solved: Text file data import issue



jtrowbridge
12-13-2007, 12:22 PM
I'm having performance issues reading from a particularly large text file (14 MB). When the program gets to the Line Input #1 section of the code, it takes VBA about 45 seconds to access the text file and figure out which line it is currently reading.


The data in the text file is comma-delimited and structured like so:

String_A, An_Integer, A_Double
String_B, An_Integer, A_Double
String_C, An_Integer, A_Double
...

All I'm trying to read is the string before the first comma, so that's why the InStr function comes into play and why I'm using 'Line Input'.


Does anyone know of a more efficient way to read from a large set of data? The only kind of file reading I know how to use is the 'Open For Input' statement.


Here's the code:
---------------------------

Dim Data_Line As String
Dim File_Name As String
Dim Curve_Name() As String
Dim FirstCommaPosition As Integer
Dim n As Long

File_Name = "C:\test\CurveSummary.txt"
n = 0
ReDim Curve_Name(n)

Open File_Name For Input As #1

Do While Not EOF(1)

    Line Input #1, Data_Line    'Takes way too long to access here...

    FirstCommaPosition = InStr(Data_Line, Chr(44))
    Curve_Name(n) = Left(Data_Line, FirstCommaPosition - 1)

    n = n + 1

    ReDim Preserve Curve_Name(n)

Loop

Close #1

--------------------------

Thanks in advance!

:doh:

Dr.K
12-13-2007, 12:29 PM
Using ReDim Preserve on a large array is a CPU killer, especially since you ReDim for every single element. That alone could be your problem.

You might be better off parsing the text into a worksheet first, so you can count the rows and use only one ReDim.

I always use the FSO text stream object, so I can't help you with the Line Input part.
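If it helps, the rough pattern I use looks something like this -- just a sketch, not tested against your file; it reuses the path from your code and assumes Windows-style (vbCrLf) line endings. ReadAll the whole file in one shot, Split it into lines, size the array once, and fill it in a single pass with no ReDim Preserve inside the loop:

-----------------------------

Dim fso As Object, ts As Object
Dim AllLines() As String
Dim Curve_Name() As String
Dim i As Long, p As Long

Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\test\CurveSummary.txt", 1)   '1 = ForReading

'Read the whole file at once and break it into lines (assumes vbCrLf line endings)
AllLines = Split(ts.ReadAll, vbCrLf)
ts.Close

'Size the output array once, up front
ReDim Curve_Name(UBound(AllLines))

For i = 0 To UBound(AllLines)
    p = InStr(AllLines(i), ",")
    If p > 0 Then Curve_Name(i) = Left$(AllLines(i), p - 1)   'skip blank/trailing lines
Next i

-----------------------------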

jtrowbridge
12-13-2007, 01:31 PM
I can't import the text file into Excel because it's over 100k lines long. After about three hours of research I figured out how to do it. I never knew about the TextStream method until I really had to dive into things. I can tell this is obviously way more efficient than the original method I was using.

Here's the code:

-----------------------------

Dim File_Name As String
Dim Curve_Name() As String
Dim FirstCommaPosition As Integer
Dim fso As Object, FileObj As Object
Dim ReadLineTextFile As String
Dim n As Long

Const ForReading = 1

File_Name = "C:\test\CurveSummary.txt"
ReDim Curve_Name(0)

Set fso = CreateObject("Scripting.FileSystemObject")
Set FileObj = fso.OpenTextFile(File_Name, ForReading)

'Read the first line, then loop over the rest of the stream
ReadLineTextFile = FileObj.ReadLine
FirstCommaPosition = InStr(ReadLineTextFile, Chr(44))
Curve_Name(0) = Left(ReadLineTextFile, FirstCommaPosition - 1)

n = 1
ReDim Preserve Curve_Name(n)

While Not FileObj.AtEndOfStream

    ReadLineTextFile = FileObj.ReadLine

    FirstCommaPosition = InStr(ReadLineTextFile, Chr(44))
    Curve_Name(n) = Left(ReadLineTextFile, FirstCommaPosition - 1)

    n = n + 1
    ReDim Preserve Curve_Name(n)

Wend
FileObj.Close

-------------------------

:rofl:

Dr.K
12-13-2007, 01:40 PM
Ah, nicely done. I love the FSO and TextStream objects; they make a lot more sense to me than the built-in VBA text methods. AND FAST, TOO!
Here is the tutorial I used to learn the FSO:

http://www.techbookreport.com/tutorials/fso1.html



I can't import the text file into Excel because it's over 100k lines long.

Lies, LIES I TELL YOU! :devil2:

You can always pull in the data 60,000 lines at a time and use multiple worksheets. Bill Jelen (Mr. Excel) has a good routine for doing this. Or, if your array is 1D, just do 60,000 lines at a time and put them into multiple columns on the same sheet. Then again, it all depends on exactly WHAT you are going to do with your data.
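The multi-sheet version is basically this -- a rough sketch only, with a made-up "Import1", "Import2"... naming scheme, dumping each raw line into column A and rolling over to a new sheet every 60,000 rows:

-----------------------------

Const CHUNK As Long = 60000

Dim fso As Object, ts As Object
Dim ws As Worksheet
Dim r As Long, sheetNum As Long

Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\test\CurveSummary.txt", 1)   '1 = ForReading

sheetNum = 1
Set ws = Worksheets.Add
ws.Name = "Import" & sheetNum

Do While Not ts.AtEndOfStream
    If r = CHUNK Then                        'roll over to a new sheet every 60,000 rows
        sheetNum = sheetNum + 1
        Set ws = Worksheets.Add
        ws.Name = "Import" & sheetNum
        r = 0
    End If
    r = r + 1
    ws.Cells(r, 1).Value = ts.ReadLine
Loop

ts.Close

-----------------------------

From there, Text to Columns (or a pass of Split) will break out the commas for you.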

As for redimming every time, if you know roughly how big your array will be, just dim it once at 25% oversize, and keep count of the elements you add. Then just redim it once at the end, or never if the blank elements at the end don't bug you.
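In code that's roughly this -- again just a sketch, and the 125,000 starting size is a guess you'd tune to your own file:

-----------------------------

Dim Curve_Name() As String
Dim Data_Line As String
Dim p As Long, n As Long

ReDim Curve_Name(125000)    'oversized once; roughly 25% above the expected line count

Open "C:\test\CurveSummary.txt" For Input As #1
Do While Not EOF(1)
    Line Input #1, Data_Line
    p = InStr(Data_Line, ",")
    If p > 0 Then
        Curve_Name(n) = Left$(Data_Line, p - 1)
        n = n + 1
    End If
Loop
Close #1

'Trim the unused elements once at the end (or skip this if they don't bother you)
If n > 0 Then ReDim Preserve Curve_Name(n - 1)

-----------------------------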



Edit:
"Wend"? Damn, you must be old school...