PDA

View Full Version : [SOLVED] handling .txt files with 4mio lines (610 mb)!



dormanino
05-26-2015, 10:17 AM
Hello dear reader, good day

I have a .txt file with 4mio lines and growing.

From this file, I extracted strings that I consider registers or key data and need to re-cicle them through the same file to gather another info...(screenshots).

13518

I bump with this code´s and adapted it to my needs...it´s absolutely fail proof...but slow as hell since excel is non-relational data handling...and also string handling is an interesting limiting issue.

and that is the result so far afetr 1 hour...
13519

Can anyone send me to a special/new/better direction...ADODB, array´s, kill myself...??


Sub b3902v1()
Dim strLine As String
Dim rcount As Long
Dim ccount As Long
Dim last_row As Long
Dim i As Long
Dim rng As range
Dim rng2 As range
rcount = 2
ccount = 2
i = 1
Dim b3902v As Worksheet
Set b3902v = Sheets("b3902v")
With b3902v
If WorksheetFunction.CountA(cells) > 0 Then
last_row = .cells.Find(What:="*", After:=[A1], SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
End If
Set rng2 = .range(cells(2, 1), cells(last_row, 1))
For Each rng In rng2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Users\vravagn\Desktop\macros\B3902V.TXT")

DoEvents
Do Until objFile.AtEndOfStream
DoEvents
Debug.Print objFile.line
strLine = objFile.ReadLine

Debug.Print Trim(Mid(strLine, 4, 21))
Debug.Print rng
If rng = Trim(Mid(strLine, 4, 21)) Then
.cells(rcount, ccount) = Trim(Mid(strLine, 63, 21))
ccount = ccount + 1
End If
Loop
rcount = rcount + 1
ccount = 2
objFile.Close
Next rng
End With
End Sub

Kenneth Hobs
05-26-2015, 01:14 PM
Try zipping and attaching short example text and Excel files for us to best help you. Be sure to obfuscate any sensitive data.

Filter could be used for arrays once it creates a unique array. It looks like your data is already sorted.

ADO could be an option too.

A conceptional change is needed to speed it up but some of my speed tips might help. http://vbaexpress.com/kb/getarticle.php?kb_id=1035

mancubus
05-26-2015, 01:30 PM
why are you using so many "Debug.Print"s?

if you are familiar with Access (2GB dbase limit) or SQL Server (10GB limit with express, free, edition) consider moving to these programs.

dormanino
05-27-2015, 04:09 AM
why are you using so many "Debug.Print"s?

Hi mancubus...That is actually my debug code...haven´t time to perfect it.

dormanino
05-27-2015, 04:12 AM
Try zipping and attaching short example text and Excel files for us to best help you. Be sure to obfuscate any sensitive data.

Filter could be used for arrays once it creates a unique array. It looks like your data is already sorted.

ADO could be an option too.

A conceptional change is needed to speed it up but some of my speed tips might help.

Thank you Mr. Hobs, the article will help. As soon I come to a adequate solution i´ll repost it...

dormanino
05-27-2015, 04:26 PM
just to post the solution...

First, feeded the txt file into Access with import wizard. It generates a routine that can be used in a vba routine...or use something similar to this


Dim F As Long, sLine As String, A(0 To 4) As String
Dim db As Database, rs As Recordset
F = FreeFile
Open "c:\test.txt" For Input As F
' uncomment following line if you want to skip field headings
' Line Input #F, sLine
Set db = CurrentDb ' Access only
Set db = DBEngine(0).OpenDatabase("biblio.mdb") ' Visual Basic
On Error Resume Next
db.Execute "DROP TABLE TestImport"
On Error Goto 0
db.Execute "CREATE TABLE TestImport (ID LONG, [Desc] TEXT (50), " _
& "Qty LONG, Cost CURRENCY, OrdDate DATETIME)"
Set rs = db.OpenRecordset("TestImport", dbOpenTable)
Do While Not EOF(F)
Line Input #F, sLine
ParseToArray sLine, A()
rs.AddNew
rs(0) = Val(A(0))
rs(1) = A(1)
rs(2) = Val(A(2))
rs(3) = Val(A(3))
rs(4) = CDate(A(4))
rs.Update
Loop
rs.Close
db.Close
Close #F
End Sub

Sub ParseToArray(sLine As String, A() As String)
Dim P As Long, LastPos As Long, I As Long
P = InStr(sLine, ",")
Do While P
A(I) = Mid$(sLine, LastPos + 1, P - LastPos - 1)
LastPos = P
I = I + 1
P = InStr(LastPos + 1, sLine, ",", vbBinaryCompare)
Loop
A(I) = Mid$(sLine, LastPos + 1)
End Sub

After that, just coded the following...and voila...not the best resource...not very inteligent...but solved the issue for now...obviously I'll continue the search to a better solution.


Sub AccessDAO_FindMethod()

'To use DAO in your Excel VBA project, you must add a reference to the "Microsoft Office 14.0 Access Database engine Object Library" n Excel (your host application) by clicking Tools-References in VBE.

Dim strMyPath As String, strDBName As String, strDB As String
Dim daoDB As DAO.Database
Dim recSet As DAO.Recordset
Dim rcount As Long
Dim ccount As Integer
Dim b3902v As Worksheet
Dim rng As range
Dim rng2 As range
rcount = 2
ccount = 2
Set b3902v = Sheets("b3902v")
With b3902v

If WorksheetFunction.CountA(cells) > 0 Then last_row = .cells.Find(What:="*", After:=[A1], SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
Set rng2 = .range(cells(2, 1), cells(last_row, 1))
'---------------
'your data source with which to establish connection - ENTER the MS Access Database Name:
strDBName = "B3902V.accdb"
'get path / location of the database, presumed to be in the same location as the host workbook:
strMyPath = ThisWorkbook.Path
'set the string variable to the Database:
strDB = strMyPath & "\" & strDBName
'assign the database reference to an object variable:
Set daoDB = DBEngine.Workspaces(0).OpenDatabase(strDB)
'Open a dynaset-type recordset based on a MS Access Table named "B3902V":
Set recSet = daoDB.OpenRecordset("B3902V", dbOpenDynaset)
'----------------
'locate first record with matching criteria:

For Each rng In rng2
DoEvents
Do While rng = recSet.Fields("Variante")
.cells(rcount, ccount).Value = recSet.Fields("Code")
ccount = ccount + 1
recSet.Move 1 'recSet.Move 1 move para próxima linha na coluna do BD
Loop
rcount = rcount + 1
ccount = 2
Next rng
End With
recSet.Close
daoDB.Close
'destroy the variables:
Set daoDB = Nothing
Set recSet = Nothing
End Sub

SamT
05-27-2015, 05:26 PM
Can anyone send me to a special/new/better direction...ADODB, array´s, kill myself...??
please, not the third option. :D

This code only loops thru the file and the range once, both at the same time. I won't guarantee there are no errors in it but... try it and see.


Sub b3902v1()
Dim strLine As String
Dim rcount As Long
Dim ccount As Long
Dim last_row As Long
Dim i As Long
Dim rng As Range
Dim rng2 As Range
rcount = 2
ccount = 2
i = 1
Dim b3902v As Worksheet
Set b3902v = Sheets("b3902v")
With b3902v
last_row = .Cells(Rows.Count, 1).End(xlUp).Row
If last_row < 2 Then
MsgBox "Error"
Exit Sub
End If
Application.ScreenUpdating = False


Set objFile = CreateObject("Scripting.FileSystemObject"). _
OpenTextFile("C:\Users\vravagn\Desktop\macros\B3902V.TXT")

DoEvents
strLine = Trim(Mid(objFile.ReadLine, 4, 21))
Do Until objFile.AtEndOfStream
DoEvents 'I'm not sure what to do with this. Maybe after the next strLine assignment
Do While strLine = rng
.Cells(rcount, ccount) = Trim(Mid(strLine, 63, 21))
ccount = ccount + 1
strLine = Trim(Mid(objFile.ReadLine, 4, 21))
Loop
Set rng = rng.Offset(1, 0)
rcount = rcount + 1 'Maybe DoEvents before this line?
ccount = 2
Loop
objFile.Close

Application.ScreenUpdating = True

End With
End Sub

dormanino
05-30-2015, 07:51 AM
please, not the third option. :D

This code only loops thru the file and the range once, both at the same time. I won't guarantee there are no errors in it but... try it and see.

Hey SamT, thank you for the code lines but since there are more than 4mio lines into the txt files to the ranges in the spreadsheet to check (for validity of the strings so to speak), it´s taking too long for the data analisys.