PDA

View Full Version : Urgent :How to FAST write large recordset in to a text file?



agarwaldvk
04-16-2012, 03:22 AM
Hi Everybody


I have a large record set (about 3.5 - 4 million records with 4 fields in each record). I want to write this to a text file. The resulting text file is approximately just under 200 mB file.

I tried to write record by record by string concatenating the fields and then using the Print # statement to write to a text file. Whilst it doesn't take awfully long time, it does take a little while to do so. I was wondering if there is a faster way to do the same thing.

The other option that I am looking at (it works but haven't time tested for the same data set yet) is to use the GetString() function with the appropriate arguments for comma delimited columns and vbCrLf delimited rows and then store the whole record set in to a single string and then use the Print# statement to write this string to a text file. This works but I have to time test it - I am looking at doing that tomorrow at work.

I am wondering if "TextStream.Write (string)" would be faster where 'string' would be the result from the GetString() function. Or is this much of a muchness in that I wouldn't be gaining a lot ether way?
Any suggestions on this would be highly valued.

Thanks in advance.



Best regards



Deepak

Kenneth Hobs
04-16-2012, 06:10 AM
ADO would probably be faster. See: http://www.erlandsendata.no/english/index.php?t=envbadac

snb
04-16-2012, 06:34 AM
sub snb
' reference to Microsoft ActiveX DataObject 2.0 Library

With New Recordset
.Open "SELECT * FROM `Sheet1$`;", "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" & ThisWorkbook.FullName & "';Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
CreateObject("scripting.filesystemobject").CreateTextFile("G:\OF\snb.txt").Write .GetString(, , "|", vbCrlf)
End With
end sub

agarwaldvk
04-16-2012, 02:02 PM
Hi Kenneth
Sorry but the link that you suggested only works with data in an Excel spreadsheet. My data that I want to write to a text file is in a recordset and not in an Excel worksheet - it anyway won't fit in a worksheet, it contains around 4 million records


Hi Snb
That was exactly what I have tried and it works. I was wondering if 'Textstream.Write String' would be any faster.


I am going to time test the 'TextStream.Write String' now and see if its any faster.

The other option is to get is to load the recordset in an array using GetRows() function but I don't know of any function that will dump the contents of the entire array in to a text file without having to go field by field and record by record.


Any further suggestions?



Deepak

Kenneth Hobs
04-16-2012, 02:16 PM
SNB is very good at coming up with short code examples.

Have you tried just saving it as a CSV file? You can copy the sheet to one workbook and save that one.

ADO is far more advanced that just reading the data from Excel. It can read and write data from various ADO sources. See ConnectionStrings.com. You just have to figure out the details. I probably won't have time to dig much into it until this weekend.

I would imagine that you would want to read the Excel data and then update a CSV file. To do this, you need one step to read the data and another to add it to another data set which in this case is the CSV file. It is not that much different than one MDB table to another table. The concepts are similar.


Here is an example going from a CSV to Excel. Of course reading the Excel data can be an easy step.

Option Explicit

Sub pull_data()
Dim s1 As Worksheet, ML_Dir As String
ML_Dir = ThisWorkbook.Path
'Open_Sort_CSV ML_Dir, "ImportCSV.csv", ActiveSheet.Name, True, "Data Used"
Open_Sort_CSV ML_Dir, "ImportCSV.csv", ActiveSheet.Name, False, "F1"
End Sub

'Add Tools > References... > Microsoft ActiveX Data Objects 2.8 Library
Sub Open_Sort_CSV(CSV_Dir, CSV_name, Data_Sheet, Optional Header As Boolean = True, _
Optional SortField As String = "", Optional SortASC As Boolean = True)

Dim connectionString As String
'Late binding:
'Dim objConnection As Object, objRecordset As Object
' Early Binding:
Dim objConnection As Connection, objrecordset As Recordset
Dim A As Integer
Dim Location As Range, Rw As Long, col As Integer, c As Integer, MyField As Variant

'set record set variables
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = 1 '&H1

'set connection and recordset
Set objConnection = CreateObject("ADODB.Connection")
Set objrecordset = CreateObject("ADODB.Recordset")

'oopen connection (headers,Delimited style,mixed data taken as text(not sure imex works))
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & CSV_Dir & ";" & _
"Extended Properties=""text;HDR=" & Header & ";FMT=Delimited;IMEX=3"""
objConnection.Open connectionString

'get data from csv
Select Case True
Case SortField = ""
objrecordset.Open "SELECT * FROM " & CSV_name, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Case SortField <> "" And SortASC
objrecordset.Open "SELECT * FROM " & CSV_name & " Order By `" & SortField & "` ASC", _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Case SortField <> "" And SortASC = False
objrecordset.Open "SELECT * FROM " & CSV_name & " Order By `" & SortField & "` DESC", _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
End Select

'Loop across the fields
If Header Then
With objrecordset
For A = 0 To .Fields.Count - 1
' Add field names to data sheet
If Right(.Fields(A).Name, 7) <> ".NoName" Then _
ThisWorkbook.Worksheets(Data_Sheet).Cells(1, 1).Offset(0, A).Value = .Fields(A).Name
Next A
End With
'copy data into worksheet under headers
ThisWorkbook.Worksheets(Data_Sheet).Cells(2, 1).CopyFromRecordset objrecordset
'Write RecordSet to results area
Set Location = ThisWorkbook.Worksheets(Data_Sheet).Range("A2")
Rw = Location.Row
col = Location.Column
c = col
With objrecordset
Do Until .EOF
For Each MyField In .Fields
Cells(Rw, c) = MyField
c = c + 1
Next MyField
.MoveNext
Rw = Rw + 1
c = col
Loop
End With


Else
'copy data into worksheet
ThisWorkbook.Worksheets(Data_Sheet).Cells(1, 1).CopyFromRecordset objrecordset
End If

'end connection and recordset
Set objConnection = Nothing
Set objrecordset = Nothing

End Sub


'Pulls Data from CSV to Data sheet
Sub Open_Sort_CSV_o(CSV_Dir, CSV_name, Data_Sheet, Optional Header As String = "No")

Dim connectionString As String, objConnection As Object, objrecordset As Object
Dim A As Integer
Dim Location As Range, Rw As Long, col As Integer, c As Integer, MyField As Variant

'set record set variables
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1

'ser connection and recordset
Set objConnection = CreateObject("ADODB.Connection")
Set objrecordset = CreateObject("ADODB.Recordset")

'oopen connection (headers,Delimited style,mixed data taken as text(not sure imex works))
'connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & CSV_Dir & ";" & _
"Extended Properties=""text;HDR=" & Header & ";FMT=Delimited;IMEX=1"""
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & CSV_Dir & ";" & _
"Extended Properties=""text;HDR=" & Header & ";FMT=Delimited(,);"""
objConnection.Open connectionString

'get data from csv
objrecordset.Open "SELECT * FROM " & CSV_name, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText

'Loop across the fields
If Header = "Yes" Then
With objrecordset
For A = 0 To .Fields.Count - 1

' Add field names to data sheet
ThisWorkbook.Worksheets(Data_Sheet).Cells(1, 1).Offset(0, A).Value = .Fields(A).Name
Next A
End With

'this errors for no good reason so stop any errors
'On Error Resume Next

'copy data into worksheet under headers
ThisWorkbook.Worksheets(Data_Sheet).Cells(2, 1).CopyFromRecordset objrecordset
'Write RecordSet to results area


Else
'copy data into worksheet
ThisWorkbook.Worksheets(Data_Sheet).Cells(1, 1).CopyFromRecordset objrecordset
End If

'end connection and recordset
Set objConnection = Nothing
Set objrecordset = Nothing

End Sub




Here is an example for getrows.

'romperstomper, http://www.excelforum.com/excel-programming/665066-send-results-of-sql-query-to-an-array.html
Sub GetData()
' Sample demonstrating how to return a recordset from a workbook
Dim cn As ADODB.Connection, strQuery As String, rst As ADODB.RecordSet, strConn As String
Dim varData As Variant
Set cn = New ADODB.Connection
' strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & ActiveWorkbook.FullName & ";" & _
' "Extended Properties=""Excel 8.0;HDR=Yes;"""
With cn
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source=" & ActiveWorkbook.FullName & ";" & _
"Extended Properties=""Excel 8.0;HDR=Yes;"""
.Open
End With
strQuery = "SELECT * FROM [Sheet1$];"
Set rst = New ADODB.RecordSet
rst.Open strQuery, strConn, adOpenStatic, adLockReadOnly, adCmdText
' dump array of data into variable
varData = rst.GetRows
rst.Close
Set rst = Nothing
' cn.Close
Set cn = Nothing
End Sub

agarwaldvk
04-16-2012, 02:32 PM
Hi Kenneth


Have we got our question mixed up? My understanding is that .copyfromrecordset copies recordset content to an Excel worksheet. This is not something that I want. I want the contents of the recordset to be dumped to a text file (.txt file).

Also, I had a look at the ConnectionStrings.com. I take it that I may need to use the Standard Microsoft Text ODBC drive like so :-

Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=c:\txtFilesFolder\;Extensions=asc,csv,tab,txt;

Here, is the Dbq the path to the output .txt file? Where do I specify the field/row delimiters?




Deepak

Kenneth Hobs
04-17-2012, 06:24 AM
The field row delimiters are by your options. I would have to look into how those can be changed via VBA.

The CopyFromRecordSet is for a Range object. I showed how to use the GetRows.

Here is the saveas method.


Sub SaveAsCSV()
Dim csv As String, wb1Name As String, wb1 As Workbook, wba As Workbook

Set wba = ThisWorkbook
csv = wba.Path & "\csv.csv"
wb1Name = wba.Path & "\" & Environ("username") & ".xlsm"

If Dir(csv) <> "" Then Kill csv
If Dir(wb1Name) <> "" Then Kill wb1Name

wba.Save
wba.SaveCopyAs wb1Name
Set wb1 = Workbooks.Open(wb1Name)
wb1.Worksheets("Sheet1").SaveAs csv, xlCSVWindows
ActiveWorkbook.Close True
Kill wb1Name
End Sub

snb
04-17-2012, 09:09 AM
@Kenneth

alternative ?

Sub SaveAsCSV()

ThisWorkbook.sheets("sheet1").copy
With Activeworkbook
.SaveAs thisworkbook.path & "\" & Environ("username") & ".csv", xlCSVWindows
.close false
end with
End Sub



This forum is messing up the formatting of the code !!