Consulting

Results 1 to 16 of 16

Thread: There has to be a faster way.

  1. #1

    SOLVED There has to be a faster way.

    So I recently wrote my first VBA script which pulls in a list of text files, each containing a coma delimited file which contains and ID# and a corresponding Z value.

    ex.
    1,20
    2,19
    3,17
    4,20
    5,22
    ...

    The script pulls the files into excel, one by one, and then iterates through the coma delimited file to calculate the 97th and 3rd percentile. The script calculates percentile for the entire text file, then kicks the min and max value to a text tile. Here is the kicker... each individual text files has somewhere between 3000 and 50000 entries. The smaller ones dont take so long, but the 50000 entry text files can take upwards to 45 mins. I calculated the total time to run this on all my files, upwards of 4000 text files, and yeah... it'll be done next week. I know in python you can stream things to memory, but I have been having a heck of a time trying to figure out memory streaming in VBA.

    Sub run_Click()
    
    'Clear Contents
        Range("A2:B100000").Select
        Selection.ClearContents
    Dim i As Long, nd As Long, MaxBuilding As Integer, MinBuilding As Integer
    Dim Temp(100000) As Double, Dens(100000) As Double, File_name As String, inc As Long, j As Integer, volume As Long
    
    inc = 1
    For j = 0 To inc
            Do
                If Sheet1.Cells(inc, 17).Value = "" Then Exit Do
                File_name = Sheet1.Cells(inc, 17).Value
                'fetch data from the file
                Open File_name For Input As #1
                Do
                    If EOF(1) Then Exit Do
                    nd = nd + 1
                    Input #1, Temp(nd), Dens(nd)
                Loop
                Close #1
                'Display values on the worksheet
                Sheets("Export_Output1").Select
                
                Range("a2").Select
                For i = 0 To nd
                    ActiveCell.Value = Temp(i)
                    ActiveCell.Offset(0, 1).Select
                    ActiveCell.Value = Dens(i)
                    ActiveCell.Offset(1, -1).Select
                Next i
                nd = -1
                Erase Dens
                Erase Temp
                Range("a1").Select
                'create a file
                Open "c:\z_unit_write.txt" For Append As #2
                MaxBuilding = Range("h12")
                MinBuilding = Range("k12")
                volume = Range("m13")
                    Write #2, MaxBuilding, MinBuilding, volume, File_name
                Close #2
                'clear Contents
                    Range("A2:B100000").Select
                    Selection.ClearContents
                inc = inc + 1
            Loop
    Next j
    End Sub
    What is good about this script is that it iterates through a list of text files (in a range) until the range = "", something that took me forever to figure out how to do in a batch process. My question is, does anyone happen to know of a faster way to process all this data? Maybe instead of loading in each file and loading each coma delimited entry one at a time, you can set something in vba to load the entire list in at once?


    Thanks for any tips!

    aaron
    Last edited by aaronreyna; 09-11-2009 at 04:46 PM. Reason: SOLVED

  2. #2
    VBAX Tutor nst1107's Avatar
    Joined
    Nov 2008
    Location
    Monticello
    Posts
    245
    Location
    Not sure about streaming, either, but I can offer one tip for speeding up your procedure. Get rid of the Range.Select, Selection...., or ActiveCell.... manner of doing things. Using .Select is always the slow way. Here's a cleaner version of your code, minus the .Select and a few other things that create useless overhead. It's not much, but it should speed it up at least a little bit. Hope it helps.[VBA]Sub run_Click()
    'Clear Contents
    Range("A2:B100000").ClearContents
    Dim i As Long, nd As Long, MaxBuilding As Long, MinBuilding As Long
    Dim Temp(100000) As Double, Dens(100000) As Double, File_name As String, inc As Long, j As Long, volume As Long
    inc = 1
    For j = 0 To inc
    Do
    If Sheet1.Cells(inc, 17) = vbNullString Then Exit Do
    File_name = Sheet1.Cells(inc, 17)
    'fetch data from the file
    Open File_name For Input As #1
    Do
    If EOF(1) Then Exit Do
    nd = nd + 1
    Input #1, Temp(nd), Dens(nd)
    Loop
    Close #1
    'Display values on the worksheet
    With Sheets("Export_Output1").Cells(2, 1)
    For i = 0 To nd
    .Value = Temp(i)
    .Offset(0, 1).Value = Dens(i)
    Next i
    End With
    nd = -1
    Erase Dens
    Erase Temp
    'create a file
    Open "c:\z_unit_write.txt" For Append As #2
    MaxBuilding = Cells(12, 8)
    MinBuilding = Cells(12, 11)
    volume = Cells(13, 13)
    Write #2, MaxBuilding, MinBuilding, volume, File_name
    Close #2
    'clear Contents
    Range("A2:B100000").ClearContents
    inc = inc + 1
    Loop
    Next j
    End Sub
    [/VBA]
    Note the integers are all longs in this version, and most of your Range() are now Cells().

  3. #3
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    Why are you reading the CSV a record at a time? You could load open it in Excel, call WS functions on the data, save it in a 'Summary WS', and write that sheet at the end.

    All that looping takes time

    If you post a Zip with the WB, and 2-3 SMALL csv files, it'd be easier to offer suggestions

    Paul

  4. #4
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    What probably takes the longest time is writing to the sheet.
    Try this:[vba]Sub run_Click()
    'Clear Contents:
    Sheets("Export_Output1").Range("A2:B100000").ClearContents
    Dim i As Long, nd As Long, MaxBuilding As Long, MinBuilding As Long
    Dim File_name As String, inc As Long, j As Long, volume As Long
    Dim TempDens() As Double
    ReDim TempDens(1 To 2, 1 To 1) 'ReDim TempDens(1 To 2, 0 To 0)
    inc = 1
    'For j = 0 To inc
    Do
    If Sheet1.Cells(inc, 17).Value = "" Then Exit Do
    File_name = Sheet1.Cells(inc, 17).Value
    'fetch data from the file:
    Open File_name For Input As #1
    Do
    If EOF(1) Then Exit Do
    nd = nd + 1
    ReDim Preserve TempDens(1 To 2, 1 To nd) 'ReDim Preserve TempDens(1 To 2, 1 To nd)
    Input #1, TempDens(1, nd), TempDens(2, nd)
    Loop
    Close #1
    'Display values on the worksheet
    With Sheets("Export_Output1")
    .Range("A2").Resize(nd, 2) = Application.Transpose(TempDens)
    nd = -1
    Erase TempDens
    'create a file
    Open "c:\z_unit_write.txt" For Append As #2
    MaxBuilding = .Range("h12")
    MinBuilding = .Range("k12")
    volume = .Range("m13")
    Write #2, MaxBuilding, MinBuilding, volume, File_name
    Close #2
    'clear Contents:
    .Range("A2:B100000").ClearContents
    End With
    inc = inc + 1
    Loop
    'Next j
    End Sub
    [/vba]You had 0,0 as the first values (maybe you have Option Base 1?), and I'm not sure whether you want them. The code above puts the first row of values at row 2. There are commented out lines as suggested replacements for my code to reinstate these zero values.

    I took out the j loop (I don't think it's necessary but you may have it there for your own reasons).
    I think you'll be very pleased with the speed; I tested with 10,000 values in the text file and some simple max/min/sum formulae in the three cells, it took 0.062 seconds (6.2 seconds to do it 100 times). (xp pro/excel 2003)

    ps. you say the script does the percentile stuff? Not in the code you supplied. Is it worksheet formulae which do this? I suspect it is, and if those formulae take any time at all to calculate, then your code, which writes values one a t a time to the sheet will have the sheet recalculating at each change. If (and this only applies to your code, not mine) you were to switch automatic calculation off, then calculate only once all the data was in place, that would be a route to speeding up your routine.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  5. #5
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location
    Hi Aaron,
    Welcome to the board. As alluded to, a big part of your problem is the use of Range.Select and Range.Offset. But what I don't see mentioned is that you can also realize huge gains by turning of the interface. Events, Screen-Updating, and most importantly Auto-Calculation. In Excel, all of these things keep happening while your script is running unless you turn them off. Once I corrected the Select/Offset issue and turned off the interface I was able to get your code to do a 50,000 record file in about a second.

    However a few other things are worth noting. You seem to be calculating the percentile and volume from formulas in a cell. You don't really need to do this, you can use Excel.WorksheetFunction.Percentile to get a result directly.

    Secondly, you asked about memory. You can move an entire file into memory quite quickly (see the GetFileBytes procedure in code below) but it will be pulled in in byte form. Meaning a number will actually be a series of Unicode values. So 971 in memory would be spread across 6 elements as 57-0-55-0-49-0. And this is not even address extracting data from the second column in a non-fixed width file. Obviously this is sub-optimal.

    It's easier to pull the data in via a loop using the input method (similar to what you have already done) thus extracting the second column and allowing passive type coercion to convert the bytes to numeric values.

    That said you still have the issue of calculating the Percentile in Memory. This calls for the implementation of a sort algorithm and results a quote a lot of code. However the end result is faster by about half a a second. So if you really want to you can.

    I have set up a few examples for you to look over, along with a test harness to see the results. I wasn't really sure what you meant by "Volume" so I just assumed you meant "count".

    In my "professional opinion" unless you are processing thousands of files, 1 second per file is adequate, and while a half a second is twice as fast it doesn't create enough productivity gains to justify bloating the code base like that. (See also: "Premature optimization is the root of all evil.")

    [vba]Option Explicit

    Private Declare Function GetTickCount Lib "Kernel32.dll" () As Long
    Private Declare Sub CopyMemory Lib "Kernel32.dll" Alias "RtlMoveMemory" (ByRef _
    Destination As Any, ByRef Source As Any, ByVal Length As Long)

    Public Sub TestHarness()
    Const strPath_c As String = "C:\Test\test.csv"
    Const lngRecords_c As Long = 50000
    Dim lngStartVal As Long
    Dim lngEndVal As Long
    Dim ws As Excel.Worksheet
    Dim dblData() As Double
    Dim strOutputPath As String
    On Error GoTo Err_Hnd
    ToggleInterface False

    GenerateTestFile strPath_c, lngRecords_c

    strOutputPath = Left$(strPath_c, InStrRev(strPath_c, "\")) & _
    "z_unit_write.txt"

    'Demonstrate problems with working with memory:
    'MsgBox ByteArrayToString(GetFileBytes(strPath_c), 25)

    'Try corrected original method:
    lngStartVal = GetTickCount
    Set ws = Sheet1
    ws.UsedRange.Delete
    OriginalMethod strPath_c, ws
    With Excel.WorksheetFunction
    OutputResults strOutputPath, .percentile(ws.Columns(2), 0.97), _
    .percentile(ws.Columns(2), 0.03), .Count(ws.Columns(2))
    End With
    lngEndVal = GetTickCount
    Debug.Print lngEndVal - lngStartVal, "Original Input Method"

    'Do import method:
    lngStartVal = GetTickCount
    Set ws = ImportMethod(strPath_c)
    OriginalMethod strPath_c, ws
    With Excel.WorksheetFunction
    OutputResults strOutputPath, .percentile(ws.Columns(1), 0.97), _
    .percentile(ws.Columns(1), 0.03), .Count(ws.Columns(1))
    End With
    ws.Parent.Close False
    lngEndVal = GetTickCount
    Debug.Print lngEndVal - lngStartVal, "New Import Method"

    'Try In Memory Method
    lngStartVal = GetTickCount
    dblData = GetData(strPath_c)
    OutputResults strOutputPath, ArrayPercentile(dblData, 0.97), _
    ArrayPercentile(dblData, 0.03), UBound(dblData) + 1&
    lngEndVal = GetTickCount
    Debug.Print lngEndVal - lngStartVal, "Memory Method"

    Exit_Proc:
    On Error Resume Next
    ToggleInterface True
    Exit Sub
    Err_Hnd:
    MsgBox Err.Description, vbCritical + vbMsgBoxHelpButton, Err.Number, _
    Err.HelpFile, Err.HelpContext
    Resume Exit_Proc
    End Sub

    Public Sub GenerateTestFile(ByVal path As String, ByVal records As Long)
    Dim lngFileNum As Long
    Dim bytHeader(1) As Byte
    lngFileNum = FreeFile
    Open path For Binary Access Write As lngFileNum
    Put lngFileNum, , bytHeader
    Close lngFileNum
    Open path For Binary Access Write As lngFileNum
    Put lngFileNum, , CreateTempValues(records)
    Close lngFileNum
    End Sub

    Private Function CreateTempValues(ByVal records As Long) As Byte()
    Const lngRecordWidth_c As Long = 14&
    Const bytByte0Unicode_c As Byte = 255
    Const bytByte1Unicode_c As Byte = 254
    Const bytComma_c As Byte = 44
    Const bytVBCR_c As Byte = 13
    Const bytVBLF_c As Byte = 10
    Dim lngIndex As Long
    Dim lngUpperBnd As Long
    Dim bytRtnVal() As Byte
    Randomize
    lngUpperBnd = records * lngRecordWidth_c + 1&
    ReDim bytRtnVal(lngUpperBnd) As Byte
    'Add Header:
    bytRtnVal(0) = bytByte0Unicode_c
    bytRtnVal(1) = bytByte1Unicode_c
    For lngIndex = 2& To lngUpperBnd Step lngRecordWidth_c
    bytRtnVal(lngIndex) = RandBetween(vbKey0, vbKey9)
    bytRtnVal(lngIndex + 2&) = RandBetween(vbKey0, vbKey9)
    bytRtnVal(lngIndex + 4&) = bytComma_c
    bytRtnVal(lngIndex + 6&) = RandBetween(vbKey0, vbKey9)
    bytRtnVal(lngIndex + 8&) = RandBetween(vbKey0, vbKey9)
    bytRtnVal(lngIndex + 10&) = bytVBCR_c
    bytRtnVal(lngIndex + 12&) = bytVBLF_c
    Next
    CreateTempValues = bytRtnVal
    End Function

    Private Function RandBetween(ByVal lowerBound As Byte, ByVal upperBound As _
    Byte) As Byte
    RandBetween = CByte((upperBound - lowerBound) * Rnd + lowerBound)
    End Function

    Public Sub OriginalMethod(ByVal path As String, ByVal ws As Excel.Worksheet)
    Dim nd As Long, inc As Long, i As Long
    Dim temp(100000) As Double, Dens(100000) As Double
    Dim lngFileNum As Long
    lngFileNum = FreeFile
    'fetch data from the file
    Open path For Input As lngFileNum
    Do
    If EOF(lngFileNum) Then Exit Do
    nd = nd + 1
    Input #lngFileNum, temp(nd), Dens(nd)
    Loop
    Close lngFileNum
    'Display values on the worksheet
    With ws
    For i = 2 To nd + 2
    .Cells(i, 1).value = temp(i)
    .Cells(i, 2).value = Dens(i)
    Next i
    End With
    Erase Dens
    Erase temp
    'create a file
    'clear Contents
    inc = inc + 1
    End Sub

    Public Sub OutputResults(ByVal path As String, ByVal maxBuilding As Double, _
    ByVal minBuilding As Double, ByVal volume As Double)
    Dim lngFileNum As Long
    lngFileNum = FreeFile
    Open path For Append As lngFileNum
    Write #lngFileNum, maxBuilding, minBuilding, volume, path
    Close lngFileNum
    End Sub

    Public Sub ToggleInterface(ByVal interfaceOn As Boolean)
    With Excel.Application
    .EnableEvents = interfaceOn
    .ScreenUpdating = interfaceOn
    If interfaceOn Then
    .Calculation = xlCalculationAutomatic
    .Cursor = xlDefault
    .EnableCancelKey = xlInterrupt
    Else
    .Calculation = xlCalculationManual
    .Cursor = xlWait
    .EnableCancelKey = xlErrorHandler
    End If
    End With
    End Sub

    Private Function ImportMethod(ByVal path As String) As Excel.Worksheet
    Workbooks.OpenText path, xlWindows
    Set ImportMethod = Excel.Workbooks(Excel.Workbooks.Count).Worksheets(1)
    End Function

    Public Function ByteArrayToString(ByRef vals() As Byte, ByVal stopAtIndex As _
    Long) As String
    Dim lngIndex As Long
    Dim lngUpper As Long
    Dim strResult As String
    lngUpper = UBound(vals)
    strResult = "Index" & vbTab & "Value" & vbTab & "Char Val" & vbLf
    If lngUpper > stopAtIndex Then lngUpper = stopAtIndex
    For lngIndex = 0& To lngUpper Step 2&
    strResult = strResult & (lngIndex & vbTab & vals(lngIndex) & vbLf)
    strResult = strResult & ((lngIndex + 1) & vbTab & vals(lngIndex + 1) & _
    vbTab & GetChar(vals(lngIndex), vals(lngIndex + 1&)) & vbLf)
    Next
    ByteArrayToString = strResult
    End Function

    Private Function GetChar(ByVal byte1 As Byte, ByVal byte2 As Byte) As String
    Dim strRtnVal As String
    strRtnVal = Excel.WorksheetFunction.Clean(ChrW(byte1 + byte2 * 256&))
    If LenB(strRtnVal) = 0& Then strRtnVal = "?"
    GetChar = strRtnVal
    End Function

    Private Function GetFileBytes(ByVal path As String) As Byte()
    Dim lngFileNum As Long
    Dim bytRtnVal() As Byte
    lngFileNum = FreeFile
    Open path For Binary Access Read Lock Write As lngFileNum
    ReDim bytRtnVal(LOF(lngFileNum)) As Byte
    Get lngFileNum, , bytRtnVal
    Close lngFileNum
    GetFileBytes = bytRtnVal
    End Function

    Private Function GetData(ByVal path As String) As Double()
    Const lngIncrement_c As Long = 40000
    Const lngErrSOOR_c As Long = 9& '<Subscript out of range
    Dim lngFileNum As Long
    Dim dblRtnVal() As Double
    Dim dblNum As Double
    Dim lngRecord As Long
    Dim lngUprBnd As Long
    On Error GoTo Err_Hnd
    lngFileNum = FreeFile
    Open path For Input Access Read Lock Write As lngFileNum
    lngUprBnd = lngIncrement_c
    ReDim dblRtnVal(lngUprBnd) As Double
    Do Until EOF(lngFileNum)
    'Value from the first column is discarded in favor of the second:
    Input #lngFileNum, dblRtnVal(lngRecord), dblRtnVal(lngRecord)
    lngRecord = lngRecord + 1&
    Loop
    Close lngFileNum
    ReDim Preserve dblRtnVal(lngRecord + 1&) As Double
    GetData = dblRtnVal
    Exit Function
    Err_Hnd:
    If Err.Number = lngErrSOOR_c Then
    If lngRecord > lngUprBnd Then
    lngUprBnd = lngUprBnd + lngIncrement_c
    ReDim Preserve dblRtnVal(lngUprBnd) As Double
    Resume
    End If
    End If
    Close lngFileNum
    'Reraise error
    Err.Raise Err.Number, Err.Source, Err.Description, Err.HelpFile, _
    Err.HelpContext
    End Function

    Private Function ArrayPercentile(ByRef vals() As Double, ByVal percentile As _
    Double) As Double
    Dim lngIndx As Long
    Dim lngLwrBnd As Long
    Dim lngUprBnd As Long
    lngLwrBnd = LBound(vals)
    lngUprBnd = UBound(vals)
    MergeSort vals, lngLwrBnd, lngUprBnd
    ArrayPercentile = vals(Excel.WorksheetFunction.RoundUp((lngUprBnd - _
    lngLwrBnd + 1) * percentile, 0))
    End Function

    Public Sub MergeSort(list() As Double, ByVal first_index As Long, ByVal _
    last_index As Long)
    'Merge sort code adapted to doubles from code located on www.vb-helper.com.
    'http://www.vb-helper.com/howto_mergesort.html
    Dim middle As Long
    If (last_index > first_index) Then
    ' Recursively sort the two halves of the list.
    middle = (first_index + last_index) \ 2
    MergeSort list, first_index, middle
    MergeSort list, middle + 1, last_index
    ' Merge the results.
    Merge list, first_index, middle, last_index
    End If
    End Sub

    ' Merge two sorted sublists.
    Public Sub Merge(list() As Double, ByVal beginning As Long, ByVal middle As _
    Long, ByVal ending As Long)
    'Merge sort code adapted to doubles from code located on www.vb-helper.com.
    'http://www.vb-helper.com/howto_mergesort.html
    Const lngDataTypeByteLen_c As Long = 8&
    Dim temp_array() As Double
    Dim temp As Integer
    Dim counterA As Long
    Dim counterB As Long
    Dim counterMain As Long
    ' Copy the array into a temporary array.
    ReDim temp_array(beginning To ending)
    CopyMemory temp_array(beginning), list(beginning), (ending - beginning + 1) _
    * lngDataTypeByteLen_c
    ' counterA and counterB mark the next item to save
    ' in the first and second halves of the list.
    counterA = beginning
    counterB = middle + 1
    ' counterMain is the index where we will put the
    ' next item in the merged list.
    counterMain = beginning
    Do While (counterA <= middle) And (counterB <= ending)
    ' Find the smaller of the two items at the front
    ' of the two sublists.
    If (temp_array(counterA) <= temp_array(counterB)) Then
    ' The smaller value is in the first half.
    list(counterMain) = temp_array(counterA)
    counterA = counterA + 1
    Else
    ' The smaller value is in the second half.
    list(counterMain) = temp_array(counterB)
    counterB = counterB + 1
    End If
    counterMain = counterMain + 1
    Loop
    ' Copy any remaining items from the first half.
    If counterA <= middle Then
    CopyMemory list(counterMain), temp_array(counterA), (middle - counterA _
    + 1) * Len(list(beginning))
    End If
    ' Copy any remaining items from the second half.
    If counterB <= ending Then
    CopyMemory list(counterMain), temp_array(counterB), (ending - counterB _
    + 1) * Len(list(beginning))
    End If
    End Sub
    [/vba]
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  6. #6
    ok, I added the zip attachment

    Ah,

    I see. Writing it to the sheet it what's bogging it down. I figured so. Man, you guys are awesome! In the excel file I attached, i cleared out some of the other equations that were just making the file too big to post. They can always be put back in later.

    Yeah, the streaming thing to memory seemed like such an extra step to do the same thing. I never though about turning certain features off.

    Thanks again guys! I appreciate it so much.

  7. #7
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    Oorang --

    It's easier to pull the data in via a loop using the input method ...

    That said you still have the issue of calculating the Percentile in Memory. This calls for the implementation of a sort algorithm and results a quote a lot of code. However the end result is faster by about half a a second. So if you really want to you can
    Question: Is the Input loop faster that opening the CSV file as a new workbook, using the worksheet in that WB to Sort if necessary, and going right to the WorksheetFunction.Percentile, closing the CSV WB, and getting the next one?

    I couldn't find the zip file to try it out my self


    (See also: "Premature optimization is the root of all evil.")
    Can you post a link or reference, please?

    Paul

  8. #8
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    I timed my code in your workbook and to do 4000 files with 10,000 records in each takes 15 mins - with screen updating still on. Turning it off yielded only a slight improvement to 12 minutes.

    Quote Originally Posted by Paul_Hossler
    I couldn't find the zip file to try it out my self
    It's attached to the first post in this thread.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    I took off the auto calc and changed it to "until WB a save" and it cut down my time to about 4 seconds a cycle.

    p45cal,

    I ran your code and It processed the first file in no time flat, but upon opening the second file, it returned an error 9 message for the "ReDim Preserve TempDens(1 To 2, 1 To nd)" under the second Do Loop. Originally, I had got around this by declaring [vba]Dim Temp(100000) As Double, Dens(100000) As Double[/vba]
    and when I try to do that again, VBA simply tells me those variables have already been declared.

    Any insight?

    aaron

    sorry,

    Subscript out of range (Error 9)

    aaron

  10. #10
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location
    Quote Originally Posted by Paul_Hossler
    Question: Is the Input loop faster that opening the CSV file as a new workbook, using the worksheet in that WB to Sort if necessary, and going right to the WorksheetFunction.Percentile, closing the CSV WB, and getting the next one?
    Yes the input loop tested faster than opening the file.
    Quote Originally Posted by Paul_Hossler
    Can you post a link or reference, please?
    Paul
    Premature optimization is the root of all evil.
    Quote Originally Posted by p45cal
    I timed my code in your workbook and to do 4000 files with 10,000 records in each takes 15 mins - with screen updating still on. Turning it off yielded only a slight improvement to 12 minutes.
    I have questions
    1.) Did you turn off events and calculation? Those are where the large gains were.
    2.) How fast is your computer?
    3.) Are you just processing all files in a particular folder?

    Aaron R. -> Are the files already sorted, because you could shave off even more time there.

    Note: The memory method I laid out took 6 minutes on 4000 10000-record files.
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  11. #11
    Oorang,

    yes, the files are sorted. The files i posted were not only because it's just a test example.

    Thanks!

    aaron

  12. #12
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Quote Originally Posted by aaronreyna
    but upon opening the second file, it returned an error 9 message for the "ReDim Preserve TempDens(1 To 2, 1 To nd)" under the second Do Loop.
    aaronreyna:
    I don't get the error, post your exact code or try with the code posted lower down in this message. (note I've tacked 'xx' to the end of the sub name)

    **************************************************


    Quote Originally Posted by Oorang
    Yes the input loop tested faster than opening the file.
    Premature optimization is the root of all evil.
    I have questions
    1.) Did you turn off events and calculation? Those are where the large gains were.
    2.) How fast is your computer?
    3.) Are you just processing all files in a particular folder?
    Oorang:
    Answers:
    1. No, neither were switched off. Don't forget, there is only one line in my code that changes what's on the sheet, and that line is executed only once per file. viz.
    .Range("A2").Resize(nd, 2) = Application.Transpose(TempDens)
    which puts the whole array in at once. So I didn't see the point in turning it off at all. (Actually, there is another, the one that clears the first two columns.)

    2. I don't know, what sort of info do you need? AMD Athlon 64x2 Dual Core 3800+, 2.00 GHz, 1.96 GB Ram. 3+ years old.

    3. It's processing the same file over and over again. The file has been constructed with the code:[vba]For i = 1 To 10000
    Write #2, Int((500) * Rnd + 1), Int((50000) * Rnd + 1)
    Next i[/vba] which produces the likes of:
    90,14479
    151,38738
    8,38037
    408,35452
    23,20702
    432,39525

    to 10000 rows.
    I checked that the entire array was correctly placed in the sheet- it was.

    I had to make a guess at the formulae in columns H and I since I only have xl2003, and when the 2007 compatibility whatsit converted the file it reported some formulae would have #REF! in because of the number of rows available in 2007 being so much larger than 2003.
    So where I saw the likes of
    =PERCENTILE(#REF!,0.4)
    I put
    =PERCENTILE(A:A,0.4)
    and
    =PERCENTILE(B:B,0.4)
    etc.

    While writing this I noticed, while debugging, some values appearing in column C, so I changed the formulae to:
    =PERCENTILE(B$2:B$65536,0.4)
    and
    =PERCENTILE(C$2:C$65536,0.4).
    Incidentally, while doing this I came across this in the Percentile Help:
    "If array is empty or contains more than 8,191 data points, PERCENTILE returns the #NUM! error value."
    However I never did get an #NUM (unless the array was empty) and I got the same result whether I used:
    =PERCENTILE(B$2:B$65536,0.4)
    or
    =PERCENTILE(B$2:B$10001,0.4)
    (and to double check I even made sure that
    =PERCENTILE(B$2:B$8192,0.4)
    gave a different answer, so they may have fixed this limitation)

    If I'm wrong with my guesses, please tell me.

    So having made these changes, I created 100 different text files, added their names in column Q and ran the code:[vba]Sub run_Clickxx()
    Application.ScreenUpdating = False
    Starttime = Timer
    'Clear Contents
    Range("A2:B20000").ClearContents '"A2:B100000"
    Dim i As Long, nd As Long, MaxBuilding As Long, MinBuilding As Long
    Dim File_name As String, inc As Long, j As Long, volume As Long
    Dim TempDens() As Double
    inc = 1
    'For j = 0 To inc
    Do
    ReDim TempDens(1 To 2, 1 To 1) 'ReDim TempDens(1 To 2, 0 To 0)
    If Sheet1.Cells(inc, 17).Value = "" Then Exit Do
    File_name = Sheet1.Cells(inc, 17).Value
    'fetch data from the file
    Open File_name For Input As #1
    Do
    If EOF(1) Then Exit Do
    nd = nd + 1
    ReDim Preserve TempDens(1 To 2, 1 To nd) 'ReDim Preserve TempDens(1 To 2, 0 To nd)
    ' Input #1, Temp(nd), Dens(nd)
    Input #1, TempDens(1, nd), TempDens(2, nd)
    Loop
    Close #1
    'Display values on the worksheet
    With Sheets("Export_Output1")
    .Range("A2").Resize(nd, 2) = Application.Transpose(TempDens)
    nd = 0
    Erase TempDens
    'create a file
    Open "c:\z_unit_write.txt" For Append As #2
    MaxBuilding = .Range("h12")
    MinBuilding = .Range("k12")
    volume = .Range("m13")
    Write #2, MaxBuilding, MinBuilding, volume, File_name
    Close #2
    'clear Contents
    .Range("A2:B20000").ClearContents 'A2:B100000
    End With
    inc = inc + 1
    Loop
    'Next j
    MsgBox (Timer - Starttime)
    Application.ScreenUpdating = True
    Stop
    End Sub[/vba]which took 30 secs. which means, sadly, that the 4000, 10,000 line files will take 20 mins.
    Last edited by p45cal; 09-11-2009 at 05:12 PM.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  13. #13
    WOW,

    p45cal, I ran your code and it works amazingly! HA HA HA HA HA, you guys are so out of my league it's not even funny. Most importantly I see the errors of my way! Dont write it to sheet if you dont need to.

    thanks again!

    aaron

  14. #14
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location
    Ok, well if the files are already sorted you just need two things. The record count and then the value of the nth file (twice). This method gets us down to less than 2 minutes.
    If you have control over the file format, you might suggest switching to a fixed width file. Then you could calculate record count using the file size and get the percentile using "Seek" to go straight to the record in question.

    BTW if you are wondering why I used the Windows API to iterate the files in the folder, it's because it's way faster than the FSO.
    [vba]Option Explicit

    Private Const MAX_PATH As Long = 255&
    Private Const INVALID_HANDLE_VALUE As Long = -1&

    Private Type FILETIME
    dwLowDateTime As Long
    dwHighDateTime As Long
    End Type

    Private Type WIN32_FIND_DATA
    dwFileAttributes As Long
    ftCreationTime As FILETIME
    ftLastAccessTime As FILETIME
    ftLastWriteTime As FILETIME
    nFileSizeHigh As Long
    nFileSizeLow As Long
    dwReserved0 As Long
    dwReserved1 As Long
    cFileName As String * MAX_PATH
    cAlternate As String * 14
    End Type

    Private Declare Function GetTickCount Lib "Kernel32.dll" () As Long
    Private Declare Function FindFirstFile Lib "kernel32" Alias "FindFirstFileA" _
    (ByVal lpFileName As String, lpFindFileData As WIN32_FIND_DATA) As Long
    Private Declare Function FindNextFile Lib "kernel32" Alias "FindNextFileA" _
    (ByVal hFindFile As Long, lpFindFileData As WIN32_FIND_DATA) As Long
    Private Declare Function GetFileAttributes Lib "kernel32" Alias _
    "GetFileAttributesA" (ByVal lpFileName As String) As Long
    Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As _
    Long


    Public Sub Test()
    Const strFolder_c As String = "C:\Test_CSVs\"
    Const strSearchValue_c As String = "*.csv"
    Dim tFileData As WIN32_FIND_DATA
    Dim tEmpty As WIN32_FIND_DATA
    Dim lngFileHandle As Long
    Dim lngResult As Long
    Dim strFullPath As String
    Dim strFileData() As String
    Dim lngRecords As Long
    Dim strOutputPath As String
    Dim lngStart As Long
    Dim lngEnd As Long
    lngStart = GetTickCount
    strOutputPath = strFolder_c & "z_unit_write.txt"
    lngFileHandle = FindFirstFile(strFolder_c & strSearchValue_c, tFileData)
    Do
    'VB does not have short circuted ifs. This
    If lngResult = INVALID_HANDLE_VALUE Then Exit Do
    strFullPath = strFolder_c & RTrimNulls(tFileData.cFileName)
    If (GetFileAttributes(strFullPath) And vbDirectory) <> vbDirectory Then
    strFileData = Split(CStr(GetFileBytes(strFullPath)), vbNewLine)
    lngRecords = UBound(strFileData)
    If LenB(strFileData(lngRecords)) Then lngRecords = lngRecords + 1&
    OutputResults strOutputPath, Split(strFileData(Fix(lngRecords * _
    0.03)), ",")(1&), Split(strFileData(Fix(lngRecords * 0.97)), _
    ",")(1&), lngRecords, strFullPath
    End If
    'Clear old values to prevent accidental data merging.
    tFileData = tEmpty
    lngResult = FindNextFile(lngFileHandle, tFileData)
    Loop While lngResult
    FindClose lngFileHandle
    lngEnd = GetTickCount
    Debug.Print lngEnd - lngStart, "Milliseconds"
    End Sub

    Private Function GetFileBytes(ByVal path As String) As Byte()
    Dim lngFileNum As Long
    Dim bytRtnVal() As Byte
    lngFileNum = FreeFile
    Open path For Binary Access Read Lock Write As lngFileNum
    ReDim bytRtnVal(LOF(lngFileNum)) As Byte
    Get lngFileNum, , bytRtnVal
    Close lngFileNum
    GetFileBytes = bytRtnVal
    End Function

    Private Function RTrimNulls(ByVal value As String) As String
    RTrimNulls = LeftB$(value, InStrB(value, vbNullChar))
    End Function

    Public Sub OutputResults(ByVal outputPath As String, ByVal maxBuilding As _
    Double, ByVal minBuilding As Double, ByVal volume As Double, ByVal sourcePath _
    As String)
    Dim lngFileNum As Long
    lngFileNum = FreeFile
    Open outputPath For Append As lngFileNum
    Write #lngFileNum, maxBuilding, minBuilding, volume, sourcePath
    Close lngFileNum
    End Sub
    [/vba]
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  15. #15
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    aaronreyna: (and the others)

    This has been a very interesting learning experience, but I was wondering about the Percentile() calculations. They never seem to get saved for any file and get wiped out when the next file is loaded.

    The Max, Min, count, and file name seem to be all that's actually saved.

    Paul

  16. #16
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Quote Originally Posted by Paul_Hossler
    aaronreyna: (and the others)

    This has been a very interesting learning experience, but I was wondering about the Percentile() calculations. They never seem to get saved for any file and get wiped out when the next file is loaded.

    The Max, Min, count, and file name seem to be all that's actually saved.

    Paul
    The Max and min saved are the 97th and 3rd percentile values. I suspect the other percentiles showing were only used during development to help choose which percentiles would be best to save.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •