Consulting


Thread: FSO filtering on the fly

  1. #1
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810

    FSO filtering on the fly

    I am wondering if it is possible to read a text file using FSO and filter the data on the fly. The scenario: I have a CSV data file with over 900k lines. I would like to read it, filter by a date range, and put the result into Excel.
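    The kind of on-the-fly filter being asked about can be sketched outside VBA. A minimal Python illustration of a streaming date-range filter (the column name "DrillingDate", the date format, and the sample data are all assumptions for the sketch):

    ```python
    # Hypothetical sketch: stream a CSV and keep only rows whose date
    # column falls inside a range, without loading the whole file at once.
    import csv
    import io
    from datetime import date, datetime

    def filter_rows(lines, lo, hi, date_col="DrillingDate", fmt="%Y-%m-%d"):
        """Yield rows (as dicts) whose date_col is within [lo, hi]."""
        for row in csv.DictReader(lines):
            d = datetime.strptime(row[date_col], fmt).date()
            if lo <= d <= hi:
                yield row

    # Tiny stand-in for the real 900k-line file
    sample = io.StringIO(
        "Well,DrillingDate\nA,1978-11-12\nB,1990-01-01\nC,1984-06-24\n"
    )
    kept = list(filter_rows(sample, date(1978, 1, 1), date(1985, 1, 1)))
    ```

    Because the reader is a generator, rows outside the range are discarded as they stream past and never accumulate in memory.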

  2. #2
    Moderator VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    FSO? File System Object?

    Does this give you any ideas?
    Process All CSV Files In SubFolders
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  3. #3
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Yes, File System Object

    I think it is a start.... However, I want to filter the data file by a date range and then put that into the worksheet. Instead of loading 900k lines into Excel, filtering down to a date range is more meaningful for trending and probably more efficient, since a lot of unwanted data gets dropped.

  4. #4
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Another idea that I have is to run SQL on the data file to achieve the filtering instead of using FSO. I will see what I can hammer out.
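    The SQL-over-a-data-file idea can be sketched in Python using sqlite3 purely for illustration (the thread's eventual solution uses ADO's text driver; the table layout, column names, and sample data here are assumptions):

    ```python
    # Hypothetical sketch of the SQL pre-filter idea: load the CSV into an
    # in-memory SQLite table once, then pull date ranges with plain SQL.
    import csv
    import io
    import sqlite3

    raw = io.StringIO(
        "Well,DrillingDate\nA,1978-11-12\nB,1990-01-01\nC,1984-06-24\n"
    )

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE wells (Well TEXT, DrillingDate TEXT)")
    rdr = csv.reader(raw)
    next(rdr)  # skip the header row
    con.executemany("INSERT INTO wells VALUES (?, ?)", rdr)

    # ISO-format date strings compare correctly as text
    rows = con.execute(
        "SELECT Well FROM wells "
        "WHERE DrillingDate BETWEEN '1978-11-12' AND '1984-06-24'"
    ).fetchall()
    ```

    Once the data is in a table, changing the filter is just a matter of editing the WHERE clause, which is the flexibility the poster is after.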

  5. #5
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,729
    1. It's easy enough to just read the text file one line at a time using VBA, seeing if the line 'passes', and if it does, adding it to the spreadsheet (this would be my preference)

    2. An FSO way


    https://msdn.microsoft.com/en-us/lib...(v=vs.84).aspx



    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    Dim fso, MyFile, FileName, TextLine
    
    FileName = "C:\somewhere\datafile.csv"    ' set this to your file's path
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    
    ' Open the file for input.
    Set MyFile = fso.OpenTextFile(FileName, ForReading)
    
    ' Read from the file, keeping the lines that pass the filter.
    Do While MyFile.AtEndOfStream <> True
        TextLine = MyFile.ReadLine
    
        ' check to see if you want it and put it in the spreadsheet if it passes
    
    Loop
    
    MyFile.Close
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  6. #6
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Paul, thanks, but that line-by-line approach is way too slow. That is how I am currently doing it; I fell asleep during the load.

  7. #7
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Re SQL; give us such a data file to play with.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  8. #8
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,729
    Quote Originally Posted by JKwan View Post
    Paul, thanks but that line by line is way too slow. That is how I am currently doing it, I fell asleep during the load.
    I've always found that reading a text file was pretty fast.

    My test CSV is 100K lines, with all but the 100,000th being 'throw-aways', but I read it 10 times for a round million reads. The last line, which starts with bbbbb, is written to the next available row in the ActiveSheet

    Only takes a second or less on my PC from a cold start

    Even if you have to parse it with Split() to get the date field, it should still be faster than falling asleep :-)


    Option Explicit
    Sub TimingTest()
        Dim iFile As Long, iOut As Long, iIn As Long
        Dim sLine As String
        Dim i As Long
        
        Dim tStart As Date, tEnd As Date
        
        
        tStart = Now
        
        iOut = 1
        
        For i = 1 To 10
        
            iFile = FreeFile
            iIn = 0
            Open "c:\users\daddy\desktop\datafile.csv" For Input As #iFile
        
            Do While Not EOF(iFile)
                        
                iIn = iIn + 1
                
                If iIn Mod 1000 = 0 Then DoEvents
                
                Line Input #iFile, sLine
        
                If Left(sLine, 5) = "bbbbb" Then    ' line 100,000
                    ActiveSheet.Cells(iOut, 1) = sLine
                    iOut = iOut + 1
                End If
            Loop
        
            Close #iFile
    Next i
        
        tEnd = Now
        MsgBox tEnd - tStart
     
    End Sub
    Last edited by Paul_Hossler; 02-11-2016 at 01:18 PM.
    Paul

  9. #9
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Hey Paul, thanks for the demo code. However, with my 200 MB file, it is taking forever to load. I am making headway with the SQL pre-filter.

  10. #10
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,729
    NP, but like p45cal says... it'd be nice to see a small sample of the file.

    Is it a plain CSV, or does it come from a database that exports it as a CSV, where you could use SQL?
    Paul

  11. #11
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Here is a snippet of the data file, it is a CSV file that I grab from a site. So I want to load a range of records based on the DrillingDate.
    Attached Files

  12. #12
    Moderator VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    You must pick a book up before you can read it.

    A 200 MB file will take a while for the computer to open.

    It takes ~4 seconds for my fast binary file editor to open a 2+ MB file.

    Until you accept that it is going to take several minutes, possibly even a couple of cups of coffee, to do this...

    And is this a one-time thing? You have data here going back to 1955!

    How many times will you need to parse this file? That is a serious question, because that knowledge will affect my recommendations.
    Last edited by SamT; 02-11-2016 at 05:27 PM.

  13. #13
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    This is work in progress (bedtime here).
    The attached has a button which asks you to choose a CSV file (to test, first select the file you gave us above). It then adds a new sheet and gives you your filtered data.
    At the moment the dates it uses are fixed in the code, from 1978-11-12 to 1984-06-24, so you should be able to test on any file with drilling dates in that range. But I need to know how you want to pass/define the dates between which you want results (will they be in cells somewhere, or do you want to be able to set them while the code runs?).
    (The code may be more convoluted than it needs to be, as I haven't done this too often. ODBC may not be the best solution.)
    Attached Files
    p45cal

  14. #14
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    @Sam - like I said earlier, I fall asleep when I load the data file....
    This is why I wanted to pre-filter the data. I know loading the data file will take time; I have accepted that.

    I trend the data semi-annually, looking at various year ranges

    @P45 - thanks for looking at it, I will take a look tomorrow.

  15. #15
    Moderator VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    I trend the data semi-annually, looking at various year ranges
    Why not break the data into decade blocks, one file per decade? That would take little more time than parsing the entire file once. Just a guess, but splitting it into years might only take twice as long as parsing it once, yet you would never again have to spend more than a few seconds reading in ten or twenty files.

    On the other hand, putting it in a database would take about the same setup time.
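    The decade-split idea can be sketched in Python for illustration (the column name, date format, and sample data are assumptions; a real version would write each bucket to its own file):

    ```python
    # Hypothetical sketch: bucket records by decade so that later reads
    # only need to touch the decades of interest.
    import csv
    import io

    def split_by_decade(lines, date_col="DrillingDate"):
        """Return {decade_start: [rows]}, keyed by e.g. 1970, 1980, ..."""
        buckets = {}
        for row in csv.DictReader(lines):
            decade = int(row[date_col][:4]) // 10 * 10
            buckets.setdefault(decade, []).append(row)
        return buckets

    sample = io.StringIO(
        "Well,DrillingDate\nA,1978-11-12\nB,1990-01-01\nC,1984-06-24\n"
    )
    buckets = split_by_decade(sample)
    ```

    One full pass produces all the per-decade groups at once, which is why the split costs little more than parsing the file a single time.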

  16. #16
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,646
    Sub M_snb()
       ' "dd-mm-yyyy" is a placeholder: replace it with the date text to keep
       sn = filter(split(createobject("scripting.filesystemobject").opentextfile("G:\OF\sample.csv").readall, vbcrlf), "dd-mm-yyyy")
       cells(1).resize(ubound(sn) + 1) = application.transpose(sn)
    End Sub
    or
    Sub M_snb()
       open "G:\OF\sample.csv" for input as #1
         sn = filter(split(input(lof(1), #1), vbcrlf), "dd-mm-yyyy")
       close
       cells(1).resize(ubound(sn) + 1) = application.transpose(sn)
    End Sub
    If Transpose can't handle this (applies to both approaches):
    Sub M_snb()
       with createobject("scripting.filesystemobject")
         .createtextfile("G:\OF\Filtered.csv").write join(filter(split(.opentextfile("G:\OF\sample.csv").readall, vbcrlf), "dd-mm-yyyy"), vbcrlf)
       end with
       workbooks.open "G:\OF\Filtered.csv"
    End Sub
    or something like this (classical ADO):

    Sub M_snb()
      With CreateObject("ADODB.Recordset")
        .Open "SELECT * FROM `sample.csv` WHERE [DrillingDate] < #12/01/1980#", "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=G:\downloads\", 3    ' 3 = adOpenStatic (use the literal with late binding)
        Cells(1).CopyFromRecordset .DataSource
      End With
    End Sub
    Last edited by snb; 02-12-2016 at 03:16 AM.

  17. #17
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    @Sam - This is only one of my big data files; it is an example of "one of my problems". I have data files from SCADA that produce hourly data from wells on production volume vs pressure, and those are even bigger! I used the above example because that data is public. I agree with you that a database is more manageable; however, Excel is my only tool, since I am in user land.

    @snb - Thanks for the different ways of filtering, I will definitely look at all of them. I think they are more flexible and very usable ways to pre-filter!

    @p45 - I will look at your example as well, but I will look at snb's first, because they are very promising.

  18. #18
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,729
    It would seem that instead of having Excel (which runs on a small computer) do the filtering, it'd be faster overall to have a SQL query on the larger DBMS computer filter out the records that are not needed and only send the small subset to a CSV file for Excel.


    I would think that regardless of the actual technique (Excel or something else), all 900K records (200 MB) would still need to be 'looked at' on the smaller PC
    Paul

  19. #19
    Moderator VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    I have data file from SCADA that produce hourly data from wells on volume productions vs pressure, this is even bigger in size! I used the above example because that data is public.
    Can you run the loading at night so that it is ready for you in the morning? I don't think we're going to be able to shorten the actual loading of the files much, although snb writes the fastest code around.

    Are you dealing with "raw" SCADA data?
    I ask because it probably has tons of info you don't need, and it might be most efficient to have an old ($300) Linux machine do the preliminary filtering. I believe a simple grep script would be blazing fast ("blazing" being a relative term). In any case it would be parallel work and would not tie up your main box.

    I'll ask a buddy of mine who is into Linux for his opinion
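    The grep-script idea can be sketched with awk, which handles the range test cleanly. This assumes an ISO-format date in the second CSV field; the real file's layout, field position, and date format may differ:

    ```shell
    # Hypothetical sketch: pre-filter a big CSV outside Excel, keeping the
    # header line plus any row whose date falls in a target range.
    printf 'Well,DrillingDate\nA,1978-11-12\nB,1990-01-01\nC,1984-06-24\n' > sample.csv

    # Keep the header (NR==1) plus rows whose 2nd field is in range;
    # ISO dates compare correctly as plain strings.
    awk -F, 'NR==1 || ($2 >= "1978-01-01" && $2 <= "1985-12-31")' sample.csv > filtered.csv

    cat filtered.csv
    ```

    Because awk streams the file once and writes only the matches, the expensive full scan happens outside Excel, and Excel then only opens the small filtered file.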
    Last edited by SamT; 02-12-2016 at 08:15 AM.

  20. #20
    VBAX Expert
    Joined
    Aug 2004
    Posts
    810
    Here is an update:
    1 - FSO is still "dog" slow; the good news is that it is still faster to pre-filter and then load. I imagine it is slow because it scans the whole input to apply the filter.
    2 - ADO is by far the fastest and most manageable; I can change the load by altering my SQL, which is very handy. I can select a true date range (and pretty fast).
    3 - p45cal's method works relatively quickly; however, snb's ADO method is more flexible and faster still.

    Thanks everyone for the help. By the way, this really cuts down my napping :-)
