Consulting

Results 1 to 10 of 10

Thread: macro to extract specific data from text file to excel sheet

  1. #1
    VBAX Regular
    Joined
    May 2017
    Posts
    10
    Location

    macro to extract specific data from text file to excel sheet

    I have a pdf file which has been converted to txt .

    My issue is to extract specific information as per attached which are numbered from 25 to 42. It consists to extract numbered headers with corresponding data .

    I don't know if this achievable with vba code

    note the text file was unable to load , rename as csv , thus rename as txt before opening
    Attached Files Attached Files

  2. #2
    VBAX Mentor Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    355
    Location
    Hello learn72,

    I do not see any headers that match in these two files. Did you post the correct files?
    Sincerely,
    Leith Ross

  3. #3
    Knowledge Base Approver VBAX Guru
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    4,300
    This is a horrible task! Surely you can get the data another way?
    In the attached there is a flaky solution; loads of assumptions are made, a few guesses too. The likelihood that it will go wrong and/or give wrong results is high.
    I'm not even sure it's giving you what you need.
    Anyway, I've only done some very cursory checking - I leave that to you.
    On Sheet2 of the attached is a button in the vicinity of cell C8 for you to click. Initially, select the same .txt file you renamed .csv and attached here, in order to duplicate what I was working with and hopefully it won't fall over. It places the results starting at cell A10 of the same sheet.
    Sheet5 of the attached is the output I got using the text file you attached.

    The code's in the file but here it is too:
    Sub blah()
      Set Destn = Sheets("Sheet2").Range("A10")
      fpath = Application.GetOpenFilename("Text Files (*.txt),*.txt")
      If Not fpath = False Then
        zzz = CreateObject("scripting.filesystemobject").opentextfile(fpath).readall
        'zzz = CreateObject("scripting.filesystemobject").opentextfile("C:\Users\mmmmmm\AppData\Local\Temp\data.txt").readall
        yyy = Split(zzz, vbCrLf & vbCrLf & "25")
        For Each rcd In yyy
          If InStr(rcd, "Marks & Numbers") > 0 Then
            'Stop
            *** = Split(rcd, vbCrLf & vbCrLf)
            For i = LBound(***) To UBound(***)
              Select Case True
                Case InStr(***(i), "Number and Kind") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  B = www(0)
                Case InStr(***(i), "34  FOB Ncy") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  L = www(1)
                  M = Application.Trim(Split(www(2), "|")(0))
                  N = Application.Trim(Split(www(3), "|")(0))
                Case InStr(***(i), "37  Other Costs") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  O = Application.Trim(Split(www(0), "|")(0))
                  On Error Resume Next
                  P = www(2)
                  Q = www(3)
                  On Error GoTo 0
                Case InStr(***(i), "Description of goods") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  C = www(0)
                Case InStr(***(i), "31  Gross mass") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  ColmI = www(0)
                  J = www(1)
                  K = Application.Trim(Split(www(2), "|")(0))
                Case InStr(***(i), "28  Cty. Org") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  F = www(1)
                  G = www(2)
                  H = www(3)
                Case InStr(***(i), "Marks & Numbers") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  D = www(1)
                  ' Debug.Assert D <> "098"
                  E = www(2)
                Case InStr(***(i), "40  Tax") > 0
                  www = ShortArray(Split(***(i + 1), "  "))
                  X = www(UBound(www))
                  'Stop
                  WriteToSheet Destn, Array(A, B, C, D, E, F, G, H, ColmI, J, K, L, M, N, O, P, Q, , , , , , , X)
                  ii = i + 1
                  Do
                    R = Empty: S = Empty: T = Empty: U = Empty: V = Empty: W = Empty:
                    ccc = ShortArray(Split(***(ii), "  "))
                    If UBound(ccc) > 2 Then
                      'Stop
                      R = ccc(0)
                      S = ccc(1)
                      T = ccc(2)
                      U = ccc(3)
                      V = ccc(4)
                      W = ccc(5)
                      WriteToSheet Destn.Offset(, 17), Array(R, S, T, U, V, W)
                      Set Destn = Destn.Offset(1)
                    End If
                    ii = ii + 1
                  Loop Until InStr(***(ii), "Total") > 0 Or UBound(ccc) < 3
              End Select
              'www = Split(thing, "  ")
            Next i
          End If
        Next rcd
      End If
    End Sub
    
    Function ShortArray(myArr)
    ReDim NewArr(LBound(myArr) To UBound(myArr))
    J = 0
    For i = LBound(myArr) To UBound(myArr)
      Z = Application.Trim(myArr(i))
      If Z <> "" Then
        NewArr(J) = Z
        J = J + 1
      End If
    Next i
    ReDim Preserve NewArr(LBound(myArr) To J - 1)
    ShortArray = NewArr
    End Function
    
    Sub WriteToSheet(Dest, myArr)
    Dest.Resize(, UBound(myArr) + 1).Value = myArr
    End Sub
    I note there are instances of *** in the code above, that's where I had three x's!

    There are things to do still; eg. I haven't cleared the variables for each loop iteration (the variable names are the column letters of their destination, except for ColmI because I used i for something else) and this could lead to wrong data/results.
    Actually, after posting I found it does produce wrong data in colums P and Q starting at item no. 99. I'll post an update tomorrow - too tired now.
    Attached Files Attached Files
    Last edited by p45cal; 02-11-2018 at 08:37 PM.
    p45cal - - - - -This is my signature, it appears after all my posts. Below is not directed at anyone in particular - so don't take offence! - (You might guess why it's there though)
    If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  4. #4
    Knowledge Base Approver VBAX Guru
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    4,300
    Quote Originally Posted by p45cal View Post
    Actually, after posting I found it does produce wrong data in colums P and Q starting at item no. 99. I'll post an update tomorrow - too tired now.
    Update attached.
    Attached Files Attached Files
    p45cal - - - - -This is my signature, it appears after all my posts. Below is not directed at anyone in particular - so don't take offence! - (You might guess why it's there though)
    If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  5. #5
    VBAX Regular
    Joined
    May 2017
    Posts
    10
    Location
    Thank you a lot p45cal for the time dedicated for this,it is not an easy task but very tough , I never thought it would be possible. The original file is a pdf file .

    I will make some test ,if any issues I will revert back

    Otherwise can you suggest is there is another technique to extract data from a pdf file .

  6. #6
    Knowledge Base Approver VBAX Guru
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    4,300
    Quote Originally Posted by learn72 View Post
    Otherwise can you suggest is there is another technique to extract data from a pdf file .
    Where has the pdf file come from? How has it been made?
    p45cal - - - - -This is my signature, it appears after all my posts. Below is not directed at anyone in particular - so don't take offence! - (You might guess why it's there though)
    If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  7. #7
    VBAX Regular
    Joined
    May 2017
    Posts
    10
    Location
    I obtained the pdf file from a system but can't do much more .

    Sent message , Hope it helps

  8. #8
    Knowledge Base Approver VBAX Guru
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    4,300
    A bit surprised there is the facility to produce a pdf file as that's quite a complex thing.
    Is there really no other output format to choose from?
    What is the "system"?
    p45cal - - - - -This is my signature, it appears after all my posts. Below is not directed at anyone in particular - so don't take offence! - (You might guess why it's there though)
    If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    VBAX Regular
    Joined
    May 2017
    Posts
    10
    Location
    Apologize for late reply, the system was developed in Singapore an EDI System, but so far I am trying to find a solution as it is very limited to extract a complete report from it .

  10. #10
    Knowledge Base Approver VBAX Guru
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    4,300
    Quote Originally Posted by learn72 View Post
    the system was developed in Singapore an EDI System
    This is looking hopeful; an EDI document has structure to it, which will be consistent throughout the document.
    I don't know what type your EDI system is but almost anything that it produces is likely to be far better structured than a PDF file.
    An EDI system is designed to produce files for computers to read.
    There are several responses to very similar questions to yours on the internet that we might bring to bear. We just need some more specifics from you. There's even one showing how to get stuff into a csv file, which is good for Excel.

    There are several options and protocols; some will be easier to get into Excel than others. Have a look at this page: https://en.wikipedia.org/wiki/Electr...ta_interchange
    about three-quarters of the way down (under See also). Do you recognise any that are in your system?

    Almost anything would be bettter than a pdf file.

    An interesting youtube video here: https://www.youtube.com/watch?v=oWkzeP2LP-8
    p45cal - - - - -This is my signature, it appears after all my posts. Below is not directed at anyone in particular - so don't take offence! - (You might guess why it's there though)
    If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •