Consulting

Results 1 to 7 of 7

Thread: Solved: Align Data

  1. #1
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location

    Solved: Align Data

    Please take a look at the "align Data" attachment. There is no code in the file. it simply has data & Tables to provide a visual of my problem.

    Namely I copy PDF File data into excel and convert it into a Table used for
    calculations at a later stage. Aligning this data can be very user intensive, because the Source PDF Files always have a number of rows or lines that aren't text wrapped properly.

    You can imagine the tedium of looking at say 700 lines manually in order
    locate and then fix those lines that need to be aligned.
    Based on the attached file:

    1) Column "A" data was copied from a pdf (acrobat) file containing
    lines not wrapped properly.

    2) The lines not wrapped properly ( 4 & 5 ) and ( 8 & 9 ) require logic recognizing
    which words need to go into which columns? As shown in Table 2.

    3) The critical Logic is how to ensure that the Section Lines in Column "C" are not
    confused with PartNo Lines in Column "F".

    I spent a huge amount of time on this & now admit I lack the required "Brain Power". Hope someone can help or give me ideas how to solve this using VBA.

    Many thanks..... Brian

  2. #2
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Brian
    Give this a try. Definitley not guaranteed for all situations, but maybe a guide to the way forward. There are a couple of loose ends/unused variables which can be deleted if no use is found for them.
    [vba]
    Option Explicit

    Dim Txt As Long, Num As Long, Mxd As Long, Data As Range

    Sub Parts()
    Dim Arr, Cel As Range
    Dim wdType As Long, Mx As Long
    Application.ScreenUpdating = False
    Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
    For Each Cel In Data
    Arr = Split(Cel)
    Select Case UBound(Arr)
    Case 0
    'Check for string value
    If typ(Arr(0)) = 2 Then Cel.Offset(, 2) = Cel
    Case 1
    wdType = typ(Arr(0))
    wdType = wdType + typ(Arr(1))
    Select Case wdType
    Case 3
    'Text + Number
    Cel.Offset(, 5) = Arr(0)
    Cel.Offset(, 6) = Arr(1)
    Case 4
    'Text + Text
    Cel.Offset(, 2) = Cel
    Case 5
    'Mixed + Text
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 4) = Arr(1)
    End Select
    Case Else
    Mx = UBound(Arr)
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
    If Len(Arr(0)) <= 4 Then
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 6) = Arr(UBound(Arr))
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
    Else
    DoEvents
    End If
    End Select
    Next
    Compress
    Application.ScreenUpdating = True
    End Sub

    Function typ(MyStr)
    Dim test As Single, i As Long
    Txt = 0: Num = 0: Mxd = 0
    On Error Resume Next
    test = CSng(MyStr)
    If Not test = Empty Then
    Num = True
    Exit Function
    Else
    For i = 1 To Len(MyStr)
    If Mid(MyStr, i, 1) <= 48 Then
    Num = 1
    Else
    Txt = 2
    End If
    Next
    End If
    typ = Num + Txt
    End Function

    Sub Compress()
    Dim Rng As Range, Rng1 As Range, Rng2 As Range, Cel As Range
    Set Rng = Data.Offset(, 2)
    For Each Cel In Rng
    If Len(Cel) = 0 Then
    Set Rng1 = Cel.Offset(, 0).Resize(, 3)
    Set Rng2 = Cel.Offset(, 3).Resize(, 2)
    If Len(Rng1(2) & Rng1(3)) <> 0 And Len(Rng2(1) & Rng2(2)) = 0 Then
    Rng1.Offset(1).Delete Shift:=xlUp
    Rng2.Delete Shift:=xlUp
    End If
    End If
    Next
    End Sub

    [/vba]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  3. #3
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location

    Align Data

    Quote Originally Posted by mdmackillop
    Hi Brian
    Give this a try. Definitley not guaranteed for all situations, but maybe a guide to the way forward. There are a couple of loose ends/unused variables which can be deleted if no use is found for them.
    Good Morning Malcolm,

    You have once again "Delivered the Goods".

    • The Section Text (Lines 2 and 8) is not picked up, if it has more than 2 words. I changed Case 0 ( If typ(Arr(0)) <= 2 ) which did not work. It looks as if changing your code to accommodate this would need everything else to be reworked?

    • Would you be able to make a modification such that Case 3 (Text + Number) would also work with any Part Number that doesn't have numbers (Lines 6 and 14) in the attached ?Test? sheet?

    • If you have the time, please talk me through the following array statements:

    __________________________________________________________________

    Case 1
    wdType = typ(Arr(0)) <-------------------------------- Please explain
    wdType = wdType + typ(Arr(1)) <--------------------------------- ? ?
    Select Case wdType <------------------------------- " " ____________________________________________________________________

    Case Else
    Mx = UBound(Arr) <--------------------- Please explain
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx)) <--------------------- " "
    'If Len(Arr(0)) <= 4 Then
    If Len(Arr(0)) <= 8 Then
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 6) = Arr(UBound(Arr)) <-------- Please explain
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx)) <-------- " "

    ____________________________________________________________________


    My regards,

    Brian

  4. #4
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Brian,
    The problem with this sort of question is that all the data is not available. It's necessary to inspect your data to determine the "rules" that apply to allow it to be identified.
    EG.
    For Section: Maximum number of words; Do any of them contain numbers/punctuation? Any identifiable exceptions.
    Bulletin: Alway mixed numbers and letters. Maximun length = 3? Are there alway 3 terms (Bulletin, Desc., PartNo.)?
    Description: as for section. + Does it ever contain a PartNo on the same line?
    Multiplier: Is it always associated with a Bulletin item?

    Re your queries
    The function Typ checks the passed string and returns 1 for Number, 2 for Text and 3 for Mixed.
    Arr = Split(Cel) Splits the cells contents separated by spaces and puts the results into an array (Arr)
    The words from this array can be referred to by index number Arr(0) to Arr(Ubound(Arr)), the last item.
    Typ(Arr(0)) passes the first word of the cell for "analysis"
    Mx = UBound(Arr) This gives the index number of the last word
    The Case 1 code adds the value of the returned types eg 1+1 or 1+2 etc and according to the "rules" determines whether this is a Section or other data.
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx)) Determines the length of the central bit of the cell to create the Desciption as follows:
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))

    For 3 word section try
    [VBA]
    Sub Parts()
    Dim Arr, Cel As Range
    Dim wdType As Long, Mx As Long
    Application.ScreenUpdating = False
    Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
    For Each Cel In Data
    Arr = Split(Cel)
    Select Case UBound(Arr)
    Case 0
    'Check for string value
    If typ(Arr(0)) = 2 Then Cel.Offset(, 2) = Cel
    Case 1
    wdType = typ(Arr(0))
    wdType = wdType + typ(Arr(1))
    Select Case wdType
    Case 3
    'Text + Number
    Cel.Offset(, 5) = Arr(0)
    Cel.Offset(, 6) = Arr(1)
    Case 4
    'Text + Text
    Cel.Offset(, 2) = Cel
    Case 5
    'Mixed + Text
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 4) = Arr(1)
    End Select
    Case 2
    wdType = typ(Arr(0))
    wdType = wdType + typ(Arr(1))
    wdType = wdType + typ(Arr(2))
    Select Case wdType
    Case 6
    'Text + Text +Text
    Cel.Offset(, 2) = Cel
    Case Else
    Mx = UBound(Arr)
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
    If Len(Arr(0)) <= 4 Then
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 6) = Arr(UBound(Arr))
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
    Else
    DoEvents
    End If
    End Select
    Case Else
    Mx = UBound(Arr)
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
    If Len(Arr(0)) <= 4 Then
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 6) = Arr(UBound(Arr))
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
    Else
    DoEvents
    End If
    End Select
    Next
    Compress
    Application.ScreenUpdating = True
    End Sub
    [/VBA]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  5. #5
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location

    Align Data

    Good Evening Malcolm,

    Must apologize for not giving enough thought to what I was asking of you.

    In response yo your questions:

    1) For Section: Maximum number of words; Do any of them contain numbers, punctuation or other identifiable exceptions?
    (a) Would be 1 to 5 words long. With no numbers, no punctuation and no exceptions.

    2) Bulletin: Always mixed numbers and letters. Maximum length = 3?
    (a) No. The Bulletin would be a mix of alpha & alphanumeric 2 to 8 characters long
    (b) e.g. PX, CWS, MCLN, DANSX, ASAHPDF, PLCPLATE
    (c) e.g. C6, AM3, A42, DEC3, A34G, FWDH4, A92H3 (always start with a letter)

    3) Are there always 3 terms? Meaning Bulletin, Description and Part No.?
    (a)Yes

    4) Description: as for section. + Does it ever contain a Part No on the same line?
    (a)No

    5) Multiplier: Is it always associated with a Bulletin item?
    (a) Yes.
    (b) Any Part Number would be a sub-set of the Bulletin Number and have its own multiplier. As “highlighted” in the Bulletin Column in the attachment ("Test" sheet)

    6) The Part Number(s) could be any mix of alpha, alphanumeric and numeric characters:
    (a) The alpha would be 3 to 9 characters long: TCN, HKBT, CCAHK, DLCASTERS

    (b) The alphanumeric would be any mix of characters from 3 to 18 long and always starts with a letter: PG4, AVK33, A10N106, A24208GSC, HDP58MDHCC, CCC60S605521, A42H6012WFSSALP3PT

    (c) The numeric would always be some combination of 10 numerals long and never starting with a zero: 10000000 to 99999999

    In Post #3 I asked “Would you be able to make a modification such that Case 3 (Text + Number) would also work with any Part Number that doesn't have numbers…..” This concerns the “highlighted” data in the PartNo & Multiplier Columns in the attachment ("Test" sheet")

    I hope this explanation is closer to being suitable than the previous one.

    My regards...... Brian
    Last edited by Carpiem; 11-22-2006 at 12:47 AM.

  6. #6
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Brian
    A bit cobbled together, but hopefully you can follow it. I don't think you can ever be 100% with this, just too many possible options.

    [VBA]
    Option Explicit
    Option Compare Text
    Dim Txt As Long, Num As Long, Mxd As Long, Data As Range
    Sub Test()
    Dim Arr, Cel As Range, List, Lst, Fxd As Boolean
    Dim wdType As Long, Mx As Long
    Application.ScreenUpdating = False
    Range("B2:H5000").ClearContents
    Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
    For Each Cel In Data
    'Check key words
    Fxd = False
    List = Array("Date", "Page", "Quote")
    For Each Lst In List
    If InStr(1, Cel, Lst) > 0 Then
    Cel.Offset(, 1) = Cel
    Fxd = True
    Exit For
    End If
    Next
    If Fxd = True Then GoTo DoNext
    Arr = Split(Cel)
    If PartMult(Arr) Then
    Cel.Offset(, 5) = Arr(0)
    Cel.Offset(, 6) = Arr(1)
    GoTo DoNext
    End If
    If BulPartMult(Arr) Then
    Cel.Offset(, 3) = Arr(0)
    Cel.Offset(, 6) = Arr(UBound(Arr))
    Mx = UBound(Arr)
    Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
    Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
    GoTo DoNext
    End If

    Select Case UBound(Arr)
    Case 0
    'Check for string value
    wdType = SumType(Arr)
    If wdType = 2 Then Cel.Offset(, 2) = Cel
    Case 1
    wdType = SumType(Arr)
    If wdType = 4 Then Cel.Offset(, 2) = Cel
    If wdType <> 4 Then
    Cel.Offset(, 5) = Arr(0)
    Cel.Offset(, 6) = Arr(1)
    End If
    Case 2
    wdType = SumType(Arr)
    If wdType = 6 Then Cel.Offset(, 2) = Cel
    Case 3
    wdType = SumType(Arr)
    If wdType = 8 Then Cel.Offset(, 2) = Cel
    'Text + Number
    If wdType <> 6 Then
    Cel.Offset(, 5) = Arr(0)
    Cel.Offset(, 6) = Arr(1)
    End If
    Case 4
    wdType = SumType(Arr)
    If wdType = 10 Then Cel.Offset(, 2) = Cel
    Case 5
    wdType = SumType(Arr)
    If wdType = 12 Then Cel.Offset(, 2) = Cel
    If wdType <> 12 Then Cel.Offset(, 1) = Cel
    End Select
    DoNext:
    Next
    'Compress
    Set Data = Nothing
    Application.ScreenUpdating = True
    End Sub
    Function SumType(Arr)
    Dim Sums As Long, i As Long
    For i = 0 To UBound(Arr)
    Sums = Sums + typ(Arr(i))
    Next
    SumType = Sums
    End Function
    Function typ(MyStr)
    Dim Test As Single, i As Long
    Txt = 0: Num = 0: Mxd = 0
    For i = 1 To Len(MyStr)
    If Mid(MyStr, i, 1) <= 48 Then
    Num = 1
    Else
    Txt = 2
    End If
    Next
    typ = Num + Txt
    End Function

    Private Sub Compress()
    Dim Rng As Range, Rng1 As Range, Rng2 As Range, Cel As Range
    Set Rng = Data.Offset(, 2)
    For Each Cel In Rng
    If Len(Cel) = 0 Then
    Set Rng1 = Cel.Offset(, 0).Resize(, 3)
    Set Rng2 = Cel.Offset(, 3).Resize(, 2)
    If Len(Rng1(2) & Rng1(3)) <> 0 And Len(Rng2(1) & Rng2(2)) = 0 Then
    Rng1.Offset(1).Delete Shift:=xlUp
    Rng2.Delete Shift:=xlUp
    End If
    End If
    Next
    End Sub
    'Test: 2 words - Alphanumeric in caps + number
    Function PartMult(Arr)
    Dim i As Long, Char As String, Test As Long
    PartMult = False
    If UBound(Arr) <> 1 Then
    Exit Function
    Else
    If IsNumeric(Arr(UBound(Arr))) Then
    For i = 1 To Len(Arr(0))
    Char = Mid(Arr(0), i, 1)
    If Asc(Char) >= 65 And Asc(Char) <= 90 Or IsNumeric(Char) Then
    Test = Test + 1
    End If
    Next
    End If
    End If
    If Len(Test) = Test Then PartMult = True
    End Function
    'Test: >2 words - Alphanumeric in caps + anything + number
    Function BulPartMult(Arr)
    Dim i As Long, Char As String, Test As Long
    BulPartMult = False
    If UBound(Arr) < 2 Then
    Exit Function
    Else
    If IsNumeric(Arr(UBound(Arr))) Then
    For i = 1 To Len(Arr(0))
    Char = Mid(Arr(0), i, 1)
    If Asc(Char) >= 65 And Asc(Char) <= 90 Or IsNumeric(Char) Then
    Test = Test + 1
    End If
    Next
    End If
    End If
    If Len(Arr(0)) = Test Then BulPartMult = True
    End Function
    [/VBA]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  7. #7
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Good Evening Malcolm,

    Just got back from Montreal. Always a great place to visit.

    Your code is fantastic. I ran it against 20+ files and it never failed, with all the variations in said files.

    Marvelous ... Thank you ... Thank you.

    My regards... Brian

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •