PDA

View Full Version : Solved: Align Data



Carpiem
11-18-2006, 11:50 PM
Please take a look at the "align Data" attachment. There is no code in the file. it simply has data & Tables to provide a visual of my problem.

Namely I copy PDF File data into excel and convert it into a Table used for
calculations at a later stage. Aligning this data can be very user intensive, because the Source PDF Files always have a number of rows or lines that aren't text wrapped properly.

You can imagine the tedium of looking at say 700 lines manually in order
locate and then fix those lines that need to be aligned.
Based on the attached file:

1) Column "A" data was copied from a pdf (acrobat) file containing
lines not wrapped properly.

2) The lines not wrapped properly ( 4 & 5 ) and ( 8 & 9 ) require logic recognizing
which words need to go into which columns? As shown in Table 2.

3) The critical Logic is how to ensure that the Section Lines in Column "C" are not
confused with PartNo Lines in Column "F".

I spent a huge amount of time on this & now admit I lack the required "Brain Power". Hope someone can help or give me ideas how to solve this using VBA.

Many thanks..... Brian

mdmackillop
11-19-2006, 05:07 AM
Hi Brian
Give this a try. Definitley not guaranteed for all situations, but maybe a guide to the way forward. There are a couple of loose ends/unused variables which can be deleted if no use is found for them.

Option Explicit

Dim Txt As Long, Num As Long, Mxd As Long, Data As Range

Sub Parts()
Dim Arr, Cel As Range
Dim wdType As Long, Mx As Long
Application.ScreenUpdating = False
Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
For Each Cel In Data
Arr = Split(Cel)
Select Case UBound(Arr)
Case 0
'Check for string value
If typ(Arr(0)) = 2 Then Cel.Offset(, 2) = Cel
Case 1
wdType = typ(Arr(0))
wdType = wdType + typ(Arr(1))
Select Case wdType
Case 3
'Text + Number
Cel.Offset(, 5) = Arr(0)
Cel.Offset(, 6) = Arr(1)
Case 4
'Text + Text
Cel.Offset(, 2) = Cel
Case 5
'Mixed + Text
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 4) = Arr(1)
End Select
Case Else
Mx = UBound(Arr)
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
If Len(Arr(0)) <= 4 Then
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 6) = Arr(UBound(Arr))
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
Else
DoEvents
End If
End Select
Next
Compress
Application.ScreenUpdating = True
End Sub

Function typ(MyStr)
Dim test As Single, i As Long
Txt = 0: Num = 0: Mxd = 0
On Error Resume Next
test = CSng(MyStr)
If Not test = Empty Then
Num = True
Exit Function
Else
For i = 1 To Len(MyStr)
If Mid(MyStr, i, 1) <= 48 Then
Num = 1
Else
Txt = 2
End If
Next
End If
typ = Num + Txt
End Function

Sub Compress()
Dim Rng As Range, Rng1 As Range, Rng2 As Range, Cel As Range
Set Rng = Data.Offset(, 2)
For Each Cel In Rng
If Len(Cel) = 0 Then
Set Rng1 = Cel.Offset(, 0).Resize(, 3)
Set Rng2 = Cel.Offset(, 3).Resize(, 2)
If Len(Rng1(2) & Rng1(3)) <> 0 And Len(Rng2(1) & Rng2(2)) = 0 Then
Rng1.Offset(1).Delete Shift:=xlUp
Rng2.Delete Shift:=xlUp
End If
End If
Next
End Sub

Carpiem
11-20-2006, 06:59 AM
Hi Brian
Give this a try. Definitley not guaranteed for all situations, but maybe a guide to the way forward. There are a couple of loose ends/unused variables which can be deleted if no use is found for them.


Good Morning Malcolm,

You have once again "Delivered the Goods".


The Section Text (Lines 2 and 8) is not picked up, if it has more than 2 words. I changed Case 0 ( If typ(Arr(0)) <= 2 ) which did not work. It looks as if changing your code to accommodate this would need everything else to be reworked?

Would you be able to make a modification such that Case 3 (Text + Number) would also work with any Part Number that doesn't have numbers (Lines 6 and 14) in the attached ?Test? sheet?

If you have the time, please talk me through the following array statements:
__________________________________________________________________

Case 1
wdType = typ(Arr(0)) <-------------------------------- Please explain
wdType = wdType + typ(Arr(1)) <--------------------------------- ? ?
Select Case wdType <------------------------------- " " ____________________________________________________________________

Case Else
Mx = UBound(Arr) <--------------------- Please explain
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx)) <--------------------- " "
'If Len(Arr(0)) <= 4 Then
If Len(Arr(0)) <= 8 Then
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 6) = Arr(UBound(Arr)) <-------- Please explain
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx)) <-------- " "

____________________________________________________________________


My regards,

Brian

mdmackillop
11-20-2006, 12:13 PM
Hi Brian,
The problem with this sort of question is that all the data is not available. It's necessary to inspect your data to determine the "rules" that apply to allow it to be identified.
EG.
For Section: Maximum number of words; Do any of them contain numbers/punctuation? Any identifiable exceptions.
Bulletin: Alway mixed numbers and letters. Maximun length = 3? Are there alway 3 terms (Bulletin, Desc., PartNo.)?
Description: as for section. + Does it ever contain a PartNo on the same line?
Multiplier: Is it always associated with a Bulletin item?

Re your queries
The function Typ checks the passed string and returns 1 for Number, 2 for Text and 3 for Mixed.
Arr = Split(Cel) Splits the cells contents separated by spaces and puts the results into an array (Arr)
The words from this array can be referred to by index number Arr(0) to Arr(Ubound(Arr)), the last item.
Typ(Arr(0)) passes the first word of the cell for "analysis"
Mx = UBound(Arr) This gives the index number of the last word
The Case 1 code adds the value of the returned types eg 1+1 or 1+2 etc and according to the "rules" determines whether this is a Section or other data.
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx)) Determines the length of the central bit of the cell to create the Desciption as follows:
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))

For 3 word section try

Sub Parts()
Dim Arr, Cel As Range
Dim wdType As Long, Mx As Long
Application.ScreenUpdating = False
Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
For Each Cel In Data
Arr = Split(Cel)
Select Case UBound(Arr)
Case 0
'Check for string value
If typ(Arr(0)) = 2 Then Cel.Offset(, 2) = Cel
Case 1
wdType = typ(Arr(0))
wdType = wdType + typ(Arr(1))
Select Case wdType
Case 3
'Text + Number
Cel.Offset(, 5) = Arr(0)
Cel.Offset(, 6) = Arr(1)
Case 4
'Text + Text
Cel.Offset(, 2) = Cel
Case 5
'Mixed + Text
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 4) = Arr(1)
End Select
Case 2
wdType = typ(Arr(0))
wdType = wdType + typ(Arr(1))
wdType = wdType + typ(Arr(2))
Select Case wdType
Case 6
'Text + Text +Text
Cel.Offset(, 2) = Cel
Case Else
Mx = UBound(Arr)
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
If Len(Arr(0)) <= 4 Then
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 6) = Arr(UBound(Arr))
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
Else
DoEvents
End If
End Select
Case Else
Mx = UBound(Arr)
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
If Len(Arr(0)) <= 4 Then
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 6) = Arr(UBound(Arr))
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
Else
DoEvents
End If
End Select
Next
Compress
Application.ScreenUpdating = True
End Sub

Carpiem
11-22-2006, 12:19 AM
Good Evening Malcolm,

Must apologize for not giving enough thought to what I was asking of you. :banghead:

In response yo your questions:

1) For Section: Maximum number of words; Do any of them contain numbers, punctuation or other identifiable exceptions?
(a) Would be 1 to 5 words long. With no numbers, no punctuation and no exceptions.

2) Bulletin: Always mixed numbers and letters. Maximum length = 3?
(a) No. The Bulletin would be a mix of alpha & alphanumeric 2 to 8 characters long
(b) e.g. PX, CWS, MCLN, DANSX, ASAHPDF, PLCPLATE
(c) e.g. C6, AM3, A42, DEC3, A34G, FWDH4, A92H3 (always start with a letter)

3) Are there always 3 terms? Meaning Bulletin, Description and Part No.?
(a) Yes

4) Description: as for section. + Does it ever contain a Part No on the same line?
(a) No

5) Multiplier: Is it always associated with a Bulletin item?
(a) Yes.
(b) Any Part Number would be a sub-set of the Bulletin Number and have its own multiplier. As “highlighted” in the Bulletin Column in the attachment ("Test" sheet)

6) The Part Number(s) could be any mix of alpha, alphanumeric and numeric characters:
(a) The alpha would be 3 to 9 characters long: TCN, HKBT, CCAHK, DLCASTERS

(b) The alphanumeric would be any mix of characters from 3 to 18 long and always starts with a letter: PG4, AVK33, A10N106, A24208GSC, HDP58MDHCC, CCC60S605521, A42H6012WFSSALP3PT

(c) The numeric would always be some combination of 10 numerals long and never starting with a zero: 10000000 to 99999999

In Post #3 I asked “Would you be able to make a modification such that Case 3 (Text + Number) would also work with any Part Number that doesn't have numbers…..” This concerns the “highlighted” data in the PartNo & Multiplier Columns in the attachment ("Test" sheet")

I hope this explanation is closer to being suitable than the previous one. :think:

My regards...... Brian

mdmackillop
11-23-2006, 04:04 PM
Hi Brian
A bit cobbled together, but hopefully you can follow it. I don't think you can ever be 100% with this, just too many possible options.


Option Explicit
Option Compare Text
Dim Txt As Long, Num As Long, Mxd As Long, Data As Range
Sub Test()
Dim Arr, Cel As Range, List, Lst, Fxd As Boolean
Dim wdType As Long, Mx As Long
Application.ScreenUpdating = False
Range("B2:H5000").ClearContents
Set Data = Range(Cells(2, 1), Cells(2, 1).End(xlDown))
For Each Cel In Data
'Check key words
Fxd = False
List = Array("Date", "Page", "Quote")
For Each Lst In List
If InStr(1, Cel, Lst) > 0 Then
Cel.Offset(, 1) = Cel
Fxd = True
Exit For
End If
Next
If Fxd = True Then GoTo DoNext
Arr = Split(Cel)
If PartMult(Arr) Then
Cel.Offset(, 5) = Arr(0)
Cel.Offset(, 6) = Arr(1)
GoTo DoNext
End If
If BulPartMult(Arr) Then
Cel.Offset(, 3) = Arr(0)
Cel.Offset(, 6) = Arr(UBound(Arr))
Mx = UBound(Arr)
Mx = Len(Cel) - Len(Arr(0)) - Len(Arr(Mx))
Cel.Offset(, 4) = Trim(Mid(Cel, Len(Arr(0)) + 1, Mx))
GoTo DoNext
End If

Select Case UBound(Arr)
Case 0
'Check for string value
wdType = SumType(Arr)
If wdType = 2 Then Cel.Offset(, 2) = Cel
Case 1
wdType = SumType(Arr)
If wdType = 4 Then Cel.Offset(, 2) = Cel
If wdType <> 4 Then
Cel.Offset(, 5) = Arr(0)
Cel.Offset(, 6) = Arr(1)
End If
Case 2
wdType = SumType(Arr)
If wdType = 6 Then Cel.Offset(, 2) = Cel
Case 3
wdType = SumType(Arr)
If wdType = 8 Then Cel.Offset(, 2) = Cel
'Text + Number
If wdType <> 6 Then
Cel.Offset(, 5) = Arr(0)
Cel.Offset(, 6) = Arr(1)
End If
Case 4
wdType = SumType(Arr)
If wdType = 10 Then Cel.Offset(, 2) = Cel
Case 5
wdType = SumType(Arr)
If wdType = 12 Then Cel.Offset(, 2) = Cel
If wdType <> 12 Then Cel.Offset(, 1) = Cel
End Select
DoNext:
Next
'Compress
Set Data = Nothing
Application.ScreenUpdating = True
End Sub
Function SumType(Arr)
Dim Sums As Long, i As Long
For i = 0 To UBound(Arr)
Sums = Sums + typ(Arr(i))
Next
SumType = Sums
End Function
Function typ(MyStr)
Dim Test As Single, i As Long
Txt = 0: Num = 0: Mxd = 0
For i = 1 To Len(MyStr)
If Mid(MyStr, i, 1) <= 48 Then
Num = 1
Else
Txt = 2
End If
Next
typ = Num + Txt
End Function

Private Sub Compress()
Dim Rng As Range, Rng1 As Range, Rng2 As Range, Cel As Range
Set Rng = Data.Offset(, 2)
For Each Cel In Rng
If Len(Cel) = 0 Then
Set Rng1 = Cel.Offset(, 0).Resize(, 3)
Set Rng2 = Cel.Offset(, 3).Resize(, 2)
If Len(Rng1(2) & Rng1(3)) <> 0 And Len(Rng2(1) & Rng2(2)) = 0 Then
Rng1.Offset(1).Delete Shift:=xlUp
Rng2.Delete Shift:=xlUp
End If
End If
Next
End Sub
'Test: 2 words - Alphanumeric in caps + number
Function PartMult(Arr)
Dim i As Long, Char As String, Test As Long
PartMult = False
If UBound(Arr) <> 1 Then
Exit Function
Else
If IsNumeric(Arr(UBound(Arr))) Then
For i = 1 To Len(Arr(0))
Char = Mid(Arr(0), i, 1)
If Asc(Char) >= 65 And Asc(Char) <= 90 Or IsNumeric(Char) Then
Test = Test + 1
End If
Next
End If
End If
If Len(Test) = Test Then PartMult = True
End Function
'Test: >2 words - Alphanumeric in caps + anything + number
Function BulPartMult(Arr)
Dim i As Long, Char As String, Test As Long
BulPartMult = False
If UBound(Arr) < 2 Then
Exit Function
Else
If IsNumeric(Arr(UBound(Arr))) Then
For i = 1 To Len(Arr(0))
Char = Mid(Arr(0), i, 1)
If Asc(Char) >= 65 And Asc(Char) <= 90 Or IsNumeric(Char) Then
Test = Test + 1
End If
Next
End If
End If
If Len(Arr(0)) = Test Then BulPartMult = True
End Function

Carpiem
11-27-2006, 09:49 PM
Good Evening Malcolm,

Just got back from Montreal. Always a great place to visit.

Your code is fantastic. I ran it against 20+ files and it never failed, with all the variations in said files.

Marvelous ... Thank you ... Thank you. :bow: :friends: :thumb

My regards... Brian