PDA

View Full Version : Assort information in columns from rows



hunsnowboard
07-30-2010, 08:33 AM
Hi there Everyone!

I have the following problem, please try to help me if you can. I get an excel file from a query where the fields are shown in the following format
<mezo="fieldname">parameter</mezo> , so in case of name you can see this:
<mezo="name">John Terry</mezo>.
There are many types of fieldnames (birthdate, city, mother name and so on) which belong to a name, however not every name has all fieldnames (like some just have the city, others have only mother name).
What I would like to do is to have all the filelds and its information put in a column. If you open the attached excel sheet you will see clearly what I mean. The problem is that the field names are not in english, and where you see <free text> it means that there is some text which is not relevant and can be deleted.
The new section always starts with the name field name which originally is
<mezo="név">Kovács Pál</mezo>. Please open the excel sheet attached and you will see clearly my problem.
Thank you in advance for your help and wish you a nice weekend!

GTO
07-30-2010, 08:50 AM
Is there always at least one <free text> that we can count on as a divider between records?

hunsnowboard
07-30-2010, 09:43 AM
Not always. The divider in each case should be the <mezo="név"> section. This should mean that a new record should be added.

GTO
07-30-2010, 03:57 PM
In a Standard Module, try:


Option Explicit

Enum ValType
SubjectName = 1
CityName = 2
BirthYear = 3
BirthPlace = 4
MotherName = 5
End Enum

' assort information in columns from rows
Sub exa()
Dim _
wksMunka2 As Worksheet, _
wksMunka3 As Worksheet, _
rngData As Range, _
rngFound As Range, _
aryRowMarker As Variant, _
aryRaw As Variant, _
aryOutput As Variant, _
RawVal As Variant, _
lFirstRow As Long, _
i As Long

'// Change to suit //
Const TOP_ROW_DATA As Long = 4
Const INPUT_COL As Long = 2

'// Set a reference to source and destination sheets. //
Set wksMunka2 = ThisWorkbook.Worksheets("Munka2")
Set wksMunka3 = ThisWorkbook.Worksheets("Munka3")

With wksMunka2

'// Set a reference to the range that has data, starting at the cell we //
'// prescribed, down to the last cell in the column that has data. If you have //
'// garbage that gets imported, you may use .Find instead to find last row with //
'// valid data. //
Set rngData = .Range(.Cells(TOP_ROW_DATA, INPUT_COL), _
.Cells(.Rows.Count, INPUT_COL).End(xlUp))

With rngData
'// See the function. Basically - find uppermost cell with FindWhat in it //
Set rngFound = RangeFound(SearchRange:=rngData, _
FindWhat:="<mezo=""name"">", _
StartingAfter:=rngData.Cells(rngData.Cells.Count), _
SearchUpDn:=xlNext)

'// If rngFound is Nothing, then we didn't find it - skip to Else and exit //
'// sub //
If Not rngFound Is Nothing Then

'// Save where we found the first match, so we know when we've searched //
'// the entire range and should stop looking. //
lFirstRow = rngFound.Row
'// Initially size our array of row numbers to hold the row number of //
'// the first match, and fill the element. //
ReDim aryRowMarker(1 To 1)
aryRowMarker(1) = lFirstRow

'// Loop using .FindNext, increasing the size of the array along the way//
'// and filling it with the row numbers until we find the first match //
'// again. //
Do
Set rngFound = .FindNext(After:=rngFound)
If Not rngFound.Row = lFirstRow Then
ReDim Preserve aryRowMarker(1 To UBound(aryRowMarker, 1) + 1)
aryRowMarker(UBound(aryRowMarker)) = rngFound.Row
End If
Loop While Not rngFound.Row = lFirstRow

'// Add one more element, this being the last row with data. Note that //
'// we 'push' it one row, as we'll drop a row from here on. //
ReDim Preserve aryRowMarker(1 To UBound(aryRowMarker) + 1)
aryRowMarker(UBound(aryRowMarker)) = rngData.Cells(rngData.Cells.Count).Row + 1

'// Size our output array based on how many row markers we saved for the//
'// rows, and of course 5 columns. Then fill the upper row with a //
'// header row. //
ReDim aryOutput(0 To UBound(aryRowMarker) - 1, 1 To 5)
aryOutput(0, 1) = "Name"
aryOutput(0, 2) = "City"
aryOutput(0, 3) = "Birth Year"
aryOutput(0, 4) = "POB"
aryOutput(0, 5) = "Mother Name"
'// Now we just run from the second row of the output array to the end //
For i = 1 To UBound(aryRowMarker) - 1

'// Using the row numbers saved, fill a dynamic array with the values//
'// in the given range. //
aryRaw = .Parent.Range(.Parent.Cells(aryRowMarker(i), INPUT_COL), _
.Parent.Cells(aryRowMarker(i + 1) - 1, INPUT_COL) _
).Value

'// For ea val in our 'chunk'... //
For Each RawVal In aryRaw

'// Return a val based on the first success of InStr, or, //
'// return Null. //
Select Case Switch(InStr(1, RawVal, "<mezo=""name"">") > 0, 1, _
InStr(1, RawVal, "<mezo=""city"">") > 0, 2, _
InStr(1, RawVal, "<mezo=""birth date"">") > 0, 3, _
InStr(1, RawVal, "<mezo=""birth place"">") > 0, 4, _
InStr(1, RawVal, "<mezo=""mother name"">") > 0, 5)
Case ValType.SubjectName
'// Trim up the found match and assign to correct element //
aryOutput(i, 1) = _
Replace( _
Replace(RawVal, "<mezo=""name"">", vbNullString), _
"</mezo>", vbNullString _
)
Case ValType.CityName
aryOutput(i, 2) = _
Replace( _
Replace(RawVal, "<mezo=""city"">", vbNullString), _
"</mezo>", vbNullString _
)
Case ValType.BirthYear
aryOutput(i, 3) = _
Replace( _
Replace(RawVal, "<mezo=""birth date"">", vbNullString), _
"</mezo>", vbNullString _
)
Case ValType.BirthPlace
aryOutput(i, 4) = _
Replace( _
Replace(RawVal, "<mezo=""birth place"">", vbNullString), _
"</mezo>", vbNullString _
)
Case ValType.MotherName
aryOutput(i, 5) = _
Replace( _
Replace(RawVal, "<mezo=""mother name"">", vbNullString), _
"</mezo>", vbNullString _
)
Case Else
'do nothing, garbage row
End Select
Next
Next
Else
Exit Sub
End If
End With
End With

'// plunk the output array where you want //
With wksMunka3.Range("A2").Resize(UBound(aryOutput, 1) + 1, 5)
.Value = aryOutput
.Rows(1).Font.Bold = True
.EntireColumn.AutoFit
End With
End Sub

Function RangeFound(SearchRange As Range, _
Optional FindWhat As String = "*", _
Optional StartingAfter As Range, _
Optional LookAtTextOrFormula As XlFindLookIn = xlValues, _
Optional LookAtWholeOrPart As XlLookAt = xlPart, _
Optional SearchRowCol As XlSearchOrder = xlByRows, _
Optional SearchUpDn As XlSearchDirection = xlPrevious, _
Optional bMatchCase As Boolean = False) As Range

If StartingAfter Is Nothing Then
Set StartingAfter = SearchRange(1)
End If

Set RangeFound = SearchRange.Find(What:=FindWhat, _
After:=StartingAfter, _
LookIn:=LookAtTextOrFormula, _
LookAt:=LookAtWholeOrPart, _
SearchOrder:=SearchRowCol, _
SearchDirection:=SearchUpDn, _
MatchCase:=bMatchCase)
End Function

Hope that helps,

Mark

hunsnowboard
07-30-2010, 11:04 PM
Hi Mark! Thank you so much for your work! Unfortunately my explanation wasn't good enough. There is a problem with the macro.
Namely that there are many fieldnames. Your macro works only for the name, city, birth place, birth date and mother name. But there are more fieldnames. I can't really list them all because there are many. Can this macro be done without listing all the fieldnames? Just search for the fieldname between the apostrophe and if it not név than add it to the columns. If it is already listed in one of the columns then just add the value. Can this somehow be done?

agraano
07-31-2010, 12:32 PM
Sub process()

' / Go to the relevant Cell or Range
' / Loop through relevant Range
Dim long_LoopVariable As Long
Dim int_startPos, int_endPos As Integer
Dim str_fieldName, str_fieldValue As String

long_LoopVariable = 4
Sheets("Munka1").Select

While Cells(long_LoopVariable, 2).Value <> ""

' THIS SECTION OF CODE EVALUATED IF <MEZO CONTAINS
' IN THE STRING (BASICALLY, IF IT DOESNT, THEN THERE
' IS NO VALUE THAT WE ARE INTERESTED IN ;)

If InStr(Cells(long_LoopVariable, 2).Value, "<mezo") > 0 Then

' EXTRACT THE FIELD NAME
' You'll need to add separate code which decides what to do with
' the field name that has been extracted. example, do you need to create a
' new column or not?

int_startPos = InStr(Cells(long_LoopVariable, 2).Value, "=" & Chr(34))
int_endPos = InStr(Cells(long_LoopVariable, 2).Value, Chr(34) & ">")

str_fieldName = Mid(Cells(long_LoopVariable, 2).Value, int_startPos + 2, _
(int_endPos - int_startPos - 2))

MsgBox "Volia... Here goes the field Name for the value in row No " & _
long_LoopVariable & " : " & str_fieldName

' EXTRACT THE FIELD VALUE
' Again, add code for what do you want to do with the the field value - example,
' what if its null?

int_startPos = InStr(Cells(long_LoopVariable, 2).Value, Chr(34) & ">")
int_endPos = InStr(Cells(long_LoopVariable, 2).Value, "</mezo>")

str_fieldValue = Mid(Cells(long_LoopVariable, 2).Value, int_startPos + 2, _
(int_endPos - int_startPos - 2))

MsgBox "Volia... Here goes the field Value for the value in row No " & _
long_LoopVariable & " : " & str_fieldValue

End If

long_LoopVariable = long_LoopVariable + 1

Wend

End Sub

a. perhaps not the most efficient way, but it should do what you want to get done - i.e., loop through the entire data set and extract field name and field value (ignores free text types)
b. you still need to add codes for deciding what you want to do with the output.

anoop

hunsnowboard
08-01-2010, 05:48 AM
Hi Anoop! Thank you very much for your help! However, my VBA knowledge is too poor to write the missing codes.!

Anyway, thank you very much for your help!

hunsnowboard
08-02-2010, 05:52 AM
Can someone please help me?

GTO
08-03-2010, 05:43 AM
del...

GTO
08-03-2010, 05:51 AM
Hi Mark! Thank you so much for your work! Unfortunately my explanation wasn't good enough. There is a problem with the macro.
Namely that there are many fieldnames. Your macro works only for the name, city, birth place, birth date and mother name. But there are more fieldnames. I can't really list them all because there are many. Can this macro be done without listing all the fieldnames? Just search for the fieldname between the apostrophe and if it not név than add it to the columns. If it is already listed in one of the columns then just add the value. Can this somehow be done?

Speaking only for myself, I would want to know:


Can we assuredly count on "<mezo=" (and whatever follows) always indicating a valid field name?
Is "<free text>" literal, and the only lines we'd want not to grab/add?
You say that there are too many fieldnames to list. Are these changing from time-to-time? If so, how do you know that the number of fieldnames cannot exceed the number of columns available?