PDA

View Full Version : Solved: Need to extract a few items from 250 XML Files in a Tree View, to a Spread Sheet



frank_m
01-26-2012, 09:27 PM
I have a few pieces of information in each of 250 XML Files, that I need to put in a single Spread Sheet.

When I look at the XML files using Microsoft's free XLM Note Pad, they are in a tree view. -- I've tried opening them in Excel with the file type set to XML, but unfortunately when I do that, most of the information is lost as a result.(regardless of the settings I choose),

Would someone here be willing to write some code that will scan thru the tree view folders in each of the .XML files, to extract the information that I have specified in the sample attached workbook?

It will likely be clearer what I need if you take a look at my notes in the sample workbook.

Thank you much.

mohanvijay
01-26-2012, 11:59 PM
Try this and See attached file


Sub XML_2_XL()
Dim XML_File As Object
Dim FSO As Object
Dim Ob_Fol As Object
Dim Ob_File As Object
Dim XML_2_Str As String
Dim T_Str As String
Dim T_Lng As Long, T_Lng2 As Long, Aft_1 As Long, Aft_2 As Long
Dim i As Long
Dim Hld_Cria(11) As String
Dim Hld_Cri2_En(11) As String
Dim Rw_Count As Long
Dim Dlg_Fol As FileDialog
Set Dlg_Fol = Application.FileDialog(msoFileDialogFolderPicker)
Dlg_Fol.Title = "Select folder contains XML File"
If Dlg_Fol.Show = -1 Then
T_Str = Dlg_Fol.SelectedItems(1)
Else
Set Dlg_Fol = Nothing
Exit Sub
End If
Set Dlg_Fol = Nothing
Hld_Cria(0) = "<SubItem name="
Hld_Cria(1) = "<HomogeneousMaterial name="
Hld_Cria(2) = "weight="
Hld_Cria(3) = "<Substance cas="
Hld_Cria(4) = "name="
Hld_Cria(5) = "weight="
Hld_Cria(6) = "<Substance cas="
Hld_Cria(7) = "name="
Hld_Cria(8) = "weight="
Hld_Cria(9) = "<Substance cas="
Hld_Cria(10) = "name="
Hld_Cria(11) = "weight="
Hld_Cri2_En(0) = ">"
Hld_Cri2_En(1) = ">"
Hld_Cri2_En(2) = "/>"
Hld_Cri2_En(3) = " n"
Hld_Cri2_En(4) = ">"
Hld_Cri2_En(5) = "/>"
Hld_Cri2_En(6) = " n"
Hld_Cri2_En(7) = ">"
Hld_Cri2_En(8) = "/>"
Hld_Cri2_En(9) = " n"
Hld_Cri2_En(10) = ">"
Hld_Cri2_En(11) = "/>"
Set FSO = CreateObject("Scripting.FilesystemObject")
Set Ob_Fol = FSO.getfolder(T_Str)
Rw_Count = 2
For Each Ob_File In Ob_Fol.Files
T_Str = UCase(Ob_File.Path)

If Right(T_Str, 4) = ".XML" Then

Set XML_File = FSO.OpenTextFile(T_Str)
XML_2_Str = XML_File.readall
Aft_1 = 1

For i = 0 To 11

T_Lng = InStr(Aft_1, XML_2_Str, Hld_Cria(i))
T_Lng2 = InStr(T_Lng, XML_2_Str, Hld_Cri2_En(i))

T_Str = Mid(XML_2_Str, (T_Lng + Len(Hld_Cria(i))), T_Lng2 - (T_Lng + Len(Hld_Cria(i))))

T_Str = Trim(T_Str)

T_Str = Left(T_Str, Len(T_Str) - 1)
T_Str = Right(T_Str, Len(T_Str) - 1)

Aft_1 = T_Lng2

Cells(Rw_Count, i + 1).Value = T_Str

Next i
Rw_Count = Rw_Count + 1
XML_File.Close
Set XML_File = Nothing
End If
Next
Set Ob_File = Nothing
Set Ob_Fol = Nothing
Set FSO = Nothing
End Sub

frank_m
01-27-2012, 02:45 AM
Thanks mohanvijay

Very nice work kind sir. What you have done will save at least a few days of tedious labor :thumb

Perhaps you could also assist me with adding some error handling for two situations that I came across. - Approx. 6 out of the 250 files cause a garbled output because those files are missing the data source paths. -- And another 8 or 10 seem to have the source's in place, but something I did not spot yet must be a little off, as those also have erratic output.

I've attached one sample of each type of problematic XML to show what I mean. Perhaps there could be a note given in Column 1 to inform me that File name had a problem. - I've spent an hour trying a few different things with an error handler, but haven't been able to get anywhere.

Thanks again

mohanvijay
01-27-2012, 03:05 AM
change below code



For i = 0 To 11

T_Lng = InStr(Aft_1, XML_2_Str, Hld_Cria(i))

T_Lng2 = InStr(T_Lng, XML_2_Str, Hld_Cri2_En(i))

T_Str = Mid(XML_2_Str, (T_Lng + Len(Hld_Cria(i))), T_Lng2 - (T_Lng + Len(Hld_Cria(i))))

T_Str = Trim(T_Str)

T_Str = Left(T_Str, Len(T_Str) - 1)
T_Str = Right(T_Str, Len(T_Str) - 1)

Aft_1 = T_Lng2

Cells(Rw_Count, i + 1).Value = T_Str

Next i


as follows



For i = 0 To 11

T_Lng = InStr(Aft_1, XML_2_Str, Hld_Cria(i))

If T_Lng = 0 Then
Cells(Rw_Count, 1).Value = "Error In File - " & Ob_File.Path
Exit For
End If

T_Lng2 = InStr(T_Lng, XML_2_Str, Hld_Cri2_En(i))

If T_Lng2 = 0 Then
Cells(Rw_Count, 1).Value = "Error In File - " & Ob_File.Path
Exit For
End If

T_Str = Mid(XML_2_Str, (T_Lng + Len(Hld_Cria(i))), T_Lng2 - (T_Lng + Len(Hld_Cria(i))))

T_Str = Trim(T_Str)

T_Str = Left(T_Str, Len(T_Str) - 1)
T_Str = Right(T_Str, Len(T_Str) - 1)

Aft_1 = T_Lng2

Cells(Rw_Count, i + 1).Value = T_Str

Next i

frank_m
01-27-2012, 03:16 AM
Absolutely fantastic. -- What I hoped for, and more.

You've created a happy camper over here.
:beerchug:

Have a great day.