PDA

View Full Version : Finding a fast way to Import XML into Access 2007 (with identical tag entries)



KarasLegacy
09-02-2010, 01:23 PM
I have recently been using access to pull in XML for further work.

My access tool send a URL out, and retrieves the XML result. It then stores it in a table.

The XML frequently has a structure like this

<Document>
<name>Bob<name>
<place>Ocean<place>
<place>Sea<place>

And notice that yes, there are two tags with the same name. There is no way to get around this from my end.

I have tried the importXML method in Access. This results in a very quick load, but unfortunately it does not handle the second place field. Basically the second place field gets ignored.


I then tried to create a DOMdocument object and parse out the pieces of the XML. This works, but it extrememly slow compared to the importXML method (on the order of 20 times slower).


So I'm looking for the best of both worlds. Is it possible?

Note: I am not married to the format of how my data is loaded, for example, the following entries into my table would be fine.

1) Each field gets it own column
2) All entries get their own row.
3) The place fields are all in a single column, concetenated in some fashion.

hansup
09-07-2010, 07:00 AM
I think I've been in your situation. The ImportXML method is simple and convenient when it provides what you need. Trouble is it can't cope with even moderately complex XML documents. I had to resort to the DOMDocument approach for those. I don't know any way to combine the two.

Regarding the speed of DOMDocument parsing, I seem to recall the rate-limiting step was storing the extracted values in my tables. When I disabled the code which stored the values, parsing the document and extracting values of interest seemed tolerably fast.

If your experience is different, all I can suggest is to show us your code and see if anyone can identify optimizations.

KarasLegacy
09-08-2010, 04:40 PM
Here is most of the parsing code I am running. It queries our database for two fields from source 13, and puts those fields in a table called IdolResults.

I have only made slight alterations for security purposes.

I have found that the adding to a table does increase slowness, but its still very slow even if I have it just store results in a variable.

PS: apparantely I need a post count of 5 before I can send this, so I will do that now

KarasLegacy
09-08-2010, 04:41 PM
Sending another reply to bump post count so I can send the code

KarasLegacy
09-08-2010, 04:43 PM
Sending another reply to bump post count so I can send the code (2)

KarasLegacy
09-08-2010, 04:45 PM
Sending another reply to bump post count so I can send the code (3)

KarasLegacy
09-08-2010, 04:46 PM
Finally I can send the code!



Public Function Runner()
fPullMeta "IP", "20000", 13
End Function

Public Sub fPullMeta(sIP As String, sPort As String, sSource As String)
'This function takes in an ip,port,a source numeric, and a volume numeric. It queries idol and places the metadata into the DOCUMENT table.
Dim xmldoc As DOMDocument
Set xmldoc = New DOMDocument
xmldoc.async = False
xmldoc.validateOnParse = False
Dim Query As String
Query = "http://" & sIP & ":" & sPort & "/action=query&anylanguage=true&print=fields&printfields=UUID,AUN_PRODUCTION_VOLUME_NUMERIC&DATABASEMATCH=" & sSource & "&fieldtext=EXISTS{}:AUN_PRODUCTION_VOLUME_NUMERIC&SORT=ALPHABETICAL:UUID"
xmldoc.Load (Query)
ParseXML xmldoc
Next
End Sub

Public Sub ParseXML(xmldoc As DOMDocument)
Dim doc As IXMLDOMNodeList
'Goes through the results of IDOL, looks for specific values, and copies those values into the IDOLRESULTS table.
Set doc = xmldoc.getElementsByTagName("DOCUMENT")
Dim entry, child As IXMLDOMNode
Dim children As Recordset
Set children = CurrentDb.OpenRecordset("IdolResults")
Dim UUID As String
Dim VolID As Long
VolID = 0
For Each entry In doc
For Each child In entry.childNodes
If (child.nodeName = "UUID") Then
UUID = child.Text
ElseIf (child.nodeName = "AUN_PRODUCTION_VOLUME_NUMERIC") Then
If child.Text > VolID Then
VolID = child.Text
End If
End If
children.AddNew
children(0) = UUID
children(1) = VolID
children.Update
Next
Next
children.Close
End Sub

hansup
09-08-2010, 06:45 PM
I re-formatted your code using the VBA tag so I could see it better.

Public Function Runner()
fPullMeta "IP", "20000", 13
End Function
Public Sub fPullMeta(sIP As String, sPort As String, sSource As String)
'This function takes in an ip,port,a source numeric, and a volume numeric.
'It queries idol and places the metadata into the DOCUMENT table.
Dim xmldoc As DOMDocument
Set xmldoc = New DOMDocument
xmldoc.async = False
xmldoc.validateOnParse = False
Dim Query As String
Query = "http://" & sIP & ":" & sPort & _
"/action=query&anylanguage=true&print=fieldsprintfields=UUID," & _
"AUN_PRODUCTION_VOLUME_NUMERIC&DATABASEMATCH=" & _
sSource & "&fieldtext=EXISTS{}:AUN_PRODUCTION_VOLUME_NUMERIC&SORT=ALPHABETICAL:UUID"
xmldoc.Load (Query)
ParseXML xmldoc
Next
End Sub
Public Sub ParseXML(xmldoc As DOMDocument)
Dim doc As IXMLDOMNodeList
'Goes through the results of IDOL, looks for specific values, and copies those
'values into the IDOLRESULTS table.
Set doc = xmldoc.getElementsByTagName("DOCUMENT")
Dim entry, child As IXMLDOMNode
Dim children As Recordset
Set children = CurrentDb.OpenRecordset("IdolResults")
Dim UUID As String
Dim VolID As Long
VolID = 0
For Each entry In doc
For Each child In entry.childNodes
If (child.nodeName = "UUID") Then
UUID = child.Text
ElseIf (child.nodeName = "AUN_PRODUCTION_VOLUME_NUMERIC") Then
If child.Text > VolID Then
VolID = child.Text
End If
End If
children.AddNew
children(0) = UUID
children(1) = VolID
children.Update
Next
Next
children.Close
End Sub

hansup
09-08-2010, 07:31 PM
Why do you have Next as the last line in fPullMeta? Doesn't that throw a compiler error?

Although you indicated storing values in the table is not really the bottleneck, I suggest try including dbAppendOnly constant with OpenRecordset.

I have only used validateOnParse = True with my XML document processing. I don't know if it has any effect on the speed of the subsequent operations; I'd try it just to check.

Honestly I don't understand why your code is intolerably slow.

When you have "child.Text > VolID", you're comparing a text value to a long integer. Apparently VBA is smart enough to realize what you want and do it. But I would try explicitly casting the text as long so VBA won't have to figure out what to do: "CLng(child.Text) > VolID". Same for when you store the value to VolID: "VolID = CLng(child.Text)"

I don't understand your logic in the If ... Then block. You read a UUID value, then store both UUID and VolID as a new record in your table. Next time through, you read AUN_PRODUCTION_VOLUME_NUMERIC and store UUID and VolID as another new record. Is that right? Seems to me you would read both, then store them in one record.

You might consider looking into Xpath to address your target nodes directly rather than examining every node to determine whether it's one of the two you're interested in. However, I haven't used Xpath myself, so I can't offer any assistance there.

Like I said, I don't see why it should be slow. If you would like to give me a Zip of your database and a sample XML document, I will look further.