Consulting

Results 1 to 9 of 9

Thread: Finding a fast way to Import XML into Access 2007 (with identical tag entries)

  1. #1

    Finding a fast way to Import XML into Access 2007 (with identical tag entries)

    I have recently been using access to pull in XML for further work.

    My access tool send a URL out, and retrieves the XML result. It then stores it in a table.

    The XML frequently has a structure like this

    <Document>
    <name>Bob<name>
    <place>Ocean<place>
    <place>Sea<place>

    And notice that yes, there are two tags with the same name. There is no way to get around this from my end.

    I have tried the importXML method in Access. This results in a very quick load, but unfortunately it does not handle the second place field. Basically the second place field gets ignored.


    I then tried to create a DOMdocument object and parse out the pieces of the XML. This works, but it extrememly slow compared to the importXML method (on the order of 20 times slower).


    So I'm looking for the best of both worlds. Is it possible?

    Note: I am not married to the format of how my data is loaded, for example, the following entries into my table would be fine.

    1) Each field gets it own column
    2) All entries get their own row.
    3) The place fields are all in a single column, concetenated in some fashion.

  2. #2
    VBAX Mentor
    Joined
    Feb 2009
    Posts
    447
    Location
    I think I've been in your situation. The ImportXML method is simple and convenient when it provides what you need. Trouble is it can't cope with even moderately complex XML documents. I had to resort to the DOMDocument approach for those. I don't know any way to combine the two.

    Regarding the speed of DOMDocument parsing, I seem to recall the rate-limiting step was storing the extracted values in my tables. When I disabled the code which stored the values, parsing the document and extracting values of interest seemed tolerably fast.

    If your experience is different, all I can suggest is to show us your code and see if anyone can identify optimizations.

  3. #3
    Here is most of the parsing code I am running. It queries our database for two fields from source 13, and puts those fields in a table called IdolResults.

    I have only made slight alterations for security purposes.

    I have found that the adding to a table does increase slowness, but its still very slow even if I have it just store results in a variable.

    PS: apparantely I need a post count of 5 before I can send this, so I will do that now

  4. #4
    Sending another reply to bump post count so I can send the code

  5. #5
    Sending another reply to bump post count so I can send the code (2)

  6. #6
    Sending another reply to bump post count so I can send the code (3)

  7. #7
    Finally I can send the code!

    Public Function Runner()
    fPullMeta "IP", "20000", 13
    End Function
    
    Public Sub fPullMeta(sIP As String, sPort As String, sSource As String)
    'This function takes in an ip,port,a source numeric, and a volume numeric. It queries idol and places the metadata into the DOCUMENT table.
    Dim xmldoc As DOMDocument
    Set xmldoc = New DOMDocument
    xmldoc.async = False
    xmldoc.validateOnParse = False
    Dim Query As String
    Query = "http://" & sIP & ":" & sPort & "/action=query&anylanguage=true&print=fields&printfields=UUID,AUN_PRODUCTION_VOLUME_NUMERIC&DATABASEMATCH=" & sSource & "&fieldtext=EXISTS{}:AUN_PRODUCTION_VOLUME_NUMERIC&SORT=ALPHABETICAL:UUID"
    xmldoc.Load (Query)
    ParseXML xmldoc
    Next
    End Sub
    
    Public Sub ParseXML(xmldoc As DOMDocument)
    Dim doc As IXMLDOMNodeList
    'Goes through the results of IDOL, looks for specific values, and copies those values into the IDOLRESULTS table.
    Set doc = xmldoc.getElementsByTagName("DOCUMENT")
    Dim entry, child As IXMLDOMNode
    Dim children As Recordset
    Set children = CurrentDb.OpenRecordset("IdolResults")
    Dim UUID As String
    Dim VolID As Long
    VolID = 0
      For Each entry In doc
            For Each child In entry.childNodes
                 If (child.nodeName = "UUID") Then
                    UUID = child.Text
                 ElseIf (child.nodeName = "AUN_PRODUCTION_VOLUME_NUMERIC") Then
                    If child.Text > VolID Then
                        VolID = child.Text
                    End If
                 End If
            children.AddNew
            children(0) = UUID
            children(1) = VolID
            children.Update
           Next
      Next
    children.Close
    End Sub

  8. #8
    VBAX Mentor
    Joined
    Feb 2009
    Posts
    447
    Location
    I re-formatted your code using the VBA tag so I could see it better.

    [vba]Public Function Runner()
    fPullMeta "IP", "20000", 13
    End Function[/vba]
    [vba]Public Sub fPullMeta(sIP As String, sPort As String, sSource As String)
    'This function takes in an ip,port,a source numeric, and a volume numeric.
    'It queries idol and places the metadata into the DOCUMENT table.
    Dim xmldoc As DOMDocument
    Set xmldoc = New DOMDocument
    xmldoc.async = False
    xmldoc.validateOnParse = False
    Dim Query As String
    Query = "http://" & sIP & ":" & sPort & _
    "/action=query&anylanguage=true&print=fieldsprintfields=UUID," & _
    "AUN_PRODUCTION_VOLUME_NUMERIC&DATABASEMATCH=" & _
    sSource & "&fieldtext=EXISTS{}:AUN_PRODUCTION_VOLUME_NUMERIC&SORT=ALPHABETICAL:UU ID"
    xmldoc.Load (Query)
    ParseXML xmldoc
    Next
    End Sub[/vba]
    [vba]Public Sub ParseXML(xmldoc As DOMDocument)
    Dim doc As IXMLDOMNodeList
    'Goes through the results of IDOL, looks for specific values, and copies those
    'values into the IDOLRESULTS table.
    Set doc = xmldoc.getElementsByTagName("DOCUMENT")
    Dim entry, child As IXMLDOMNode
    Dim children As Recordset
    Set children = CurrentDb.OpenRecordset("IdolResults")
    Dim UUID As String
    Dim VolID As Long
    VolID = 0
    For Each entry In doc
    For Each child In entry.childNodes
    If (child.nodeName = "UUID") Then
    UUID = child.Text
    ElseIf (child.nodeName = "AUN_PRODUCTION_VOLUME_NUMERIC") Then
    If child.Text > VolID Then
    VolID = child.Text
    End If
    End If
    children.AddNew
    children(0) = UUID
    children(1) = VolID
    children.Update
    Next
    Next
    children.Close
    End Sub[/vba]

  9. #9
    VBAX Mentor
    Joined
    Feb 2009
    Posts
    447
    Location
    Why do you have Next as the last line in fPullMeta? Doesn't that throw a compiler error?

    Although you indicated storing values in the table is not really the bottleneck, I suggest try including dbAppendOnly constant with OpenRecordset.

    I have only used validateOnParse = True with my XML document processing. I don't know if it has any effect on the speed of the subsequent operations; I'd try it just to check.

    Honestly I don't understand why your code is intolerably slow.

    When you have "child.Text > VolID", you're comparing a text value to a long integer. Apparently VBA is smart enough to realize what you want and do it. But I would try explicitly casting the text as long so VBA won't have to figure out what to do: "CLng(child.Text) > VolID". Same for when you store the value to VolID: "VolID = CLng(child.Text)"

    I don't understand your logic in the If ... Then block. You read a UUID value, then store both UUID and VolID as a new record in your table. Next time through, you read AUN_PRODUCTION_VOLUME_NUMERIC and store UUID and VolID as another new record. Is that right? Seems to me you would read both, then store them in one record.

    You might consider looking into Xpath to address your target nodes directly rather than examining every node to determine whether it's one of the two you're interested in. However, I haven't used Xpath myself, so I can't offer any assistance there.

    Like I said, I don't see why it should be slow. If you would like to give me a Zip of your database and a sample XML document, I will look further.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •