Thread: verify start and end tags

  1. #1
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location

    verify start and end tags

    I have a question about validating XML tags in Word documents.

    How can I check that every start tag has its end tag (<Amend> </Amend> and so on)?

    For instance:

    <Amend>bla bla bla <NumAm>1</NumAM>
    bla bla bla </Amend>

    <Amend>bla bla bla <NumAm>1</NumAM>
    bla bla bla </Amen

    There are many tags in the document. How can I check whether any of them are missing or damaged, and show where the missing or damaged tag is?

  2. #2
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location

    sample file for validating XML tags

    Let's assume that text is written in between the tags.

  3. #3
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    If you could NOT have nested tags this would not be really all that difficult. Tedious, but not difficult. However, as you CAN nest tags, this vastly increases the logic statements required.

    Can be done, but would be SO tedious...it becomes difficult.

    Essentially, take just a piece of your sample. I have adjusted its presentation to try to make it easier to read. And let's pretend that the first <Amend> is the beginning of the doc.

    <Amend>
    <Date>{07/12/2005}7.12.2005</Date>
    <ANo>A6-0317</ANo>/
    <NumAm>49</NumAm>
    </Amend>

    OK, say you are testing for <Amend> to see if it has a proper </Amend>.

    1. Go to the start of the doc.
    2. Find the first tag. Search for any text enclosed by < >.
    3. Make a string variable for that. Could use wildcards as well.
    4. Search forward text for this variable, but with the added "/".
    5. Search BACK to see if there is another instance of the original string.
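
    A rough sketch of steps 2 to 5, working on a plain string rather than on Word's Find (for a document you might pass ActiveDocument.Content.Text). CheckFirstTag is a made-up name, and the sketch assumes tags carry no attributes and that the first tag in the text is a start tag. It only answers the question for that first tag; it does not walk the whole document.

```vba
' Returns "" when the first tag looks properly closed, otherwise a short message.
' Assumption: tags have no attributes and the first tag is a start tag.
Function CheckFirstTag(ByVal s As String) As String
    Dim p1 As Long, p2 As Long, pClose As Long, pAgain As Long
    Dim openTag As String, closeTag As String

    p1 = InStr(1, s, "<")                       ' step 2: find the first tag
    If p1 = 0 Then CheckFirstTag = "no tag found": Exit Function
    p2 = InStr(p1, s, ">")
    If p2 = 0 Then CheckFirstTag = "tag at " & p1 & " has no closing >": Exit Function

    openTag = Mid$(s, p1, p2 - p1 + 1)          ' step 3: e.g. "<Amend>"
    closeTag = "</" & Mid$(openTag, 2)          ' step 4: e.g. "</Amend>"
    pClose = InStr(p2 + 1, s, closeTag)
    If pClose = 0 Then CheckFirstTag = openTag & " at " & p1 & " is never closed": Exit Function

    ' step 5: does the same start tag appear again before the end tag?
    pAgain = InStr(p2 + 1, s, openTag)
    If pAgain > 0 And pAgain < pClose Then
        CheckFirstTag = openTag & " at " & p1 & " is reopened at " & pAgain & " before " & closeTag
    End If
End Function
```

    InStr compares case-sensitively by default, so <NumAm> and </NumAM> are treated as different tags.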

    LOGIC: the issue is how do you trap an instance of a word (a string) between other strings.

    A:
    <Amend> text text <Amend>
    <Date>textext<Date> <Amend>texttext</Amend> WRONG

    B:
    <Amend> text text </Amend>
    <Date> textext<Amend>text text </Amend> WRONG

    So say you search for a tag PLUS that tag again with closing character in it.

    In A you end with the </Amend> at the end of this snippet - with tags in between. What do you do? You have to search THOSE tags logically. Is one of them another <Amend>? Is it the first one? If so, then THAT one needs the closing "/".

    Now another choice. Do you continue to determine the logic decisions for this initial chunk? In other words, do you do the logic testing for the OTHER tags - in this case, <Date>? Or do you finish with the original tag - in this case, <Amend>?

    In B you end up with the correct closing tag - with NO tags in between. But how do you know that? You don't. The most important point is that one of the tags in between may be another <Amend> (or the original tag, whatever it is you are testing). If it is - then that is probably...but may not be...you are going to have to test...the closing tag.

    So...you gotta check.

    Do you see what I mean? Yes, this could be done. There may be more efficient ways of going about it. Likely there are. Still, it is an issue of the ....booooorrrrrring..tediousness of doing it.

    Mind you, if you have a lot of this to do, well it may be worth it.
    Last edited by fumei; 01-12-2006 at 07:05 PM.

  4. #4
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    I was thinking of testing these tags through an input box: the user types the tag to check, say Amend. Word searches for the first instance of the <Amend> tag, and before it finds the next <Amend> it should find the closing tag </Amend>. If it finds <Amend> before </Amend>, the closing tag is missing. Am I thinking right or not? I am not even sure.

  5. #5
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    I find your A option OK, but how could I write this in code?

  6. #6
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    "word searches for first instance of <Amend> tag and before finding the next <Amend> it should find closing tag </Amend> if it finds <Amend> before </Amend> it means that closing tag is missing."
    Yes, this is true, but think about it. A search instruction is a search instruction. You search FOR the next <Amend>. You cannot search for two things at once. So you say search for <Amend> but before you find <Amend> find </Amend>...it does not work that way. You can look for <Amend> OR you can look for </Amend>. You cannot look for both at the same time.

    So again, this has to be done - as I posted - by logic. Find the next </Amend>, go back and check if there is an <Amend> between your starting point and end point....yadda yadda yadda.

    How do you code this? By coding it using the needed logic, exactly as I posted. This is the problem: it is tedious, fussy, and the logic must be completely airtight.
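
    One possible way to code the "look for <Amend>, look for </Amend>, compare where each was found" logic is to do both searches from the same point and see which hit comes first. The sketch below does that on a plain string with InStr. FirstUnpairedTag is a hypothetical name, and it assumes the tag never nests inside itself and has no attributes.

```vba
' Returns the character position of the first problem for one tag name,
' or 0 when every <tagName> is followed by </tagName> before the next <tagName>.
' Assumption: the tag does not nest inside itself and has no attributes.
Function FirstUnpairedTag(ByVal s As String, ByVal tagName As String) As Long
    Dim openTag As String, closeTag As String
    Dim pos As Long, nextOpen As Long, nextClose As Long
    Dim openAt As Long      ' position of the start tag still waiting for its end tag

    openTag = "<" & tagName & ">"
    closeTag = "</" & tagName & ">"
    pos = 1
    Do
        nextOpen = InStr(pos, s, openTag)
        nextClose = InStr(pos, s, closeTag)
        If nextOpen = 0 And nextClose = 0 Then Exit Do

        If nextOpen > 0 And (nextClose = 0 Or nextOpen < nextClose) Then
            ' A start tag comes next; the previous one must already be closed.
            If openAt > 0 Then FirstUnpairedTag = openAt: Exit Function
            openAt = nextOpen
            pos = nextOpen + Len(openTag)
        Else
            ' An end tag comes next; there must be a start tag waiting for it.
            If openAt = 0 Then FirstUnpairedTag = nextClose: Exit Function
            openAt = 0
            pos = nextClose + Len(closeTag)
        End If
    Loop
    FirstUnpairedTag = openAt   ' a start tag left open at the end, or 0
End Function
```

    The return value says where the trouble is, but not whether the fix is to delete that tag or to add its partner. That decision still needs a person or more logic.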

  7. #7
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    I don't know if this is any help, but here's some rough code to list all the tag codes. I think with a little (maybe a lot) more work you could record the tags and add/reduce a tab count which could produce an indented listing.
    Regards
    MD

    [VBA]Sub Tags()
    Dim MyData(100)
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
    .Text = "<"
    .Replacement.Text = "xx"
    .Forward = True
    .Wrap = wdFindContinue
    .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
    .Text = ">"
    .Replacement.Text = "zz"
    .Forward = True
    .Wrap = wdFindContinue
    .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.HomeKey Unit:=wdStory

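    For i = 1 To 99
    With Selection.Find
    .Text = "xx*zz"
    .MatchWildcards = True
    .Forward = True
    .Wrap = wdFindStop
    End With
    ' Stop when no further tag is found, so Mid is never run on a stale or empty selection
    If Not Selection.Find.Execute Then Exit For
    MyData(i) = Mid(Selection.Text, 3, Len(Selection.Text) - 4)
    Next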
    With Selection.Find
    .Text = "xx"
    .Replacement.Text = "<"
    .Forward = True
    .Wrap = wdFindContinue
    .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
    .Text = "zz"
    .Replacement.Text = ">"
    .Forward = True
    .Wrap = wdFindContinue
    .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Documents.Add
    For i = 1 To 100
    Selection.TypeText MyData(i) & vbCr
    Next
    End Sub

    [/VBA]
    Last edited by mdmackillop; 01-31-2006 at 02:28 PM. Reason: Loop error corrected
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  8. #8
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    thnx I will give it a try and let you know


  9. #9
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Nice md, but sorry, that does NOT take into any consideration whatsoever the essential issue - which is <whatever> properly followed by a </whatever>.

    There is no logic at all to deal with determining if a tag is properly closed. Yes it lists them..which is good I suppose, but there is no logic to deal with incorrect ones. I mean if you want to get the list, simply extract all text strings that are between < and >. Would be much simpler.
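
    Extracting every string between < and > could look something like this. It is a sketch on a plain string (for example ActiveDocument.Content.Text), ListTags is an invented name, and it assumes every < in the text starts a tag. A tag that has lost its > will swallow text up to the next > in the document.

```vba
' Collects the text found between each "<" and the next ">", in document order.
' Assumption: every "<" starts a tag; a tag missing its ">" runs on to the next ">".
Function ListTags(ByVal s As String) As Collection
    Dim tags As New Collection
    Dim p1 As Long, p2 As Long

    p1 = InStr(1, s, "<")
    Do While p1 > 0
        p2 = InStr(p1 + 1, s, ">")
        If p2 = 0 Then Exit Do
        tags.Add Mid$(s, p1 + 1, p2 - p1 - 1)   ' e.g. "Amend" or "/Amend"
        p1 = InStr(p2 + 1, s, "<")
    Loop
    Set ListTags = tags
End Function
```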

  10. #10
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    And when I extract them, how can I check whether some of them are damaged? Any ideas?
    What would be your solution?

  11. #11
    Moderator VBAX Master geekgirlau's Avatar
    Joined
    Aug 2004
    Location
    Melbourne, Australia
    Posts
    1,464
    Location
    As a starting point, what about a simple count of tags and their matching end tags (for example, <Amend> occurs 20 times, </Amend> occurs 19 times) and only display a list of (or highlight in the document) those tags where the counts of start and end tags do not match?
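
    A minimal sketch of that count-and-compare idea, again on a plain string. CountOccurrences and MismatchedCounts are made-up names, and the list of tag names is whatever you choose to pass in. It reports only the tags whose start and end counts differ.

```vba
' Counts non-overlapping, case-sensitive occurrences of what in s.
Function CountOccurrences(ByVal s As String, ByVal what As String) As Long
    Dim p As Long, n As Long
    p = InStr(1, s, what)
    Do While p > 0
        n = n + 1
        p = InStr(p + Len(what), s, what)
    Loop
    CountOccurrences = n
End Function

' Returns one line per tag name whose start and end counts differ ("" if none).
Function MismatchedCounts(ByVal s As String, ByVal tagNames As Variant) As String
    Dim i As Long, nOpen As Long, nClose As Long
    Dim report As String

    For i = LBound(tagNames) To UBound(tagNames)
        nOpen = CountOccurrences(s, "<" & tagNames(i) & ">")
        nClose = CountOccurrences(s, "</" & tagNames(i) & ">")
        If nOpen <> nClose Then
            report = report & tagNames(i) & ": " & nOpen & " start, " & nClose & " end" & vbCrLf
        End If
    Next i
    MismatchedCounts = report
End Function
```

    For example, MsgBox MismatchedCounts(ActiveDocument.Content.Text, Array("Amend", "NumAm", "Article")) would show only the tags that need a closer look.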

  12. #12
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    I already counted the tags; now I can't figure out how to highlight the non-matching ones.

    Thanks
    Here is the code:
    Sub d()
    ' One counter per start tag and per end tag
    Dim iCount As Long
    Dim strSearch As String
    Dim nasel As Boolean
    Dim lcount As Long
    Dim Mcount As Long
    Dim Mecount As Long
    Dim numcount As Long
    Dim numecount As Long
    Dim artcount As Long
    Dim artecount As Long
    Dim origcount As Long
    Dim origecount As Long
    'strSearch = InputBox$("Type in the text you want to search for.")
    'iCount = 0

    With ActiveDocument.Content.find
    .Text = "<Amend>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")


    Do While .Execute
    iCount = iCount + 1


    Loop
    End With

    With ActiveDocument.Content.find
    .Text = "</Amend>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    lcount = lcount + 1


    Loop
    End With

    With ActiveDocument.Content.find
    .Text = "<Members>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    Mcount = Mcount + 1


    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "</Members>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    Mecount = Mecount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "<NumAm>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    numcount = numcount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "</NumAm>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    numecount = numecount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "<Article>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    artcount = artcount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "</Article>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    artecount = artecount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "<Original>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    origcount = origcount + 1
    Loop
    End With
    With ActiveDocument.Content.find
    .Text = "</Original>"
    .Format = False
    .Wrap = wdFindStop
    .Style = ActiveDocument.Styles("HideTWBExt")
    Do While .Execute
    origecount = origecount + 1
    Loop
    End With
    MsgBox "<Amend>" & " found " & _
    iCount & " times" & vbCrLf & "</Amend>" & " found " & lcount _
    & " times " & vbCrLf & vbCrLf & "<Members>" & " found " & Mcount & " times" & vbCrLf _
    & "</Members>" & " found " & _
    Mecount & " times" & vbCrLf & vbCrLf & "<NumAm>" & " found " & _
    numcount & " times" & vbCrLf & "</NumAm>" & " found " & _
    numecount & " times" & vbCrLf & vbCrLf & "<Article>" & " found " & _
    artcount & " times" & vbCrLf & "</Article>" & " found " & _
    artecount & " times" & vbCrLf & vbCrLf & "<Original>" & " found " & _
    origcount & " times" & vbCrLf & "</Original>" & " found " & _
    origecount & " times"
    End Sub

  13. #13
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    Would it be possible not to highlight all the tags? I guess that if one <Amend> or </Amend> does not match, it will highlight all the <Amend> and </Amend> tags, not just the ones that are missing their start or end tags.
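
    Highlighting only the unpaired tags might be sketched as below. This is an untested outline, not a finished tool: HighlightUnpaired and TAG_NAME are invented names, it assumes the tag never nests inside itself, and it ignores the HideTWBExt style. It searches for the tail "Amend>" so that a single Find loop sees both start and end tags in document order, then looks at the characters just before each hit to tell them apart.

```vba
Sub HighlightUnpaired()
    Const TAG_NAME As String = "Amend"       ' assumed tag to check
    Dim r As Range, tagRng As Range, openRng As Range
    Dim isEnd As Boolean

    Set r = ActiveDocument.Content
    With r.Find
        .ClearFormatting
        .Text = TAG_NAME & ">"               ' tail shared by <Amend> and </Amend>
        .Forward = True
        .Wrap = wdFindStop
        .MatchCase = True
        .MatchWildcards = False
    End With

    Do While r.Find.Execute
        Set tagRng = r.Duplicate
        isEnd = False
        If tagRng.Start >= 2 Then
            isEnd = (ActiveDocument.Range(tagRng.Start - 2, tagRng.Start).Text = "</")
        End If

        If isEnd Then
            tagRng.Start = tagRng.Start - 2
            ' An end tag with no start tag waiting for it
            If openRng Is Nothing Then tagRng.HighlightColorIndex = wdYellow
            Set openRng = Nothing
        ElseIf tagRng.Start >= 1 Then
            If ActiveDocument.Range(tagRng.Start - 1, tagRng.Start).Text = "<" Then
                tagRng.Start = tagRng.Start - 1
                ' A second start tag while the earlier one is still open
                If Not openRng Is Nothing Then openRng.HighlightColorIndex = wdYellow
                Set openRng = tagRng
            End If
        End If
        r.Collapse wdCollapseEnd             ' continue after this hit
    Loop

    ' A start tag still open at the end of the document
    If Not openRng Is Nothing Then openRng.HighlightColorIndex = wdYellow
End Sub
```

    Tags that pair up correctly are left alone, so only the suspicious ones end up yellow.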

  14. #14
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    People, people. The actual point is still being missed!

    YES - you can count tags. YES - that would identify that there is a missed tag. Say 20 <Amend> and 19 </Amend>. But it tells you NOTHING about the logic.

    Does that mean there are really 19 proper tags, and an EXTRA <Amend>?

    OR;

    Does that mean there are 20 proper tags and a MISSING </Amend>?

    See what I mean? There is no way to know unless you parse it. Parsing is a logic operation. A count is a good starting point but it does NOT help (really) at all with the logic needed.

    Yes - you can highlight tags....but the logic problem remains.

    You need to match, and you need to match in the proper order.
    "highlight all the <Amend> and </Amend> codes not just the ones that misses their start or end tags??"
    I don't know how many more times I can state this. The answer is YES! You can do this. But it requires very detailed, flawlessly convoluted logic. There is no other way.

    Further, as I stated before, the logic is not difficult, but it IS tedious. If you have a real need for a tool like this, then by all means do it...and use it.

    I mean you could do a superficial count operation. That would at least warn you that something is wrong - but it would not tell you exactly what it is (is it an extra, or a missing tag), nor would it tell where it is.

    A really functional tool requires perfect logic. This logic MUST perform a variable number of loopback operations.

    A: <Amend> text text <Amend> text text ettstst </Amend>
    B: <Amend> text text text text ettstst </Amend>

    Which is correct? B: right? That is easy. But how do you KNOW there is not an improper tag between an <Amend> and an </Amend>, as in A:? You can not know - unless you actually check. There is no other way. You must check. Period.

    Further, there must be a logic test to see - is the first <Amend> correct......and the second one needs to be removed; or is the second one correct, and the first one needs to be removed. Further, if the second one is correct...are you sure the first one needs to be removed...or does it actually need a closing tag? Further, if the first one is correct does the second one need to be removed...or does IT need a closing tag?

    These are logic tests. And this thing just ain't gonna fly without them.
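
    For what it is worth, the usual way to get "match, and match in the proper order" with nested tags is a stack: push each start tag, and on each end tag check that it matches the most recently pushed start tag. The sketch below does this on a plain string. CheckNesting is a hypothetical name, and it assumes tags have no attributes, every < in the text starts a tag, and tag names are compared case-sensitively. It reports only the first problem, and it cannot decide whether a tag is extra or its partner is missing.

```vba
' Returns "" if all tags nest properly, otherwise a description of the first problem.
' Assumptions: no attributes, every "<" starts a tag, names are case-sensitive.
Function CheckNesting(ByVal s As String) As String
    Dim stack As New Collection       ' names of start tags still open
    Dim positions As New Collection   ' where each of them was found
    Dim p1 As Long, p2 As Long, nextLt As Long
    Dim body As String

    p1 = InStr(1, s, "<")
    Do While p1 > 0
        p2 = InStr(p1 + 1, s, ">")
        nextLt = InStr(p1 + 1, s, "<")
        If p2 = 0 Or (nextLt > 0 And nextLt < p2) Then
            CheckNesting = "damaged tag (no closing >) at position " & p1
            Exit Function
        End If

        body = Mid$(s, p1 + 1, p2 - p1 - 1)      ' e.g. "Amend" or "/Amend"
        If Left$(body, 1) = "/" Then
            If stack.Count = 0 Then
                CheckNesting = "<" & body & "> at position " & p1 & " has no start tag"
                Exit Function
            End If
            If stack(stack.Count) <> Mid$(body, 2) Then
                CheckNesting = "<" & body & "> at position " & p1 & _
                    " does not match <" & stack(stack.Count) & _
                    "> opened at position " & positions(positions.Count)
                Exit Function
            End If
            stack.Remove stack.Count             ' matched: pop the start tag
            positions.Remove positions.Count
        Else
            stack.Add body                       ' start tag: push it
            positions.Add p1
        End If
        p1 = InStr(p2 + 1, s, "<")
    Loop

    If stack.Count > 0 Then
        CheckNesting = "<" & stack(stack.Count) & "> at position " & _
            positions(positions.Count) & " is never closed"
    End If
End Function
```

    On text like <Amend>a</Amen b, this returns "damaged tag (no closing >) at position 9".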

  15. #15
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    Gerry, you seem to know what you are talking about, but I just can't figure out how to write this in code. Can you please write the code for the parsing logic?

  16. #16
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    This is definitely not my area of expertise, but in an effort to assist I'll offer my thoughts.
    I totally agree with Gerry, you have to solve the logic; however, I don't know how complicated your web pages are. Do the start tags contain more text than the end tags? This obviously causes comparison problems. How many tags are you actually using? With a limited number, I can see how some array comparisons may assist. Is it really necessary to use VBA to solve your problem? With a printout and a pencil, I'm sure I could check simple web pages quite quickly, using my earlier code.
    If I was composing stuff, I'd probably enter start/end codes as a "pair" and infill the text and other tags (in pairs) between as required, but maybe I'm being too simplistic.
    Regards
    MD

  17. #17
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    It is not a web page. It is a Word document which has XML tags in it. I will attach an example of it.

  18. #18
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location

    sample of document

    Let's say I need to check just the <Amend> and </Amend> tags, because I guess the procedure is the same for every other tag.

    Thnx

  19. #19
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    This is, as Gerry keeps saying, complex. If I had more time (and knowledge) I would be interested in creating a Word AddIn to do this. Meanwhile I would suggest that you investigate other tools. Have you tried Google for XML validators (or similar)? There are tools out there and you should be able to find something either to check for well-formedness (what a horrible word - is it correct?) or validity against a style sheet or transform.
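
    As one example of an existing tool, the MSXML parser that ships with Windows can be called from VBA to do a well-formedness check. The sketch below assumes MSXML 6.0 is installed; WellFormedError and CheckActiveDocument are made-up names, and the <root> wrapper is added only because XML needs a single top-level element. Document text containing a bare & or < that is not part of a tag, or Word control characters such as table cell marks, will also be reported as errors.

```vba
' Returns "" when s is well-formed XML once wrapped in a root element,
' otherwise the parser's reason with line and column.
' Assumption: MSXML 6.0 is available on this machine.
Function WellFormedError(ByVal s As String) As String
    Dim doc As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If Not doc.loadXML("<root>" & s & "</root>") Then
        With doc.parseError
            WellFormedError = .reason & " (line " & .Line & ", column " & .linepos & ")"
        End With
    End If
End Function

Sub CheckActiveDocument()
    Dim msg As String
    msg = WellFormedError(ActiveDocument.Content.Text)
    If msg = "" Then msg = "Tags are well formed."
    MsgBox msg
End Sub
```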
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  20. #20
    VBAX Mentor
    Joined
    Jan 2006
    Posts
    348
    Location
    OK, thanks.

    I have looked at XML parsers, but I guess these are too complex.
