

Thread: VBA Word regular expression pattern match

  1. #1

    VBA Word regular expression pattern match

    Hello, I am trying to find the correct regular expression pattern to match HTML strings that contain image names. The image tags always start with <img and end with a closing angle bracket, and the image names always end with .gif or .jpg. I only need to extract the actual image name. Some examples are:
    <img width="100%" src="orange.jpg">
    <img src="orange.jpg">
    text here</font></p> <IMG SRC="example_logo.gif">
    text <img src="example_logo.gif">text here</font></p>

    If I use:
        .Pattern = "<img\s*src=""([^""]*)"""
    it matches tags that begin with <img src
    but not tags that begin with <img width
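    A minimal sketch of why this happens, assuming VBA with a late-bound VBScript.RegExp object (the helper name PatternMatches is hypothetical). In the pattern above, \s* allows only whitespace between "img" and "src", so any attribute in between, such as width, stops the match.

```vba
' Hypothetical helper: True if the pattern matches anywhere in text.
Function PatternMatches(ByVal pattern As String, ByVal text As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = pattern
        .IgnoreCase = True   ' treat <IMG ...> and <img ...> alike
        PatternMatches = .Test(text)
    End With
End Function

Sub TestPatternFromPostOne()
    Const PAT As String = "<img\s*src=""([^""]*)"""
    ' Only a space separates "img" and "src": prints True
    Debug.Print PatternMatches(PAT, "<img src=""orange.jpg"">")
    ' width="100%" sits between "img" and "src": prints False
    Debug.Print PatternMatches(PAT, "<img width=""100%"" src=""orange.jpg"">")
End Sub
```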

  2. #2
    I figured it out. The RegEx pattern that worked is below:

    .Pattern = "<img(.*?)>"

  3. #3
    macropod (Knowledge Base Approver, VBAX Guru)
    In Word, you can do this without recourse to RegEx, using a wildcard Find (.MatchWildcards = True):
        .Text = "\<img*\>"
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]
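    A minimal sketch of how that wildcard Find could be looped over a document, assuming Word is the host and the HTML is already open as a Word document; the function name CountImgTags is hypothetical and only the search text comes from the post above. As far as I know, wildcard searches in Word are always case-sensitive, so uppercase <IMG ...> tags would need something like \<[Ii][Mm][Gg]*\>.

```vba
' Hypothetical helper: print and count every <img ...> tag in a Word range
' using the wildcard Find text \<img*\>. Assumes Word is the host.
Function CountImgTags(ByVal searchRange As Range) As Long
    Dim n As Long
    With searchRange.Find
        .ClearFormatting
        .Text = "\<img*\>"       ' \< and \> are literal angle brackets
        .MatchWildcards = True   ' wildcard searches are case-sensitive
        .Forward = True
        .Wrap = wdFindStop       ' stop at the end instead of wrapping around
        Do While .Execute
            ' After a successful Execute, searchRange covers the found tag
            Debug.Print searchRange.Text
            n = n + 1
            searchRange.Collapse wdCollapseEnd   ' continue after this tag
        Loop
    End With
    CountImgTags = n
End Function
```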

  4. #4
    Thanks, I appreciate the help. I decided to use regular expressions instead of Word's Find because I am searching through large HTML files with many lines of code, and I thought RegEx would be faster than opening each file and searching it in Word.
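    A sketch of that approach, assuming the HTML files are plain single-byte or UTF-8 text (multi-byte characters would come through garbled, but the tag structure would not); ReadWholeFile and CountImgTagsInFile are hypothetical names. The file is read in one go with VBA's own Open/Get statements, so no Word document is opened, and the pattern from post #2 is then run over the whole string.

```vba
' Hypothetical helper: read a whole text file into one String.
Function ReadWholeFile(ByVal path As String) As String
    Dim f As Integer, buffer As String
    f = FreeFile
    Open path For Binary Access Read As #f
    buffer = Space$(LOF(f))   ' size the buffer to the file length
    Get #f, , buffer          ' fill it with the file's bytes
    Close #f
    ReadWholeFile = buffer
End Function

' Hypothetical helper: count <img ...> tags using the pattern from post #2.
Function CountImgTagsInFile(ByVal path As String) As Long
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img(.*?)>"
        .Global = True       ' find every tag, not just the first
        .IgnoreCase = True   ' also catch <IMG ...>
        CountImgTagsInFile = .Execute(ReadWholeFile(path)).Count
    End With
End Function
```

    Note that in VBScript regular expressions "." does not match a line break, so a tag split across two lines of the HTML file would not be counted.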

  5. #5
    Hi RCGUA

    For the examples you posted, I think a pattern similar to the one in your post #1 would be preferable: you'd get the file name directly.

        .Pattern = "<img\s.*?src=""([^""]+)"""
    Also make the matching case-insensitive (.IgnoreCase = True), since one of your examples uses an uppercase <IMG ...> tag.
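    A minimal sketch of that advice applied to a whole HTML string, assuming late-bound VBScript.RegExp; ImageNames is a hypothetical name. With .Global = True every tag is returned, and SubMatches(0) of each match is the captured file name.

```vba
' Hypothetical helper: collect every image name from an HTML string
' using the pattern suggested above.
Function ImageNames(ByVal html As String) As Collection
    Dim names As Collection
    Dim m As Object
    Set names = New Collection
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        .Global = True       ' return all matches, not only the first
        .IgnoreCase = True   ' <IMG SRC=...> as well as <img src=...>
        For Each m In .Execute(html)
            names.Add m.SubMatches(0)   ' the capture group holds the file name
        Next m
    End With
    Set ImageNames = names
End Function
```

    One caveat: because .*? can cross a ">", an <img> tag without a src attribute could let the match run on and pick up the src of a later element (for example a <script src=...>). A variation such as "<img\s[^>]*?src=""([^""]+)""" keeps each match inside a single tag.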

  6. #6
    Thanks. I think the first pattern worked for tags that begin with <img src,
    but not for tags like: <img width="100%" src="orange.jpg">

  7. #7
    Hi

    I tried with your example and it works for me.

    Please try:


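    Sub Test()
        Dim s As String

        s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
        With CreateObject("VBScript.RegExp")
            ' \s.*? lets other attributes (such as width) come before src
            .Pattern = "<img\s.*?src=""([^""]+)"""
            ' SubMatches(0) holds the text captured by the group, i.e. the file name
            MsgBox .Execute(s)(0).SubMatches(0)
        End With
    End Sub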

  8. #8
    Thanks lecxe, I must have missed the word "similar" in your first post and didn't notice the slight difference. Why do you think "<img\s.*?src=""([^""]+)"""
    is preferable to "<img(.*?)>"?

    I will change my code based on your answer. Will it run faster, or will it match a wider variety of image names? I don't know regex, so I appreciate your help and advice.

  9. #9
    Quote (originally posted by RCGUA123):
    Why do you think: "<img\s.*?src=""([^""]+)""" is preferable to: "<img(.*?)>"
    I'm not sure I understood exactly what you want, so here is what I understood.

    In the text:

    XXX<img width="100%" src="orange.jpg">XXX
    What do you want to get?

    What I understood is that you want to get

    orange.jpg
    Is that right? If not, please post what you'd like to get from that text.

    This is why I used the pattern

    "<img\s.*?src=""([^""]+)"""

    because, as you can see by running the code I posted, it gives you "orange.jpg" directly from the text.

    Please clarify.

  10. #10
    Yes, you are correct: I want to get orange.jpg. As I understand it, both patterns will get orange.jpg and both will run at the same speed, but I may be wrong. My question was: if both patterns do the same thing, is one preferable? For example, does one run faster than the other?

  11. #11
    Quote (originally posted by RCGUA123):
    ... Yes, you are correct I want to get: orange.jpg as I understand it, both patterns will get orange.jpg and both patterns will run at the same speed ...
    I don't understand how you came to this conclusion.

    Take as an example the string with an image tag from post #7:

     s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    As I see it, the first pattern

            .Pattern = "<img\s.*?src=""([^""]+)"""
    gets you "orange.jpg" directly (just run the code in post #7 to confirm it).

    Now take your suggested pattern:

            .Pattern = "<img(.*?)>"
    If you use it with the same string and display both the match and the submatch:

    Sub Test()
        Dim s As String
        Dim m As Object

        s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
        With CreateObject("VBScript.RegExp")
            .Pattern = "<img(.*?)>"
            Set m = .Execute(s)(0)   ' first (and only) match
            MsgBox "Match: " & m.Value
            MsgBox "Submatch: " & m.SubMatches(0)   ' everything between "<img" and ">"
        End With
    End Sub


    The result is:

    Match: <img width="100%" src="orange.jpg">
    Submatch: width="100%" src="orange.jpg"

    Neither of the two gives you "orange.jpg" directly.

    I don't understand. Can you clarify?

    Please post the code you used to test.

  12. #12
    My mistake, I apologize. I just looked at my code and realized that I had hacked something together. Not knowing regex, I managed to get most of the tag and then ran the string through some extra steps to extract the bare image name. Thank you for being persistent; I will change my code, as your pattern will work much better. Thanks again.
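    For comparison, one way such a two-step approach might look (a hypothetical reconstruction, not the code from this thread; SrcViaTwoSteps is an invented name): the pattern from post #2 captures the tag's attributes, and a second pattern pulls the src value out of them.

```vba
' Hypothetical two-step version: capture the attributes with "<img(.*?)>",
' then extract the src value from that submatch with a second pattern.
Function SrcViaTwoSteps(ByVal html As String) As String
    Dim tagMatches As Object, srcMatches As Object
    Dim attrs As String
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        .Pattern = "<img(.*?)>"
        Set tagMatches = .Execute(html)
        If tagMatches.Count = 0 Then Exit Function   ' no <img> tag: returns ""
        attrs = tagMatches(0).SubMatches(0)          ' e.g. width="100%" src="orange.jpg"
        .Pattern = "src=""([^""]+)"""
        Set srcMatches = .Execute(attrs)
        If srcMatches.Count > 0 Then SrcViaTwoSteps = srcMatches(0).SubMatches(0)
    End With
End Function
```

    The single pattern from post #5 does both steps in one Execute call, which keeps the code shorter and avoids the second pass over each tag.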

  13. #13
    I'm glad I was able to help. Thanks for the feedback.
