[SOLVED:] VBA Word regular expression pattern match
RCGUA123
01-07-2014, 02:58 PM
Hello, I am trying to find the correct regular expression pattern to match HTML strings that contain image names. The image tags always start with <img and end with a closing angle bracket, and the image names always end with .gif or .jpg. I only need to extract the actual image name. Some examples are:
<img width="100%" src="orange.jpg">
<img src="orange.jpg">
text here</font></p> <IMG SRC="example_logo.gif">
text <img src="example_logo.gif">text here</font></p> 
If I use .Pattern = "<img\s*src=""([^""]*)"""
it will match tags that begin with: <img src
but not tags that begin with: <img width
RCGUA123
01-08-2014, 08:27 AM
I figured it out.  The RegEx pattern that worked is below:
.Pattern = "<img(.*?)>"
macropod
01-08-2014, 08:51 PM
In Word, you can do this without recourse to RegEx, using a wildcard Find:
.Text = "\<img*\>"
.MatchWildcards = True
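A minimal sketch of how that wildcard Find might be driven from VBA, assuming the HTML text is already open as the active Word document. The procedure name FindImgTags is hypothetical. As far as I know, wildcard searches in Word are case-sensitive, so an uppercase <IMG tag would need something like \<[Ii][Mm][Gg]*\> instead.

```vba
' Hypothetical example: list every <img ...> tag in the active document.
Sub FindImgTags()
    Dim rng As Range
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "\<img*\>"        ' \< and \> are literal angle brackets in wildcard mode
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop        ' stop at the end of the document
        Do While .Execute
            Debug.Print rng.Text  ' rng now spans the tag that was found
            rng.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```

This only returns the whole tag; the file name would still have to be cut out of it afterwards.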
RCGUA123
01-20-2014, 06:13 AM
Thanks, I appreciate the help. I decided to use regular expressions instead of Word's Find because I am searching through large HTML files with many lines of code, and I thought RegEx would be faster than opening each file and searching it in Word.
lecxe
01-27-2014, 02:29 PM
Hi RCGUA
For the examples you posted, I think a pattern similar to the one in post #1 would be preferable, because you'd get the name of the file directly.
    .Pattern = "<img\s.*?src=""([^""]+)"""
Also make the matching case-insensitive.
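A rough sketch of how the pattern above might be combined with case-insensitive matching to collect every image name in a string. The function name GetImageNames is made up for illustration, and the .Global setting is an addition of mine so that all matches are returned rather than only the first.

```vba
' Hypothetical helper: returns the src value of every <img ...> tag in html.
Function GetImageNames(ByVal html As String) As Collection
    Dim names As New Collection
    Dim m As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        .IgnoreCase = True   ' also matches <IMG SRC=...>
        .Global = True       ' return all matches, not just the first one
        For Each m In .Execute(html)
            names.Add m.SubMatches(0)   ' first capture group = file name
        Next m
    End With
    Set GetImageNames = names
End Function

' Example call with two of the sample tags.
Sub ShowImageNames()
    Dim n As Variant
    For Each n In GetImageNames("<IMG SRC=""example_logo.gif""> text <img width=""100%"" src=""orange.jpg"">")
        Debug.Print n
    Next n
End Sub
```

Note that in VBScript.RegExp the dot does not match a line break, so a tag that is split across two lines would not be found by this pattern.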
RCGUA123
01-28-2014, 06:19 AM
Thanks, I think the first pattern worked for image tags that begin with <img src
but I don't think it worked for tags like: <img width="100%" src="orange.jpg">
lecxe
01-28-2014, 07:30 AM
Hi
I tried with your example and it works for me.
Please try:
Sub Test()
    Dim s As String
    ' Sample text containing one image tag
    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        ' Show the first capture group of the first match: the file name
        MsgBox .Execute(s)(0).SubMatches(0)
    End With
End Sub
RCGUA123
01-29-2014, 06:16 AM
Thanks lecxe, I must have missed the word "similar" in your first post and I didn't notice the slight difference. Why do you think: "<img\s.*?src=""([^""]+)""" 
is preferable to: "<img(.*?)>"
I will change my code based on your answer. Will it run faster, or match a wider variety of image names? I don't know regex, so I appreciate your help and advice.
lecxe
01-29-2014, 07:44 AM
Why do you think: "<img\s.*?src=""([^""]+)""" is preferable to: "<img(.*?)>"
I'm not sure I understood exactly what you want, so I'm writing what I understood.
In the text:
XXX<img width="100%" src="orange.jpg">XXX
What do you want to get?
What I understood is that you want to get
orange.jpg
Is it true? If not please post what you'd like to get from that text.
This is why I used the pattern
"<img\s.*?src=""([^""]+)"""
because, as you can see running the code I posted, you get directly "orange.jpg" out of the text.
Please clarify.
RCGUA123
01-30-2014, 01:38 PM
Yes, you are correct, I want to get: orange.jpg. As I understand it, both patterns will get orange.jpg and both will run at the same speed, but I may be wrong. My question was: if both patterns do the same thing, is one preferable? For example, does one run faster than the other?
lecxe
01-31-2014, 02:31 AM
... Yes, you are correct I want to get:  orange.jpg      as I understand it, both patterns will get orange.jpg    and both patterns will run at the same speed ... 
I don't understand how you came to this conclusion.
Using as an example the string in post #14 with an image tag:
 s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">" 
As I see it, the first pattern 
        .Pattern = "<img\s.*?src=""([^""]+)"""
gets you "orange.jpg" directly (just run the code in post #14 to confirm it).
Now your suggestion for the pattern
        .Pattern = "<img(.*?)>"
If you use it with the same string, and get both the match and the submatch:
Sub Test() 
    Dim s As String 
     
    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">" 
    With CreateObject("VBScript.RegExp") 
        .pattern = "<img(.*?)>"
        MsgBox "Match: " & .Execute(s)(0)
        MsgBox "Submatch: " & .Execute(s)(0).submatches(0) 
    End With 
End Sub 
The result is:
Match: <img width="100%" src="orange.jpg">
Submatch:  width="100%" src="orange.jpg"
Neither of the two gives you "orange.jpg" directly.
I don't understand. Can you clarify?
Please post the code you used to test.
RCGUA123
01-31-2014, 07:31 AM
My mistake, I apologize. I just looked at the code and realized that I had hacked something together. Not knowing regex, I managed to get most of the image tag and then ran the string through a few more steps to extract the bare image name. Thank you for being persistent. I will change my code; your pattern will work much better. Thanks again.
lecxe
01-31-2014, 07:59 AM
I'm glad I was able to help. Thanks for the feedback.