[SOLVED:] VBA Word regular expression pattern match
RCGUA123
01-07-2014, 02:58 PM
Hello, I am trying to find the correct regular expression pattern to match HTML strings that contain image names. The image tags always start with <img and end with a closing angle bracket. The image names always end with .gif or .jpg. I only need to extract the actual image name. Some examples are:
<img width="100%" src="orange.jpg">
<img src="orange.jpg">
text here</font></p> <IMG SRC="example_logo.gif">
text <img src="example_logo.gif">text here</font></p>
If I use .Pattern = "<img\s*src=""([^""]*)"""
it will match: <img src
but not: <img width
RCGUA123
01-08-2014, 08:27 AM
I figured it out. The RegEx pattern that worked is below:
.Pattern = "<img(.*?)>"
macropod
01-08-2014, 08:51 PM
In Word, you can do this without recourse to RegEx, using a wildcard Find:
.MatchWildcards = True
.Text = "\<img*\>"
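For context, a minimal sketch of how that wildcard Find might be driven from a macro. The procedure name FindImgTags and the choice to print each hit to the Immediate window are my own assumptions, not part of the suggestion above. As far as I know, wildcard searches in Word are case-sensitive, so this would not pick up tags written as <IMG ...>.

```vba
Sub FindImgTags()
    ' Hypothetical helper: lists every <img ...> tag found in the active document.
    Dim rng As Range
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "\<img*\>"        ' \< and \> are literal angle brackets in wildcard mode
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop        ' stop at the end of the document instead of wrapping
        Do While .Execute
            Debug.Print rng.Text  ' rng now covers the matched tag
            rng.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```

This returns the whole tag, so the file name would still have to be cut out of each hit afterwards.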
RCGUA123
01-20-2014, 06:13 AM
Thanks, I appreciate the help. I decided to use Regular Expressions instead of Word because I am searching through large html files with lots of lines of html code and I thought using RegEx would be faster than opening the file and searching in Word.
lecxe
01-27-2014, 02:29 PM
Hi RCGUA
For the examples you posted, I think a pattern similar to the one in post #1 would be preferable. You'd get the name of the file directly.
.Pattern = "<img\s.*?src=""([^""]+)"""
Also make the matching case insensitive.
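As a rough sketch of what that could look like in practice, the function below applies the pattern with IgnoreCase and Global switched on and collects every captured file name. The names ExtractImageNames and DemoExtract are made up for illustration. Note that "." in VBScript.RegExp does not match line breaks, so a tag that is split across lines would be missed by this pattern.

```vba
Function ExtractImageNames(ByVal html As String) As Collection
    ' Hypothetical helper: returns the src value of every <img ...> tag in html.
    Dim re As Object, m As Object
    Dim result As New Collection
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Pattern = "<img\s.*?src=""([^""]+)"""
        .IgnoreCase = True   ' also matches <IMG SRC=...>
        .Global = True       ' return all matches, not only the first
    End With
    For Each m In re.Execute(html)
        result.Add m.SubMatches(0)   ' the text captured inside the quotes
    Next m
    Set ExtractImageNames = result
End Function

Sub DemoExtract()
    Dim s As String, imgName As Variant
    s = "<IMG SRC=""example_logo.gif""> text <img width=""100%"" src=""orange.jpg"">"
    For Each imgName In ExtractImageNames(s)
        Debug.Print imgName   ' prints example_logo.gif, then orange.jpg
    Next imgName
End Sub
```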
RCGUA123
01-28-2014, 06:19 AM
Thanks, I think the first pattern worked for image names that begin with <img src
but I think it didn't work for image names like: <img width="100%" src="orange.jpg">
lecxe
01-28-2014, 07:30 AM
Hi
I tried with your example and it works for me.
Please try:
Sub Test()
    Dim s As String
    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        ' SubMatches(0) is the text captured by the parentheses: the file name
        MsgBox .Execute(s)(0).SubMatches(0)
    End With
End Sub
RCGUA123
01-29-2014, 06:16 AM
Thanks lecxe, I must have missed the word "similar" in your first post and I didn't notice the slight difference. Why do you think: "<img\s.*?src=""([^""]+)"""
is preferable to: "<img(.*?)>"
I will change my code based on your answer. Will it run faster, or match a wider variety of image names? I don't know regex, so I appreciate your help and advice.
lecxe
01-29-2014, 07:44 AM
Why do you think: "<img\s.*?src=""([^""]+)""" is preferable to: "<img(.*?)>"
I'm not sure I understood exactly what you want, so I'm writing what I understood.
In the text:
XXX<img width="100%" src="orange.jpg">XXX
What do you want to get?
What I understood is that you want to get
orange.jpg
Is it true? If not please post what you'd like to get from that text.
This is why I used the pattern
"<img\s.*?src=""([^""]+)"""
because, as you can see running the code I posted, you get directly "orange.jpg" out of the text.
Please clarify.
RCGUA123
01-30-2014, 01:38 PM
Yes, you are correct, I want to get: orange.jpg. As I understand it, both patterns will get orange.jpg and both patterns will run at the same speed, but I may be wrong. My question was: if both patterns do the same thing, is one preferable? For example, does one run faster than the other?
lecxe
01-31-2014, 02:31 AM
... Yes, you are correct, I want to get: orange.jpg. As I understand it, both patterns will get orange.jpg and both patterns will run at the same speed ...
I don't understand how you came to this conclusion.
Using as an example the string in post #14 with an image tag:
s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
As I see it, the first pattern
.Pattern = "<img\s.*?src=""([^""]+)"""
gets you "orange.jpg" directly (just execute the code in post #14 to confirm it).
Now your suggestion for the pattern
.Pattern = "<img(.*?)>"
If you use it with the same string, and get both the match and the submatch:
Sub Test()
    Dim s As String
    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img(.*?)>"
        ' The match is the whole tag; the submatch is everything between <img and >
        MsgBox "Match: " & .Execute(s)(0)
        MsgBox "Submatch: " & .Execute(s)(0).SubMatches(0)
    End With
End Sub
The result is:
Match: <img width="100%" src="orange.jpg">
Submatch: width="100%" src="orange.jpg"
Neither of the two gives you "orange.jpg" directly.
I don't understand. Can you clarify?
Please post the code you used to test.
RCGUA123
01-31-2014, 07:31 AM
My mistake, I apologize. I just looked at the code and realized that I had hacked something together. Not knowing regex, I managed to get most of the image name and then ran the string through some extra steps to extract the bare image name. Thank you for being persistent. I will change my code; your pattern will work much better. Thanks again.
lecxe
01-31-2014, 07:59 AM
I'm glad I was able to help. Thanks for the feedback.