View Full Version : [SOLVED:] VBA Word regular expression pattern match



RCGUA123
01-07-2014, 02:58 PM
Hello, I am trying to find the correct regular expression pattern to match HTML strings that contain image names. The image tags always start with <img and end with a closing angle bracket. The image names always end with .gif or .jpg. I only need to extract the actual image name. Some examples are:
<img width="100%" src="orange.jpg">
<img src="orange.jpg">
text here</font></p> <IMG SRC="example_logo.gif">
text <img src="example_logo.gif">text here</font></p>

If I use .Pattern = "<img\s*src=""([^""]*)"""
it will match: <img src
but not: <img width

RCGUA123
01-08-2014, 08:27 AM
I figured it out. The RegEx pattern that worked is below:

.Pattern = "<img(.*?)>"

macropod
01-08-2014, 08:51 PM
In Word, you can do this without recourse to RegEx, using a wildcard Find:
.Text = "\<img*\>"
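
For readers who want to try the wildcard approach, here is a minimal sketch of how that Find text might be used in a loop. The procedure name ListImgTags is made up for illustration, and it assumes the HTML source is already open as the active Word document. Note that wildcard searches in Word are case sensitive, so this would not pick up <IMG ...>.

```vba
Sub ListImgTags()
    ' Hypothetical helper: prints every <img ...> tag found in the
    ' active document to the Immediate window.
    Dim rng As Range

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "\<img*\>"        ' \< and \> are literal angle brackets
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop        ' stop at the end of the document
        Do While .Execute
            Debug.Print rng.Text  ' rng now covers the matched tag
            rng.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```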

RCGUA123
01-20-2014, 06:13 AM
Thanks, I appreciate the help. I decided to use Regular Expressions instead of Word because I am searching through large html files with lots of lines of html code and I thought using RegEx would be faster than opening the file and searching in Word.
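
A rough sketch of that approach, reading the whole file into a string and running the pattern over it without opening it in Word. The file path and the procedure name are placeholders, not the actual ones used here. One caveat: in VBScript.RegExp the dot does not match a line break, so a tag split across two lines would be missed.

```vba
Sub ListImgTagsInFile()
    Const ForReading As Long = 1
    Dim fso As Object, ts As Object, m As Object
    Dim html As String

    ' Placeholder path (assumption): replace with a real HTML file.
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("C:\temp\page.html", ForReading)
    html = ts.ReadAll
    ts.Close

    With CreateObject("VBScript.RegExp")
        .Pattern = "<img(.*?)>"
        .IgnoreCase = True   ' also match <IMG ...>
        .Global = True       ' find every tag, not just the first
        For Each m In .Execute(html)
            Debug.Print m.Value
        Next m
    End With
End Sub
```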

lecxe
01-27-2014, 02:29 PM
Hi RCGUA

For the examples you posted, I think a pattern similar to the one in post #1 would be preferable. You'd get the name of the file directly.


.Pattern = "<img\s.*?src=""([^""]+)"""

Also make the matching case insensitive.
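
As a sketch of what that could look like in practice (the function name ImageNames is only illustrative), the following sets IgnoreCase and also Global, on the assumption that a file may contain more than one image tag:

```vba
Function ImageNames(ByVal html As String) As Collection
    ' Illustrative helper: returns the src value of every <img> tag.
    Dim result As New Collection
    Dim m As Object

    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        .IgnoreCase = True   ' <IMG SRC=...> matches as well
        .Global = True       ' return all matches, not only the first
        For Each m In .Execute(html)
            result.Add m.SubMatches(0)   ' captured file name
        Next m
    End With

    Set ImageNames = result
End Function
```

It could be called, for example, as ImageNames(s)(1) to get the first file name, or looped over with For Each.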

RCGUA123
01-28-2014, 06:19 AM
Thanks, I think the first pattern worked for image names that begin with <img src
but I think it didn't work for image names like: <img width="100%" src="orange.jpg">

lecxe
01-28-2014, 07:30 AM
Hi

I tried with your example and it works for me.

Please try:




Sub Test()
    Dim s As String

    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img\s.*?src=""([^""]+)"""
        ' First match, first capture group: the file name
        MsgBox .Execute(s)(0).SubMatches(0)
    End With
End Sub

RCGUA123
01-29-2014, 06:16 AM
Thanks lecxe, I must have missed the word "similar" in your first post and I didn't notice the slight difference. Why do you think: "<img\s.*?src=""([^""]+)"""
is preferable to: "<img(.*?)>"

I will change my code based on your answer. Will it run faster, or get a variety of image names? I don't know reg ex so I appreciate your help and advice.

lecxe
01-29-2014, 07:44 AM
Why do you think: "<img\s.*?src=""([^""]+)""" is preferable to: "<img(.*?)>"

I'm not sure I understood exactly what you want, so I'm writing what I understood.

In the text:


XXX<img width="100%" src="orange.jpg">XXX

What do you want to get?

What I understood is that you want to get


orange.jpg

Is it true? If not please post what you'd like to get from that text.

This is why I used the pattern

"<img\s.*?src=""([^""]+)"""

because, as you can see running the code I posted, you get directly "orange.jpg" out of the text.

Please clarify.

RCGUA123
01-30-2014, 01:38 PM
Yes, you are correct, I want to get: orange.jpg. As I understand it, both patterns will get orange.jpg and both will run at the same speed, but I may be wrong. My question was: if both patterns do the same thing, is one preferable? For example, does one run faster than the other?

lecxe
01-31-2014, 02:31 AM
... Yes, you are correct I want to get: orange.jpg as I understand it, both patterns will get orange.jpg and both patterns will run at the same speed ...

I don't understand how you came to this conclusion.

Using as an example the string in post #14 with an image tag:


s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"

As I see it, the first pattern


.Pattern = "<img\s.*?src=""([^""]+)"""

Gets you "orange.jpg" directly (just execute the code in post #14 to confirm it)

Now your suggestion for the pattern


.Pattern = "<img(.*?)>"

If you use it with the same string, and get both the match and the submatch:



Sub Test()
    Dim s As String

    s = "<b>XXX</b><img width=""100%"" src=""orange.jpg"">"
    With CreateObject("VBScript.RegExp")
        .Pattern = "<img(.*?)>"
        ' Whole match, then the text captured by the parentheses
        MsgBox "Match: " & .Execute(s)(0)
        MsgBox "Submatch: " & .Execute(s)(0).SubMatches(0)
    End With
End Sub


The result is:

Match: <img width="100%" src="orange.jpg">
Submatch: width="100%" src="orange.jpg"

Neither of the two gives you "orange.jpg" directly.

I don't understand. Can you clarify?

Please post the code you used to test.

RCGUA123
01-31-2014, 07:31 AM
My mistake, I apologize. I just looked at the code and realized that I had hacked something together. Not knowing regex, I managed to get most of the image name and then ran the string through some further steps to extract the bare image name. Thank you for being persistent. I will change my code; your pattern will work much better. Thanks again.

lecxe
01-31-2014, 07:59 AM
I'm glad I was able to help. Thanks for the feedback.