PDA

View Full Version : Multiplying all numbers in a document with 2



ontherun
04-25-2010, 01:01 PM
Hey!

I'm very much a beginner of VBA and I've been trying to find a good way to automatically search through my text and replace all numbers with the same value but multiplied by 2.

All of my numbers have a special character, >, in front of them so for example >143. The document also consists of a combination of written text and these numbers. So an example text would be

"My length is not >500 feet"

Which I want converted to

"My length is not >1000 feet"

I'd really appreciate if anyone could point me in the right direction!

Thanks!
Chris

mdmackillop
04-25-2010, 03:18 PM
Welcome to VBAX

Give this a try


Option Explicit
Sub test()
Dim C As Range, Match, Matches
Dim RegExp As Object
Dim rng As Range, RetStr As String
Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
RegExp.Pattern = ">\d*\b"
Set Matches = RegExp.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
DoDouble Match
Next
With Selection.Find
.Text = ">#"
.Replacement.Text = ">"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub DoDouble(txt)
Dim Rep As String
Rep = Left(txt, 1) & "#" & 2 * Right(txt, Len(txt) - 1)
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = txt
.Replacement.Text = Rep
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

fumei
04-26-2010, 11:39 AM
Or.................
Option Explicit

Sub ByTwo()
Dim r As Range
Set r = ActiveDocument.Range
Dim bolNotNum As Boolean
bolNotNum = True
With r.Find
Do While .Execute(FindText:=">", Forward:=True) = True
With r
Do Until bolNotNum = False
.MoveEnd Unit:=wdCharacter, Count:=1
bolNotNum = IsNumeric(Right(r.Text, 1))
Loop
.MoveEnd Unit:=wdCharacter, Count:=-1
.MoveStart Unit:=wdCharacter, Count:=1
.Text = CStr(CLng(r.Text) * 2)
.Collapse 0
End With
bolNotNum = True
Loop
End With
End Sub
If you could gurantee there would never be any instances of ">xxx" followed by anything other than a space - say if it was followed by a period - then the code could be even simpler. So if the ">xxx" was always followed by a space, this would work.
Sub ByTwoSimple()
Dim r As Range
Set r = ActiveDocument.Range
With r.Find
Do While .Execute(FindText:=">", Forward:=True) = True
With r
.MoveEndUntil Cset:=" "
.MoveStart Unit:=wdCharacter, Count:=1
.Text = CStr(CLng(r.Text) * 2)
.Collapse 0
End With
Loop
End With
End Sub
Obviously though, that is doubtful, so the longer version (that does a numeric test on the last character of the expanding range) is needed. It keeps expanding the range until the last character is NOT numeric. Thus, spaces, semi-colons, periods...whatever...are covered.

demo attached. Click "By Two" on the top toolbar. Note the last instance of ">34" is followed by a period, not a space.

NOTE!!!! This only looks for ">xxx" - i.e there is NO space between the ">" and the number. If there is a space - "> xxx" - it will not work. It could be made to work, but would take a bit more effort.

ontherun
04-27-2010, 09:42 AM
Thanks a ton guys for taking time to write all that up! Really appreciate it!

It works like a charm :)
You really realize how limited you are when not using VBA.

I've noticed VBA isn't too efficient for exremely large documents, so might need to create an app for larger files.

Thanks again!

fumei
04-27-2010, 09:50 AM
"I've noticed VBA isn't too efficient for exremely large documents,"

That depends. If you have found that, the first thing I would ask is: are you using Selection in your code?

If you are, THAT is the greatest improvement you can make. Do not use Selection.

mdmackillop
04-27-2010, 03:59 PM
You have two different approaches here. Which one are you going forward with? No problems either way, but only you have the "real life" data.

fumei
04-28-2010, 10:36 AM
Indeed, quite different.

Just for curiousity sake, I created a 150 page document with many occurences of the ">xxx".

4848 occurences to be precise (I used a counter to count how many times it actioned a Found ">").

Using my code. I set a Start variable (at the start of actioning), and a Finish variable (at the end of actioning), and a j variable as a counter. So...
Start = Format(Now, "hh:mm:ss")
With r.Find
' the actioning...
' yadda yadda


Finish = Format(Now, "hh:mm:ss")
MsgBox "Start: " & Start & vbCrLf & vbCrLf & "End: " & _
Finish & vbCrLf & vbCrLf & "Count = " & j


OK? Here is the result.

Start: 9:52:06 AM

End: 9:52:15 AM

Count = 4848

So (and of course using Now is not particularly fine-tuned timing...), but approximately 9 seconds, to change 4848 items over 150 pages.

If I add the same variables (Start and Finish) to Malcolm's code, and execute it on the same document, here is the result.

Start: 10:01:31 AM

End: 10:29:12 AM

Count = 4848

Same number (4848)...but...WHOA! The time to action is, shall we say, significantly different. Remember this is on the exact same document.

9 seconds vs. 1661 seconds (27 minutes, 41 seconds)

These are real numbers. I have been sitting here waiting and waiting and waiting for Malcolm's code to finish.

Why did it take so long?

Simple.

1. the use of Selection
2. the use of replacing text, and then going back to remove that text

It adds a "#" to every occurence, and then goes back and removes the "#". This is hugely inefficient across a large document.

So...to reiterate: "I've noticed VBA isn't too efficient for exremely large documents"

No, it depends.

mdmackillop
04-28-2010, 12:26 PM
9 seconds vs. 1661 seconds (27 minutes, 41 seconds)

Time enough for two cups of tea and a cream bun! I should stick to Excel.

fumei
04-29-2010, 11:59 AM
Or have plenty of time to visit the loo.

Nah, it is not really an Excel vesus Word issue. I am not sure if it would not take a significant amount of time in Excel if it was doing the same thing.

Selecting each cell
Changing a bit of the text of the cell
Looking for the next cell
Selecting that cell
Changing a bit of the text in the cell

......

start at the beginning again
Select each cell
Change the text
Looking for the next cell
Selecting that cell
Change the text

....

start at the beginning again
Select each cell
Change the text by removing the first change
Looking for the next cell
Selecting that cell
Change the text removing the first change

As you can see, when you spell it out logically the actions are...ummmmm, very tedious and repeating, AND there is a hell of a lot of selecting going on.

Could it be done more efficient in Excel? Heck I don't know. I am sure not any expert on Excel. Probably, by more efficient use of actions. That is the bottom line for ANY use of VBA, regardless of the application.

fumei
04-29-2010, 12:04 PM
"Or have plenty of time to visit the loo."


NOT that I am suggesting your code is crap. Although...I did become a little :old: while I was waiting.

:beerchug:

fumei
04-29-2010, 01:23 PM
Finally, if you DO want to use Malcom's RegEx route, it can be made faster. Here is the result I just got (same 150 page document) with modifiying Malcolm's code to not do that "#" replacement thing, and using range rather than Selection.

Start: 1:01:19 PM

End: 1:05:37 PM

Count = 4848

So, from 27 minutes, 41 seconds down to 4 minutes and 18 seconds.

I would say that is much better...although still much slower than 9 seconds.

Which reiterates my contention: it depends.

"I've noticed VBA isn't too efficient for exremely large documents"

It depends.

fumei
04-29-2010, 01:34 PM
I suppose for interest sake, you (Malcolm) may want to see what I changed in your code. I know I would...

So, here ya go.
Sub test()
Dim Match, Matches
Dim RegExp As Object
Dim Start As Date
Dim Finish As Date
Dim j As Long

Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
RegExp.Pattern = ">\d*\b"
Set Matches = RegExp.Execute(ActiveDocument.Range.Text)

Start = Format(Now, "hh:mm:ss")
For Each Match In Matches
DoDouble Match
j = j + 1
Next
Finish = Format(Now, "hh:mm:ss")
Selection.EndKey Unit:=wdStory

Selection.TypeText "Start: " & Start & vbCrLf & vbCrLf & "End: " & _
Finish & vbCrLf & vbCrLf & "Count = " & j
End Sub

Sub DoDouble(txt)
Dim r As Range
Dim Rep As String
Set r = ActiveDocument.Range
Rep = Left(txt, 1) & CStr((2 * CInt(Right(txt, Len(txt) - 1))))
With r.Find
.Text = txt
.Replacement.Text = Rep
.Forward = True
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
End Sub
Notice the change between your
Rep = Left(txt, 1) & "#" & 2 * Right(txt, Len(txt) - 1)
and my

Rep = Left(txt, 1) & CStr((2 * CInt(Right(txt, Len(txt) - 1))))
I never did see why you added that "#" thing.

Plus, it uses Range.

However, the reason it is still significantly slower is:

1. in the procedure "test", the instruction
Set Matches = RegExp.Execute(ActiveDocument.Range.Text)
causes VBA to parse through the ENTIRE document. This takes time.

2. then EACH item of Matches - Match, eg. >4000 - is routed to the Subroutine DoMatch; and every time a Range is created for the whole document, AND, the ENTIRE document is parsed.....again.

So say you only had four (4) unique items of Match - >4000, >3500, > 372, >9000 - the actioning still requires FIVE full and complete parsing of the entire document...which takes time.

My code starts at the beginning of the document and works its way dynamcially until the end. In fact, logically speaking, the entire document is never parsed in full. Nor...is there need to do so.

mdmackillop
04-29-2010, 03:27 PM
Hi Gerry,
I'll study this at the weekend. I'm just looking into RegExp and a whole lot to learn.. I had hoped the Replace method would produce a solution, still looking!
Regards
Malcolm

fumei
04-30-2010, 01:33 PM
Actually...yes, Replace could probably work.

Hmmm.

mdmackillop
05-04-2010, 05:17 AM
Hi Gerry,
I couldn't figure out the Replace, but thought this should be quicker if there are a limited number of unique values to be replaced. If every number is different, then it won't be any better. Given the complication of the code, and your 9 second time, I can't see RegExp offering a great advantage.
The parsing seems to be very quick, it is the Replace which takes the time.


Option Explicit
Sub test()
Dim Match, Matches
Dim RegExp As Object
Dim Time1 As Date
Dim Time2 As Date
Dim Time3 As Date
Dim i As Long
Dim d, a
Dim txt As String, Rep As String
Dim rng As Range

Set d = CreateObject("Scripting.Dictionary")

Time1 = Now
Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
RegExp.Pattern = ">\d*\b"
Set Matches = RegExp.Execute(ActiveDocument.Range.Text)

'Get unique values
On Error Resume Next
For Each Match In Matches
d.Add CStr(Match), CStr(Match)
Next
On Error GoTo 0
a = d.items
'Order values
a = DoSort(a)
Set rng = ActiveDocument.Range
rng.Find.ClearFormatting
rng.Find.Replacement.ClearFormatting

Time2 = Now
'Process in descending order to prevent multiple changes of an item i.e. 1 to 2 to 4 etc.
For i = d.Count - 1 To 0 Step -1
txt = a(i)
Rep = ">" & 2 * Right(txt, Len(txt) - 1)
With rng.Find
.Text = txt
.Replacement.Text = Rep
.Forward = True
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
Next

Time3 = Now

Selection.HomeKey Unit:=wdStory
Selection.TypeText Time1 & vbCrLf & Time2 & vbCrLf & Time3 & "Count = " & _
Matches.Count & vbCrLf & vbCrLf

End Sub

Function DoSort(TempArray As Variant) As Variant
Dim MaxVal As Variant
Dim MaxIndex As Integer
Dim i, j As Integer

For i = UBound(TempArray) To 1 Step -1
MaxVal = TempArray(i)
MaxIndex = i
For j = 1 To i
If TempArray(j) > MaxVal Then
MaxVal = TempArray(j)
MaxIndex = j
End If
Next j
If MaxIndex < i Then
TempArray(MaxIndex) = TempArray(i)
TempArray(i) = MaxVal
End If
Next i
DoSort = TempArray
End Function

fumei
05-04-2010, 09:09 AM
"The parsing seems to be very quick, it is the Replace which takes the time. "

Interesting.