PDA

View Full Version : Strange behaviour of Word VBA



smallken
05-06-2016, 04:32 AM
I have a 480 page document in Word 2010 containing 20,000 paragraphs. If I run the following VBA statements then the execution time varies depending on the current page displayed:

tm = Timer
For Each aParagraph In ActiveDocument.Paragraphs
Set aStyle = aParagraph.Style
Next aParagraph
msgbox "Execution time=" & Timer – tm

For example if the first page is displayed then the execution time is 21 seconds. Scrolling down to display the Table of Contents and re-running gives an execution time of 30 minutes. Scrolling down to the display the last page gives an execution time of 7 minutes.

A different 186 page document gives times of 2 secs, 266 secs and 5 secs

Therefore, there is a 100 to 1 difference in the execution time depending on what page happens to be displayed on the screen.

Does anyone have an idea as to why what is displayed on the screen affects execution time?

Don't even think of using the following. You will be waiting for a very long time.
For k = 1 To ActiveDocument.Paragraphs.Count
Set aStyle = ActiveDocument.Paragraphs(k).Style
Next k

gmaxey
05-06-2016, 08:57 AM
I've seen issues like this before but not to that degree. I don't have document with that many paragraphs but I do have one 982 pages long with over 6,000 paragraphs.

I ran this code in Word 2010 in 2.87 seconds regardless if was on the first page, last page or in the middle. It does have a TOC:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim tm
Dim aParagraph, aStyle
Dim i As Long
tm = Timer
For Each aParagraph In ActiveDocument.Paragraphs
Set aStyle = aParagraph.Style
i = i + 1
Debug.Print "A"
Next aParagraph
MsgBox Timer - tm & " " & i & " paragraphs"
lbl_Exit:
Exit Sub
End Sub

smallken
05-06-2016, 12:32 PM
Both of the documents were created in Word 2003 (from different sources) but the TOC was added to the documents in 2010, without the TOC there is no problem. The effect only shows up if the TOC is actually displayed on the screen.

I have tried converting the document to .docx both by saving as .docx and by using the conversion option and the issue still occurs.

Just to see what would happen, I appended the larger document to itself to make a 960 page document and the code took 50 minutes if the TOC was displayed and 42 seconds if page 1 was displayed.

gmaxey
05-06-2016, 12:40 PM
Have you tried removing the TOC, converting the document and recreating the TOC?

smallken
05-06-2016, 12:48 PM
Yes, I have tried that. What I have not tried yet is copying the text into a blank W2010 document. It might not be conclusive because the document contains a lot of images, tables and cross references that are not easy to copy across.

Frosty
05-06-2016, 01:06 PM
Have you tried using Application.ScreenUpdating = False during the processing of you code? You could also try turning off auto-pagination and/or switching to Normal view.

There are differences in how different loop types are processed, and For Each loops tend to iterate faster than For loops. There's also a way third way which sometimes is even faster, but comes with some danger of infinite loops...


set myPara = Selection.Paragraphs.First
Do
set myPara.Range.Style = myPara.Range.Style
Set myPara = myPara.Next
Loop Until myPara is Nothing


Basically - you can iterate through a loop by using the .Next property of certain objects (including Paragraphs).

Last thought: not to state the obvious -- but maybe it would simply be best, at the beginning of whatever your real processing is, to simply save where your cursor is, move it to the front of the document, then process, then move your cursor back to where it was when you started.

You can spend a lot of time figuring out why Word is choosing to refresh itself despite potential instructions not to... but I'll guess it's either A) not worth the time or B) not replicable in multiple environments or C) all of the above.

smallken
05-06-2016, 01:45 PM
I have been changing the display to the first page and returning afterwards as a solution which works provided the TOC is not on the first page. I put up this post in order to satisfy my curiosity as to what was happening and try and find a more elegant solution.

Turning off the screen updating approximately halves the processing time but it is still excessive.

Your suggested code using Set myPara = myPara.Next takes the same time as the For Each loop.

Frosty
05-06-2016, 01:58 PM
Well, the real reason is probably buried in something very specific to your environment or something specific to the document(s).

I'm guessing since you're operating this on a bunch of documents, that it's a particular class of objects which may or may not have a TOC on the first page?

I'd probably add some code which creates a blank page at the beginning of every document you work on, and just move your cursor there.

You can also switch to Normal view and have your zoom set to 10%-- that has sped up processing for me in the past.

Other than that, I suspect there won't be a real "elegant" solution for you. You just want to try to remove the reasons Word might have to repaginate the document during the processing. Have you tried hiding the entire application or the window during processing?

smallken
05-06-2016, 05:43 PM
I have tried minimising the window during processing and the timing is the same.

It looks like another foible of Word that I will have to live with.

smallken
05-06-2016, 10:19 PM
I dug out an old laptop running Word 2007 on Vista and it would not replicate the problem. My desktop is running Word 2010 on Windows 10. Comparing the Word Options between 2007 and 2010 shows no obvious difference in settings.

A clue is that Application.ScreenUpdating = False halves the time taken. Running in Draft view does not change the times so it would appear that pagination updating is not the issue. However, when the TOC is in view the individual TOC entries flash during processing (even in Draft view). It is as if the TOC entries are being updated even when the screen updating is turned off. But why would they be updated anyway, all I am doing is reading the style setting and not changing anything.

gmayor
05-07-2016, 02:54 AM
Just to muddy the waters I created a document with 20,000 paragraphs (and no TOC).

With DoEvents in place of Debug.Print "A" in Greg's code, the document took 25 second to process in its original Word 2003 format. Saved as DOCX in Word 2016, it took 75 seconds :banghead:

Without DoEvents, the process appeared to hang, but nevertheless completed with similar results.

Turning off screen updating had minimal effect.

Using Word 2010 on the same PC, the original Word 2003 document took over 5 minutes to process and the DOCX version took about 20 seconds less.:snooze

smallken
05-07-2016, 03:35 AM
25 seconds in 2003 VBA and 75 seconds in 2016 VBA is par for the course. When I changed over from 2003 to 2007 I found that VBA ran between 5 and 20 times slower depending on the particular statements. 2010 VBA is slightly faster than 2007 and it looks like 2016 is about the same. I had some Excel VBA routines that read data in real time from an analog to digital converter running at 200 samples per second. They worked very well in 2003 and were able to plot data graphs in real time but 2007/2010 was too slow and they became unusable even on a faster machine.

Your result in 2010 is interesting in that you have no TOC. I would have expected a time about what you got with 2016.

gmayor
05-07-2016, 04:53 AM
I should have clarified it was Word 2016 with the DOC format document that took 25 seconds. I have just run it again in Word 2003 vba on the same PC and it took 2.6 seconds.
When I opened the DOCX file in Word 2003 (using the compatibility pack) it ran in just over 3 seconds.

smallken
05-07-2016, 05:08 AM
Would you try in 2016 generating a decent size TOC, displaying the TOC on the screen and running again.

If it comes in under 40 seconds I might be tempted to buy 2016.

Frosty
05-07-2016, 07:01 AM
What is the processing time if you reveal the field codes before running? If not pagination but the TOC is trying to refresh itself -- then your elegant solution might just be showing field codes before you run it.

smallken
05-07-2016, 10:42 AM
Tried revealing field codes with the TOC field code on the screen. Makes no difference to the times.

gmaxey
05-07-2016, 11:01 AM
I don't thing it is Word 2010 in particular or that upgrading to Word 2016 will resolve your issue.

I took that 900 plus page file created in Word 2010 and copied and pasted text until I had 20K plus paragraphs. The code ran in under 10 seconds. I then saved it with a .doc extension, closed, closed Word. Open Word 2003, opened the file and the code ran in 76 seconds.

I opened it as a doc file in Word2010 and it ran in 11.5 seconds. I converted the file and saved as a .docm and the code ran in under 10 seconds.

I always use Print Preview and before running any code I make sure that the document is fully opened and paginated (3000 plus pages). Opening an repaginting in Word 2010 take far longer than running the code.

I then attempted to open in Word 2016 and after waiting nearly 10 minutes for the document to paginate. I quit.

That is just one reason why I use Word 2010 almost exclusively.

Frosty
05-07-2016, 11:29 AM
Microsoft changed the engine that displays the documents a lot in 2013/2016. The long list of options you get for compatibility options on a document (the stuff that says "don't expand line after shift-return" settings) is quite a bit shorter. It may be that one of those settings is impacting the performance. You could explore what those settings are (try turning them all off, etc). After that, I'm out of ideas.

smallken
05-07-2016, 05:34 PM
You have saved me the cost of buying 2016. The expression dumbing down comes to mind.

gmaxey
05-07-2016, 05:44 PM
Other that for testing, the rare occasion where I want to map a rich text CC is the only reason I use Word 2013/2016

smallken
05-08-2016, 05:43 AM
Some of the problems seem to be due to the complexity of the document. My 480 page, 20000 paragraph test document had over 4000 bookmarks most of which were dormant, and 700 REF fields. Converting the fields to text and removing unused bookmarks gave a big improvement to the timing. With the TOC visible the timing came down from 30 minutes to 2 minutes. If the TOC is not visible then the timing came down from 80 to 28 seconds.

Therefore:

The timing is dependent of a whole heap of factors, not just the number of paragraphs. The visibility of the TOC is just one of them.

For large documents, if the screen is displaying a TOC it is well worth jumping to another page before executing the For Each… code.

smallken
05-10-2016, 09:57 PM
More curious timings.


Sub timings()
Dim tm As Single
tm = Timer
MsgBox ActiveDocument.Paragraphs.Count & " time=" & Timer - tm
End Sub

20 seconds on 480 page test doc
1.9 seconds on same document with REF fields converted to text


Sub timings()
Dim tm As Single
tm = Timer
MsgBox ActiveDocument.Sentences.Count & " time=" & Timer - tm
End Sub

1.7 seconds on test doc
3.3 seconds on same document with REF fields converted to text