PDA

View Full Version : [SOLVED:] Dealing with insufficient memory on large documents



mrblack
07-07-2022, 08:20 AM
Hi,
I am facing a challenge which I would like to share to see if someone has any experiences with this particular issue. This is the case: a very large legal document – large meaning 500+ pages and over 200.000 words – with an intricate structure of elements such as chapters, paragraphs, articles, sections, subsections and numerous bullet lists. Some chapters containing up to 1200 seperate articles, some sections with bullet lists that are three sub levels deep.

The input document is a document where each element has its own style, e.g. “Chapter”, "Paragraph" or “Article”, even the bullet lists which can have a number, letter or roman numeral as bullet, e.g. “1.”, “a.” or “i.”. I created VBA code that searches each identifier and, when found, applies the right style to it. This all goes well until the last part where the styles for sections and bullet lists have to be reapplied (because for some reason the numbering for these elements continues across the chapters). Moreover, its just the first section of an article and the first item of a bulletlist within this section that need to be restyled.

I have written a subroutine to do this but because of the sheer size of the document, the macro runs out of memory at some point in this subroutine. I have already tried the usual solutions for keeping memory usage to a minimum, like using Ranges instead of Selections, no global variables and clearing the Undo memory after each loop. Unfortunately, this does not solve the problem.

I have created a work-around in which the “out of memory” error is caught with an error handler. The general idea is to store the number of the last element that was properly processed – i.e. before the error was thrown –in a variable, so the process can be interrupted and restarted again from this point on. I first tried to automate this interruption and restarting by saving, closing and reopening the document and then do a recursive call to let the subroutine start itself. It turns out, however, that this does not clear the memory. Apparently, the only way to do this is to close the entire application. So with this in mind I now created the following: the error handler tells the user which last document element was processed properly, saves the document and quits the application.

The user then has to restart Word, reopen the document and rerun the macro, this time telling it to start not from the beginning of the document but from the point where it had left of, i.e. the aforementioned last document element.

This is however a cumbersome, error-prone solution and asking a fair amount of attention of the user, while this need for user intervention was the very thing I had developed the macro for in the first place. I hope some of you have run into a likewise situation and can offer me some advice on how to overcome this problem.

The VBA code is as follows:

Sub Applystyles()

Dim markup As Style
Dim applystyles_section, applystyles_sub1, first_section, first_sub1 As Boolean
Dim myRange As Range
Dim starting_point, lastelementprocessed As String

Application.ScreenUpdating = False

Set myRange = ActiveDocument.Range

With ListGalleries(wdOutlineNumberGallery).ListTemplates(1).ListLevels(1)
.NumberFormat = "%1."
.NumberStyle = wdListNumberStyleArabic
.LinkedStyle = "Section"
.StartAt = 1

End With
With ListGalleries(wdOutlineNumberGallery).ListTemplates(1).ListLevels(2)
.NumberFormat = "%2."
.NumberStyle = wdListNumberStyleLowercaseLetter
.LinkedStyle = "Sub section a"
.StartAt = 1

End With
With ListGalleries(wdOutlineNumberGallery).ListTemplates(1).ListLevels(3)
.NumberFormat = "%3°."
.NumberStyle = wdListNumberStyleArabic
.LinkedStyle = "Subsub section 1°"
.StartAt = 1

End With
With ListGalleries(wdOutlineNumberGallery).ListTemplates(1).ListLevels(4)
.NumberFormat = "%4."
.NumberStyle = wdListNumberStyleLowercaseRoman
.LinkedStyle = "Subsubsub section i"
.StartAt = 1

End With

applystyles_section = True
applystyles_sub1 = True
first_section = True
first_sub1= True
starting_point = ""
lastelementprocessed= ""

If MsgBox(“First run?”, vbYesNo) = vbNo Then
starting_point = "Article " & InputBox("Input number of starting article or leave blank if process needs to start at beginning of document")

End If

With myRange
If starting_point = "" Then


.Start = ActiveDocument.Range.Start
.Collapse Direction:=wdCollapseStart
.Move Unit:=wdParagraph
.Expand Unit:=wdParagraph


Else


With .Find


.Text = starting_point
.Style = ActiveDocument.Styles("article_heading")
.Execute

If Not (.Found = True) Then


MsgBox ("Article not found. Macro will shutdown.")
Exit Sub


End If


End With


End If
.Select
Set markup = .Style

End With


On Error GoTo ErrHandler

Do Until (myRange.End = ActiveDocument.Range.End)
Select Case markup

Case ActiveDocument.Styles("article_heading")



applystyles_section = True
applystyles_sub1 = True
first_section = True
first_sub1= True



Case ActiveDocument.Styles("section_heading")



If first_section = True And applystyles_section = True Then



myRange.ListFormat.ApplyListTemplateWithLevel ListTemplate:=ListGalleries(wdOutlineNumberGallery).ListTemplates(1), ContinuePreviousList:=False, ApplyTo:=wdListApplyToWholeList, DefaultListBehavior:=wdWord10ListBehavior

applystyles_section = False
first_section = False


End If



Case ActiveDocument.Styles("sub section a")



If first_sub1 = True And applystyles_sub1 = True And first_section = True Then



myRange.ListFormat.ApplyListTemplateWithLevel ListTemplate:=ListGalleries(wdOutlineNumberGallery).ListTemplates(1), ContinuePreviousList:=False, ApplyTo:=wdListApplyToWholeList, DefaultListBehavior:=wdWord10ListBehavior

applystyles_sub1 = False
first_sub1= False


End If


End Select

With myRange


.Move Unit:=wdParagraph, Count:=1
.Expand Unit:=wdParagraph
.Select
Set markup = .Style

If .Style = ActiveDocument.Styles("Article_heading") Then


lastelementprocessed= .Text


End If


End With

ActiveDocument.UndoClear

Loop

MsgBox ("The macro is done.")
Exit Sub

ErrHandler:
MsgBox ("Error: " & Err.Description)
MsgBox ("The last element that was processed correctly is: " & lastelementprocessed & ". Write this down!")

Set myRange = Nothing
Application.Quit SaveChanges:=wdPromptToSaveChanges

End Sub

macropod
07-08-2022, 02:39 AM
Maybe I'm missing something, but it seems to me that all you need is:

Sub ApplyStyles()
Application.ScreenUpdating = False
Dim LT As ListTemplate, i As Long
Set LT = ActiveDocument.ListTemplates.Add(OutlineNumbered:=True)
For i = 1 To 4
With LT.ListLevels(i)
.NumberFormat = Choose(i, "%1.", "%2.", "%3°.", "%4.")
.TrailingCharacter = wdTrailingTab
.NumberStyle = Choose(i, wdListNumberStyleArabic, wdListNumberStyleLowercaseLetter, _
wdListNumberStyleArabic, wdListNumberStyleLowercaseRoman)
.LinkedStyle = Choose(i, "Section", "Sub section a", "Subsub section 1°", "Subsubsub section i")
.ResetOnHigher = True
.StartAt = 1
End With
Next
Application.ScreenUpdating = True
End Sub
That said, I also note that the Style names in your Select Case statement mostly don't match the ones you're applying the numbering to...

mrblack
07-09-2022, 01:32 AM
Macropod,
thanx for your speedy response and this seemingly straightforward solution. Unfortunaley, is doesn't do the job. I have cut the following snippet from the resulted output to clarify:

<snippet>
Article 1.1a. (base)
1. This article ….
2. This article also refers to ….:
a. blablabla;
b. blablabla; and
c. blablabla.
Artikel 1.2. (xxx)
This article also applies ….
Artikel 1.3. (xxx)
1. It is restricted…:
a. blablabla;
b. blablabla; and
c. blablabla.
3. blablabla.
Chapter 2. Title Article 2.1. (xxx)
1. blablabla.
4. blablabla:
a. blablabla;
b. blablabla;
c. blablabla; and
d. blablabla.
Article 2.1a. (xxx)
This article ….:
a. blablabla; and
e. blablabla

</snippet>

Now, the styles are applied correctly but there is still something off with the numbering. For instance, the 2nd section of article 1.3 is numbered as "3." and the 2nd section of article 2.1 as "4.". Furthermore, the 2nd bulletlist item in article 2.1a is numbered as "e.".So, something is preventing the code from correctly continuing the numbering of a list.

mrblack
07-09-2022, 01:34 AM
Oh, and the mismatch you are referrring to is a result of translating some variable names from Dutch into Engilsh....:thumb

macropod
07-09-2022, 01:50 AM
The erroneous result is most likely because you're not using the Styles referred to in the macro for the article sections or, alternatively, for the article headings. Done properly, the numbering for for the article sections will re-set automatically after every article heading - just as do the article sub-sections. But, since you don't say anywhere what Style names (untranslated) you're using at each level, it's impossible to provide further advice at this point.

The easiest way of resolving the matter would be for you to attach an actual document containing some representative content (even empty paragraphs) containing the Chapter, Article Title, Article section, Article sub-section, etc. that you're actually working with. You can do this via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.

mrblack
07-10-2022, 06:03 AM
I have attached a small example document.

macropod
07-10-2022, 08:11 PM
The fundamental problem is that your headings Article 7.61d. and the like are all formatted with nothing more than Word's Normal Paragraph Style with the IntenseEmphasis Character Style applied. They are not part of any automatic numbering sequence. Hence, you can't expect the numbering to re-start as it should after each such heading. You need to apply an appropriate Style with automatic numbering to those headings and make them part of the overall numbering scheme. Even your Chapter Headings aren't part of any auto-numbering scheme.

From what is included in your document, the whole lot could be managed with multi-level auto-numbering applied to just 5 of Word's built-in Heading Styles:
Heading 1: Chapter 7.
Heading 2: Article 7.61
Heading 3: Section 7.61d
Heading 4: 1. Paragraph
Heading 5: a. Sub-Paragraph

Of course, you may have other headings in the document that need to occur above/below these, and those too should be taken into account. Word comes with 9 built-in Heading Styles.