PDA

View Full Version : speeding up an info-gathering loop



marcelma
11-13-2010, 02:49 PM
Hello,

I am stuck with the following problem:

I want to fine-comb through a series of Word files, checking the "legitimacy" of every single character and putting things right where needed.

For this I need to know in detail about the immediate environment of every character. As far as I can presently say, I need at least to have the following information:

- character code of the two characters left of the present character
- character code of the present character
- character code of the two characters right of the present character

- character style of the present character and the character immediately right and left of it
- paragraph style for the present character and the character immediately right and left of it
- ascertain whether the present character or the character immediately right and left of it are either inserted or deleted

- paragraph style for the previous paragraph
- paragraph style for the following paragraph

That makes for a total of 19 variables, which have to be elicited for each character.

My problem is that the core-loop becomes too slow (on my laptop the cursor moves in medium reading speed). Does anybody know a faster and more elegant way of eliciting this information?

I enclose the code for a loop which does nothing else but elicit the needed information. The loop terminates once it hits a paragraph mark.

Thank you very much in advance,
Marcel

Sub S_GetInfo()
Dim V_CCL2 As String, V_CCL1 As String, V_CC As String, V_CCR1 As String, V_CCR2 As String
Dim Vst_ParL1 As String, Vst_Par0 As String, Vst_ParR1 As String
Dim Vst_ChrL1 As String, Vst_Chr0 As String, Vst_ChrR1 As String
Dim Vst_ParagraphL1 As String, Vst_ParagraphR1 As String
Dim V_RevAuthL1 As String, V_RevAuth As String, V_RevAuthR1 As String
Dim V_RevTypeL1 As Integer, V_RevType As Integer, V_RevTypeR1 As Integer

Dim V_Run As Boolean
Dim V_Count As Integer, V_RevisionsCount As Integer
Dim V_RevisionsTypeString As String

Vb_Run = True
While V_CC <> " 13,"
V_Count = Empty
V_RevisionsTypeString = Empty
V_RevTypeL1 = Empty
V_RevType = Empty
V_RevTypeR1 = Empty
V_RevAuthL1 = Empty
V_RevAuth = Empty
V_RevAuthR1 = Empty

Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdExtend
V_CC = " " & AscW(Selection()) & ","
Vst_Par0 = Selection.ParagraphFormat.Style
Vst_Chr0 = Selection.Style
V_RevisionsCount = Selection.Range.Revisions.Count
For V_Loop = 1 To V_RevisionsCount
V_RevisionsTypeString = V_RevisionsTypeString & Selection.Range.Revisions(V_Loop).Type & "|"
If InStr("1|2|", Selection.Range.Revisions(V_Loop).Type) Then
V_RevType = Selection.Range.Revisions(V_Loop).Type
V_RevAuth = Selection.Range.Revisions(V_Loop).Author
V_Count = V_Count + 1
End If
Next V_Loop
If V_Count > 1 Then
Stop 'There is more than 1 insertion and/or deletion at this position
End If
Selection.MoveRight
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdExtend
V_CCR1 = " " & AscW(Selection()) & ","
Vst_ChrR1 = Selection.Style()
Vst_ParR1 = Selection.ParagraphFormat.Style
V_Count = 0
V_RevisionsTypeString = ""
V_RevisionsCount = Selection.Range.Revisions.Count
For V_Loop = 1 To V_RevisionsCount
V_RevisionsTypeString = V_RevisionsTypeString & Selection.Range.Revisions(V_Loop).Type & "|"
If InStr("1|2|", Selection.Range.Revisions(V_Loop).Type) Then
V_RevTypeR1 = Selection.Range.Revisions(V_Loop).Type
V_RevAuthR1 = Selection.Range.Revisions(V_Loop).Author
V_Count = V_Count + 1
End If
Next V_Loop
If V_Count > 1 Then
Stop 'There is more than 1 insertion and/or deletion at this position
End If
Selection.MoveRight
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdExtend
V_CCR2 = " " & AscW(Selection()) & ","
Selection.MoveLeft Count:=5
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdExtend
V_CCL2 = " " & AscW(Selection()) & ","
Selection.MoveRight
Selection.MoveRight unit:=wdCharacter, Count:=1, Extend:=wdExtend
V_CCL1 = " " & AscW(Selection()) & ","
Vst_ChrL1 = Selection.Style()
Vst_ParL1 = Selection.ParagraphFormat.Style
V_Count = 0
V_RevisionsTypeString = ""
V_RevisionsCount = Selection.Range.Revisions.Count
For V_Loop = 1 To V_RevisionsCount
V_RevisionsTypeString = V_RevisionsTypeString & Selection.Range.Revisions(V_Loop).Type & "|"
If InStr("1|2|", Selection.Range.Revisions(V_Loop).Type) Then
V_RevTypeR1 = Selection.Range.Revisions(V_Loop).Type
V_RevAuthR1 = Selection.Range.Revisions(V_Loop).Author
V_Count = V_Count + 1
End If
Next V_Loop
If V_Count > 1 Then
Stop 'There is more than 1 insertion and/or deletion at this position
End If
Selection.MoveRight
ActiveDocument.Bookmarks.Add Name:="here", Range:=Selection.Range
Selection.MoveUp unit:=wdParagraph, Count:=1
Selection.MoveLeft unit:=wdCharacter, Count:=1
Vst_ParagraphL1 = Selection.ParagraphFormat.Style
Selection.MoveDown unit:=wdParagraph, Count:=2
Vst_ParagraphR1 = Selection.ParagraphFormat.Style
Selection.GoTo what:=wdGoToBookmark, Name:="here"

Selection.MoveRight
Wend
End Sub

macropod
11-13-2010, 07:54 PM
Hi marcelma,

You could achieve a major speed-up by:
a) using ranges instead of selections; and
b) turning off screen updating while the code is running.

Note: When posting code, please use the 'VBA' code button to tag your code. That way your code will be much easier to read.

marcelma
11-14-2010, 01:32 AM
Hello macropod,

sorry for the code-format. I have been looking for the option to post code in a different way, but can't seem to find the button you are refering to.

Is there any way to edit my post, so that I can polish up it's appearance?

I have never before consciously worked with ranges - seems I have to get a far better understanding of it than what I am having now.

greetings,
Marcel

macropod
11-14-2010, 03:17 AM
Hi Marcel,

When you're posting, you should see a green/white button with the 'VBA' logo on it; that inserts the VBA code tags, between which you insert your code, so that it (abbreviated for illustration) looks like:
Sub S_GetInfo()
Dim V_CCL2 As String, V_CCL1 As String, V_CC As String, V_CCR1 As String, V_CCR2 As String
'...

While V_CC <> " 13,"
V_Count = Empty
'...
For V_Loop = 1 To V_RevisionsCount
'...
If InStr("1|2|", Selection.Range.Revisions(V_Loop).Type) Then
'...
End If
Next V_Loop
'...
Wend
End Sub

Post editing is only available for a short while after posting (IIRC for 10 minutes).

macropod
11-14-2010, 03:54 AM
Hi Marcel,

On looking more closely at what you're doing, it seems to me you're probably doing far more data collection than you really need to do.

I need to know in detail about the immediate environment of every character. As far as I can presently say, I need at least to have the following information:
...
a total of 19 variables, which have to be elicited for each character
Let's look at this in detail:


- character code of the two characters left of the present character
- character code of the present character
- character code of the two characters right of the present character
Taking this at face value, you therefore really only need to test every character, without regard to the preceding/following characters, since you'll be getting their attributes in any event.


- character style of the present character and the character immediately right and left of it
Again, you may as well test every character, without regard to the preceding/following characters.


- paragraph style for the present character and the character immediately right and left of it
You only need to do this once per paragraph, as a paragraph can have only one paragraph Style.


- ascertain whether the present character or the character immediately right and left of it are either inserted or deleted
Yet again, you may as well test every character, without regard to the preceding/following characters.


- paragraph style for the previous paragraph
- paragraph style for the following paragraph
Again, you only need to do this once per paragraph.

Testing only when you need to will greatly diminish the overheads. If you really need to replicate the current, previous and next paragraph Style information for every character (and I suggest you probably don't really need to), you can simply replicate the data already gathered without the need to re-test.

marcelma
11-14-2010, 06:37 AM
Hello macropod,

thanks a lot for looking so closely at my posting.

I agree that in the loop I posted I do much more data collection than I need. The code I am working with right now does not elicit all this information for every single character and therefore runs much faster, but as I run the code on different texts it turns out that information which I thought would be needed only in specific instances is needed in other instances as well. So I am looking for a more general solution than eliciting the needed information in dozens of individual cases.

Your suggestion that I can process every chacacter individually without regard to the adjacing characters I didn't understand, because I can test each character only in relation to its environment, in itself it would be meaningless.

To give two *simple* examples:
1) A dot is usually legitimate after a small case letter and before a blank or paragraph sign - and there are other cases. If, however, a dot appears between two letters it might mean that the blank is missing, and in such a case the code should ask for my input.
2) A colon is usually legitimate if it follows a small case letter and is followed by a blank and an opening quotation mark. If there is no opening quotation mark after the blank it might mean that it has been forgotten etc. etc. -

So, when I test a character, I need to know its environment.

Within the texts on which I am running the code I usually encounter dozens to hundreds of mistakes, most of which can be handled by VBA without prompting me for input (these are mistakes which do not appear in the spell- or grammar-checker).

Your suggestion to identify the paragraph style only once within a paragraph sounds very reasonable to me and it saves a lot of processing time, but it would mean that I have to find a way of flexibly keeping track of what paragraph I'm in and I haven't yet found a convincing way of doing so.

My main loop looks presently as follows:

While V_Chr <> "|" And Vb_Run
' eliciting information
. . .

If V_ChrCode = "" Then
ElseIf Vb_skipDelChar And V_RevType = wdRevisionDelete Then 'Skip deleted characters
ElseIf V_ChrCode = "1" Then Call S_CheckChr0001(V_RevType, V_CCL1, V_CCL2, V_CCR1, V_CCR2, Vst_Chr0, Vst_Par0, Vb_Run) ' inserted graphic
ElseIf V_ChrCode = "2" Then Call S_CheckChr0002(V_RevType, V_CCL1, V_CCL2, V_CCR1, V_CCR2, Vst_Chr0, Vst_Par0) ' footnote
ElseIf V_ChrCode = "9" Then Call S_CheckChr0009(V_RevType, V_CCL1, V_CCL2, V_CCR1, V_CCR2, Vst_Chr0, Vst_Par0, "9658") ' tabulator
. . .
End If
Selection.MoveRight unit:=wdCharacter, Count:=1

Wend


Thanks again for your input.
Greetings,
Marcel

Paul_Hossler
11-14-2010, 01:15 PM
- ascertain whether the present character or the character immediately right and left of it are either inserted or deleted


Is your document Tracking Changes?

How would you tell is the characters are inserted or deleted?

Paul

Paul_Hossler
11-14-2010, 01:18 PM
It seems like all checks are within a paragraph, so could you loop through the document one paragraph at a time (with look back/look ahead) and process the paragraph?

In the sample below, I just test the before/current/next paragraphs char count and style, but you could expand the logic


Option Explicit

Sub TestConcept()
Dim oPara As Paragraph

For Each oPara In ActiveDocument.Paragraphs
Call TestSub(oPara)
Next
End Sub

Sub TestSub(O As Paragraph)

Dim oPrevPara As Paragraph, oNextPara As Paragraph
Dim oPrevStyle As Style, oNextStyle As Style

Set oPrevPara = Nothing
Set oNextPara = Nothing
Set oPrevStyle = Nothing
Set oNextStyle = Nothing

'save the paragraph's neighbors
If Not O.Previous Is Nothing Then
Set oPrevPara = O.Previous
Set oPrevStyle = O.Previous.Style
End If
If Not O.Next Is Nothing Then
Set oNextPara = O.Next
Set oNextStyle = O.Next.Style
End If


'process the paragraph's text -- just demo
'need to handle first and last paragraph condidtion
If oPrevPara Is Nothing And oNextPara Is Nothing Then
MsgBox "Prev# = " & 0 & " Curr# = " & Len(O.Range.Text) & " Next# = " & 0

ElseIf oPrevPara Is Nothing And Not oNextPara Is Nothing Then
MsgBox "Prev# = " & 0 & " Curr# = " & Len(O.Range.Text) & " Next# = " & Len(oNextPara.Range.Text)

ElseIf Not oPrevPara Is Nothing And oNextPara Is Nothing Then
MsgBox "Prev# = " & Len(oPrevPara.Range.Text) & " Curr# = " & Len(O.Range.Text) & " Next# = " & 0

Else
MsgBox "Prev# = " & Len(oPrevPara.Range.Text) & " Curr# = " & Len(O.Range.Text) & " Next# = " & Len(oNextPara.Range.Text)
MsgBox "Prev Style = " & oPrevPara.Style.NameLocal & " Curr Style = " & O.Style.NameLocal & " Next Style = " & oNextPara.Style.NameLocal
End If


End Sub


Paul

macropod
11-14-2010, 01:37 PM
Hi Marcel,

Your suggestion that I can process every chacacter individually without regard to the adjacing characters I didn't understand, because I can test each character only in relation to its environment, in itself it would be meaningless.

To give two *simple* examples:
1) A dot is usually legitimate after a small case letter and before a blank or paragraph sign - and there are other cases. If, however, a dot appears between two letters it might mean that the blank is missing, and in such a case the code should ask for my input.
2) A colon is usually legitimate if it follows a small case letter and is followed by a blank and an opening quotation mark. If there is no opening quotation mark after the blank it might mean that it has been forgotten etc. etc. -
In that case, you still only need to collect additional information for a defined set of 'special' characters (in general, punctuation marks) - and for these you might do better to collect the previous and next word for context, rather than just the previous and next character.

As an aside, it seems a spelling/grammar check would also do a great deal of why you're trying to achieve via a brute-force approach.

marcelma
11-14-2010, 03:11 PM
Hello macropod,

In principle I agree with you. In most cases it is only necessary to check for a selected set of "special" characters and this is how I have been using the code so far. It has, however, turned out to be not good enough for formatting issues (here I have to check every single character) and I would like to cover them in one and the same run.

This is not a competition to spell-chacking. The texts have all been spell-checked.

The alternative to what you call a "brute-force" approach has been (so far) to actually read the text or to search for typical places of errors. Because the texts are so varried and the contributers have such different levels of skill and competence, this approach has turned out to be not feasable. It is much too time-consuming and needs a level of attention I can not keep up for hundreds of pages - so I better invest a lot attention on a meta-level into geting it done once and for all.

To look at the whole previous and next word has turned out to be not necessary, for nearly all situations the adjacend two characters on the right and on the left side are sufficent. For the few remaining places the code will prompt me to make a decision.

I have re-wirtten the core-loop, but while it has become much more elegant now, it hasn't gained much speed.

Sub S_GetInfo2()
Dim V_CCL2 As String, V_CCL1 As String, V_CC As String, V_CCR1 As String, V_CCR2 As String
Dim Vst_ParL1 As String, Vst_Par0 As String, Vst_ParR1 As String
Dim Vst_ChrL1 As String, Vst_Chr0 As String, Vst_ChrR1 As String
Dim Vst_ParagraphL1 As String, Vst_ParagraphR1 As String
Dim V_RevAuthL1 As String, V_RevAuth As String, V_RevAuthR1 As String
Dim V_RevTypeL1 As Integer, V_RevType As Integer, V_RevTypeR1 As Integer

Dim V_Run As Boolean
Dim V_Count As Integer, V_RevisionsCount As Integer
Dim V_RevisionsTypeString As String

Dim R_ChrEnv As Range, R_ParEnv As Word.Range


Vb_Run = True
While V_Chr <> "13" And Vb_Run
V_Chr = Selection()
V_Count = Empty
V_RevisionsTypeString = Empty
V_RevTypeL1 = Empty
V_RevType = Empty
V_RevTypeR1 = Empty
V_RevAuthL1 = Empty
V_RevAuth = Empty
V_RevAuthR1 = Empty

Set R_ParEnv = Selection.Paragraphs(1).Range
On Error Resume Next
R_ParEnv.MoveStart wdParagraph, -1
R_ParEnv.MoveEnd wdParagraph, 1
Vst_ParagraphL1 = R_ParEnv.Paragraphs(1).Style
Vst_ParagraphR1 = R_ParEnv.Paragraphs(3).Style

Set R_ChrEnv = Selection.Range
R_ChrEnv.MoveStart wdCharacter, -2
R_ChrEnv.MoveEnd wdCharacter, 3
V_CCL2 = " " & AscW(R_ChrEnv.Characters.Item(1)) & ","
V_CCL1 = " " & AscW(R_ChrEnv.Characters.Item(2)) & ","
V_CC = " " & AscW(R_ChrEnv.Characters.Item(3)) & ","
V_CCR1 = " " & AscW(R_ChrEnv.Characters.Item(4)) & ","
V_CCR2 = " " & AscW(R_ChrEnv.Characters.Item(5)) & ","
Vst_ParL1 = R_ChrEnv.Characters.Item(2).ParagraphFormat.Style
Vst_ChrL1 = R_ChrEnv.Characters.Item(2).Style
Vst_Par0 = R_ChrEnv.Characters.Item(3).ParagraphFormat.Style
Vst_Chr0 = R_ChrEnv.Characters.Item(3).Style
Vst_ParR1 = R_ChrEnv.Characters.Item(4).ParagraphFormat.Style
Vst_ChrR1 = R_ChrEnv.Characters.Item(4).Style

V_Count = Empty
V_RevisionsTypeString = Empty
V_RevisionsCount = R_ChrEnv.Characters.Item(2).Revisions.Count
For V_Loop = 1 To V_RevisionsCount
V_RevisionsTypeString = V_RevisionsTypeString & R_ChrEnv.Characters.Item(2).Revisions(V_Loop).Type & "|"
If InStr("1|2|", R_ChrEnv.Characters.Item(2).Revisions(V_Loop).Type) Then
V_RevTypeL1 = R_ChrEnv.Characters.Item(2).Revisions(V_Loop).Type
V_RevAuthL1 = R_ChrEnv.Characters.Item(2).Revisions(V_Loop).Author
V_Count = V_Count + 1
End If
Next V_Loop
If V_Count > 1 Then
Stop 'There is more than 1 insertion and/or deletion at this position
End If
V_Count = Empty
V_RevisionsTypeString = Empty
V_RevisionsCount = R_ChrEnv.Characters.Item(3).Revisions.Count
For V_Loop = 1 To V_RevisionsCount
V_RevisionsTypeString = V_RevisionsTypeString & R_ChrEnv.Characters.Item(3).Revisions(V_Loop).Type & "|"
If InStr("1|2|", R_ChrEnv.Characters.Item(3).Revisions(V_Loop).Type) Then
V_RevTypeL1 = R_ChrEnv.Characters.Item(3).Revisions(V_Loop).Type
V_RevAuthL1 = R_ChrEnv.Characters.Item(3).Revisions(V_Loop).Author
V_Count = V_Count + 1
End If
Next V_Loop
If V_Count > 1 Then
Stop 'There is more than 1 insertion and/or deletion at this position
End If

Selection.MoveRight
Wend
End Sub

Especially the loops I use to find out whether a character has been deleted or inserted still bother me, but I have not come up with a better way.

Thanks for your time and attention.
Marcel

marcelma
11-14-2010, 03:12 PM
Hello Paul,

yes my documents are tracking changes - and this is essential.

greetings,
Marcel

macropod
11-14-2010, 05:26 PM
Hi Marcel,

As I said in my first post:

turning off screen updating while the code is running.
So make the first line of the sub:
Application.ScreenUpdating = Falseand the last line:
Application.ScreenUpdating = True
This is especially beneficial when you're working with selections and/or editing ranges.

If you want to compare timings, add the following lines to the start of the sub:
' Dimension Timer Variables
Dim eTime As Single
' Start Timing
eTime = Timer
and the following lines to the end:' Calculate elapsed time
eTime = (Timer - eTime + 86400) Mod 86400 ' Just in case execution time spans midnight
MsgBox "Execution took " & Int(eTime / 3600) & " hours, " & Int((eTime Mod 3600) / 60) & " minutes and " & Format(eTime Mod 60, "0.00") & " seconds."
Your code is still using & moving selections, and has some issues with its variables. For example, you're missing:
Dim V_Chr As String, V_Loop As Integer, Vb_Run As Boolean
You're also testing:
While V_Chr <> "13" And Vb_Run
but, since Vb_Run is always true and you're trying to test the ASCII value of V_Chr, what I think you really need is:
While V_Chr <> Chr(13)