Results 1 to 20 of 30

Thread: Complex macro has very poor performance

Threaded View

Previous Post Previous Post   Next Post Next Post
  1. #10
    VBAX Master
    Joined
    Feb 2011
    Posts
    1,480
    Location
    Well, I couldn't resist. This seems to work much better on your demo document. The general concept is...
    1. Build a collection of found sentences, based on your search terms (currently "shall" and "will")
    2. Build a collection of found Headings, based on your existing string functionality analyzing the text of each paragraph in the document which has a text length of longer than 1 character (i.e., don't bother with empty paragraph marks, section breaks, etc)
    3. cycle through the collection of found sentences, adding the appropriate heading text, by comparing the start ranges
    4. Spit the results into a Word table (I'll leave getting it into Excel to you).

    I left out the idea of wildcard searching, because a) I always have to wrap my mind around the wildcard searching process-- there are others here who are much faster and familiar with that and b) it seemed pretty fast already.

    Running this on your demo document completes the operation in about 10 seconds. It will be a bit slower when you have to dump the data into Excel (since it takes time to open the excel app, generate a new worksheet, etc), but I think you can probably work out how to deal with the collection based on the creation of the table.

    So, in summary... this requires the following:
    1. A class called myInfo, which contains the following
    [vba]
    Option Explicit
    'for the actual found range
    Public InfoRange As Range
    'the heading string of the found range
    Public sHeading As String
    'this can be used to re-order in the logical order
    Public lKeyWordStart As Long
    Public sKeyWordFound As String

    'return the text, but remove any empty paragraph marks (including multiple paragraph marks)
    Public Property Get InfoText() As String
    If Not InfoRange Is Nothing Then
    InfoText = Replace(InfoRange.text, vbCr, "")
    End If
    End Property
    [/vba] 2. A second class called myHeading, which contains the following
    [vba]
    Option Explicit

    Public sHeadingText As String
    Public rngHeadingPara As Range
    Public lStart As Long
    [/vba] 3. The following code in a regular module (I've included your original code, as well as a couple of functions I had strictly for testing from the immediate window, although I think you only need the HeaderOutlineTitle function.
    [vba]
    Option Explicit
    Dim sKWArray() As String
    Sub Demo()
    Dim oInfo As myInfo
    Dim rngSearch As Range
    Dim sKeyWords As String
    Dim sKeyWord As String
    Dim i As Integer
    Dim colFoundInfo As Collection
    Dim colHeaders As Collection

    'get our keywords
    sKeyWords = "shall|will"
    'set up our collection of info
    Set colFoundInfo = New Collection

    'search for each of our keywords
    For i = 0 To UBound(Split(sKeyWords, "|"))
    'reset the range for each search term
    Set rngSearch = ActiveDocument.Content
    sKeyWord = Split(sKeyWords, "|")(i)
    'set up the find, not matching case
    With rngSearch.Find
    .MatchCase = False
    'get the keyword to search
    .text = sKeyWord
    'while we find our search term
    Do Until .Execute = False
    'create a new object
    Set oInfo = New myInfo
    'put the relevant data into it
    oInfo.lKeyWordStart = rngSearch.Start
    Set oInfo.InfoRange = rngSearch.Sentences(1)
    oInfo.sKeyWordFound = sKeyWord

    'oInfo.sHeading = fReturnHeadingBefore(oInfo.InfoRange)
    'here we would add in functionality to grab the header info for this object
    '(or could put into the actual class-- depends on your preference)
    'and then add the objectto the collection
    colFoundInfo.Add oInfo
    Application.StatusBar = "Added heading: " & oInfo.sHeading
    Loop
    End With
    Next
    'now, get the heading texts collection
    Set colHeaders = fGetHeadingCollection
    For Each oInfo In colFoundInfo
    oInfo.sHeading = fGetClosestHeadingText(oInfo.lKeyWordStart, colHeaders)
    Next

    'output to a word table, just for the purposes of this demo
    DemoOutput colFoundInfo
    End Sub
    Public Function fGetClosestHeadingText(lBefore As Long, colHeadings As Collection) As String
    Dim oHeading As myHeading
    'Dim colHeadings As Collection
    Dim i As Integer
    Dim sRet As String

    'Set colHeadings = fGetHeadingCollection
    'if we're before any headings, then return an empty string before we loop through
    If lBefore < colHeadings(1).lStart Then
    Exit Function
    End If
    'cycle through all the headings, finding the one with the closest item
    'start at 2, since we always refer to the previous one
    For i = 2 To colHeadings.Count
    Set oHeading = colHeadings(i)
    If lBefore < oHeading.rngHeadingPara.Start Then
    sRet = colHeadings(i - 1).sHeadingText
    Exit For
    End If
    Next
    'it's the last heading
    If sRet = "" Then
    sRet = colHeadings(colHeadings.Count).sHeadingText
    End If
    fGetClosestHeadingText = sRet
    End Function
    'build the collection of headings
    Public Function fGetHeadingCollection() As Collection
    Dim oPara As Paragraph
    Dim oHeading As myHeading
    Dim sRet As String
    Dim colRet As Collection

    Set colRet = New Collection
    For Each oPara In ActiveDocument.Paragraphs
    'don't bother with blank paragraphs or heading rows
    If Len(oPara.Range.text) > 1 Then
    sRet = HeaderOutlineTitle(oPara.Range.text)
    If Len(sRet) > 0 Then
    Set oHeading = New myHeading
    oHeading.sHeadingText = sRet
    oHeading.lStart = oPara.Range.Start
    Set oHeading.rngHeadingPara = oPara.Range.Duplicate
    colRet.Add oHeading
    End If
    End If
    Next
    Set fGetHeadingCollection = colRet
    End Function
    'just dump it into a table at the moment
    Sub DemoOutput(colInfo As Collection)
    Dim oTable As Table
    Dim oDoc As Document
    Dim i As Integer
    Dim oInfo As myInfo

    Set oDoc = Documents.Add
    Set oTable = oDoc.Tables.Add(oDoc.Content, colInfo.Count + 1, 3)
    With oTable
    .Cell(1, 1).Range.text = "Position"
    .Cell(1, 2).Range.text = "Header"
    .Cell(1, 3).Range.text = "Sentence"
    For i = 1 To colInfo.Count
    Set oInfo = colInfo(i)
    .Cell(i + 1, 1).Range.text = oInfo.lKeyWordStart
    .Cell(i + 1, 2).Range.text = oInfo.sHeading
    .Cell(i + 1, 3).Range.text = oInfo.InfoText
    Next

    'quick and dirty sort, delete and autofit
    .Sort excludeHeader:=True, FieldNumber:="Column 1", SortFieldType:=wdSortFieldNumeric, _
    SortOrder:=wdSortOrderAscending
    .Columns(1).Delete
    .AutoFitBehavior (wdAutoFitContent)
    End With

    End Sub
    Sub SelectTheNextShallSentence()
    Dim rngSearch As Range
    Dim oInfo As myInfo

    Set rngSearch = ActiveDocument.Content
    rngSearch.Start = Selection.Start
    With rngSearch.Find
    .MatchCase = False
    .text = "shall"
    If .Execute Then
    Set oInfo = New myInfo
    oInfo.lKeyWordStart = rngSearch.Start
    Set oInfo.InfoRange = rngSearch.Sentences(1)
    oInfo.sKeyWordFound = "shall"
    oInfo.InfoRange.Select
    End If
    End With
    End Sub
    Sub DisplayTheHeadingOfThisSentence()
    MsgBox fReturnHeadingBefore(Selection.Range)
    End Sub
    Function fReturnHeadingBefore(rngWhere As Range) As String
    Dim oPara As Paragraph
    Dim rngSearch As Range
    Dim i As Integer
    Dim sRet As String

    Set rngSearch = rngWhere.Duplicate
    rngSearch.Start = rngSearch.Parent.Content.Start
    For i = rngSearch.Paragraphs.Count To 1 Step -1
    Set oPara = rngSearch.Paragraphs(i)
    sRet = HeaderOutlineTitle(oPara.Range.text)
    If sRet <> "" Then
    fReturnHeadingBefore = sRet
    Exit For
    End If
    Next
    End Function
    'See if we have a heading-style number. Returns the heading, or blank if not a heading.
    Public Function HeaderOutlineTitle(line As String) As String
    'Set up variables
    Dim CharCounter As Integer
    Dim NumberCounter As Integer
    Dim tCh As String
    Dim y As Integer

    CharCounter = 0
    NumberCounter = 0

    'For the 4 character headings, this skips all the unncessary logic
    If Len(line) >= 4 And IsDigits(Left(line, 4)) Then
    HeaderOutlineTitle = Left(line, 4)
    GoTo SkipToEnd
    End If

    'Start looping through the line character by character
    For y = 1 To Len(line)
    tCh = Mid(line, y, 1)

    'This allows the letter "H" to be the first character
    If y = 1 And tCh = "H" Then
    GoTo SkipNumberCheck
    End If

    'If it's blank or a number or a period, exit
    If tCh = " " Or (Not IsDigit(tCh) And tCh <> ".") Then
    Exit For
    End If

    'If this is a number, keep track, otherwise reset the counter
    NumberCounter = IIf(IsNumeric(tCh), NumberCounter + 1, 0)

    'If we've had more than 2 numbers in a row, we're probably not seeing a heading
    If NumberCounter > 2 Then
    CharCounter = 0
    Exit For
    End If

    SkipNumberCheck: 'We can also skip all the other logic since it's the first character
    CharCounter = CharCounter + 1
    Next y

    'If this was a outline heading, list that information in column 4
    '(AND As long as the last character before the space was a number)
    If CharCounter <> 0 Then
    If IsNumeric(Mid(line, CharCounter, 1)) Then
    HeaderOutlineTitle = Left(line, CharCounter)
    Else
    HeaderOutlineTitle = Left(line, CharCounter - 1)
    End If
    Else
    HeaderOutlineTitle = ""
    End If

    SkipToEnd:
    End Function
    Function IsDigit(char As String) As Boolean
    'Uses [Asc] function to get an ASCII code of a character.
    Dim iAsciiCode As Integer
    iAsciiCode = Asc(char)

    'The ASCII codes for the digits range from 48 to 57. If an ASCII
    'code of the character is within this range, function returns true
    IsDigit = iAsciiCode >= 48 And iAsciiCode <= 57
    End Function

    Function IsDigits(text As String) As Boolean
    'If the text is 0 length, it's not a digit-only string
    If Len(text) = 0 Then
    IsDigits = False
    Else
    Dim x As Integer
    'the default is true. We'll set to false if...
    IsDigits = True
    For x = 1 To Len(text)
    '...we find something that's not a digit
    If Not IsDigit(Mid(text, x, 1)) Then
    IsDigits = False
    Exit For
    End If
    Next x
    End If
    End Function
    Function FileThere(FileName As String) As Boolean
    FileThere = (Dir(FileName) > "")
    End Function

    'Queries the user for a file name where we'll save our information
    Function GetXLSFileName() As String
    'open excel document
    Dim fd As FileDialog
    Set fd = Application.FileDialog(msoFileDialogFilePicker)
    Dim vSelFile As Variant
    With fd
    .AllowMultiSelect = False
    .Filters.Add "Excel 2003", "*.xls", 1
    .Filters.Add "Excel 2007 & 2010", "*.xlsx", 1
    .Title = "Choose your destination spreadsheet file."

    'Show the dialog box
    If .Show = -1 Then
    'Set the file string to the selected file
    GetXLSFileName = .SelectedItems(1)
    'If the user presses Cancel...
    Else
    MsgBox "No file selected. Macro will use 'C:\ShallStatements.xls' as default."
    GetXLSFileName = "C:\ShallStatements.xls"
    End If
    End With
    End Function

    'Interface function that gathers keywords to search for in the document.
    Function SetUpKeywords() As Boolean
    ReDim sKWArray(1 To 2) As String
    sKWArray(1) = "shall"
    sKWArray(2) = "must" '<< Section simplified in this example as it was not causing a performance issue
    SetUpKeywords = True
    End Function

    'Finds the first searched keyword in a line and returns the position
    Function HasKeywords(line As String) As Integer
    HasKeywords = 0
    Dim x As Integer
    For x = 1 To UBound(sKWArray)
    HasKeywords = IIf(Not IsNull(InStr(1, line, sKWArray(x), 1)), InStr(1, line, sKWArray(x), 1), 0)

    'Exit the loop if we find something
    If HasKeywords <> 0 Then Exit For
    Next x
    End Function




    'Main function that extracts requirements from the document
    Sub ExtractRequirements()
    Dim x As Integer, y As Integer, tCh As Integer
    Dim intRowCount As Integer 'The row excel will import data into. Used as counter & start value.
    intRowCount = 10 'Data starts on row 10. The first 9 rows are reserved for performance metrics and the headers.

    'Create arrays to hold data
    Dim sHeaders() As String
    Dim sStatements() As String
    ReDim sHeaders(1 To 1000) As String 'Supports maximum of 1000 requirements per document.
    ReDim sStatements(1 To 1000) As String 'Supports maximum of 1000 requirements per document.

    'Record Start Time
    sHeaders(1) = "Start Time"
    sStatements(1) = DateTime.Now

    'Write headers
    sHeaders(intRowCount - 1) = "Header"
    sStatements(intRowCount - 1) = "Requirement"

    'Figure out how many lines are in the document
    Dim NumOfLines As Integer
    NumOfLines = ActiveDocument.ComputeStatistics(wdStatisticLines, False)

    'Record time for setup to be completed
    sHeaders(2) = "Lines Computed"
    sStatements(2) = DateTime.Now

    'Set up requirement words (exit sub on failure)
    If Not SetUpKeywords Then Exit Sub

    'Turn off screen updating to improve performance
    If MsgBox("Would you like to turn off screen updating to improve performance? " & _
    "Please note that this may cause Word to appear unresponsive for quite " & _
    "some time on large files.", vbYesNo, "Improve Performance") = vbYes Then
    Application.ScreenUpdating = False
    End If

    'Record time for information to be collected
    sHeaders(3) = "Params Collected"
    sStatements(3) = DateTime.Now

    'Loop through every line and look for headers
    Dim tmpHeaderRslt As String, LineText As String, SentenceText As String, header As String, ShallPos As Integer
    header = "None" 'This is the header value used before we've found a legitimate header
    For x = 1 To NumOfLines
    'Go to the correct line
    Selection.GoTo wdGoToLine, wdGoToAbsolute, x

    'Expand to select the entire line, but skip the logic if it's blank
    Selection.Expand wdLine
    LineText = Trim(Selection.text)
    If LineText <> "" Then
    'see if it's a header, and if so, set the header
    tmpHeaderRslt = HeaderOutlineTitle(LineText)
    If tmpHeaderRslt <> "" Then header = tmpHeaderRslt

    'See if the line has the word "shall" or "must" in it
    ShallPos = HasKeywords(LineText)

    'If so, write it to the spreadsheet
    If ShallPos > 0 Then
    'Move to the correct character and expand to the full sentence
    Selection.Collapse wdCollapseStart
    Selection.Move wdCharacter, ShallPos
    Selection.Expand wdSentence
    SentenceText = Selection.text

    'Write the header to the spreadsheet
    sHeaders(intRowCount) = header
    sStatements(intRowCount) = SentenceText

    'Move the selection forward to the proper line (avoids replication)
    Selection.Collapse wdCollapseEnd
    If (x + 1) < Selection.Information(wdFirstCharacterLineNumber) Then x = Selection.Information(wdFirstCharacterLineNumber) - 1

    'Incremement the row counter
    intRowCount = intRowCount + 1
    End If
    End If
    Next x

    'Record time for statements to be collected
    sHeaders(4) = "Statements Collected"
    sStatements(4) = DateTime.Now

    'Open excel file and set up for receiving data
    Dim xlsFileString As String
    xlsFileString = GetXLSFileName
    Dim appExcel As Object
    Dim objSheet As Object
    Set appExcel = CreateObject("Excel.Application")
    Set objSheet = appExcel.workbooks.Open(xlsFileString).Sheets("Sheet1")

    'Write data to the spreadsheet
    sHeaders(5) = "Excel write begin"
    sStatements(5) = DateTime.Now
    For x = 1 To intRowCount
    objSheet.Cells(x, 1) = sHeaders(x)
    objSheet.Cells(x, 2) = sStatements(x)
    Next x

    'Write end time
    objSheet.Cells(6, 1) = "Excel write ends"
    objSheet.Cells(6, 2) = DateTime.Now

    'Clean up and open for viewing.
    appExcel.workbooks(1).Close True
    appExcel.Quit
    Set objSheet = Nothing
    Set appExcel = Nothing

    'Open the new spreadsheet for viewing
    If FileThere("C:\Program Files\Microsoft Office\Office12\excel.exe ") Then
    Shell ("C:\Program Files\Microsoft Office\Office12\excel.exe """ & xlsFileString & """")
    ElseIf FileThere("C:\Program Files\Microsoft Office\Office14\excel.exe ") Then
    Shell ("C:\Program Files\Microsoft Office\Office14\excel.exe """ & xlsFileString & """")
    End If
    End Sub
    [/vba] As a note: Your use of the IsDigit or IsDigits functions could be substituted by simply using the built in IsNumeric function (which accomplishes the same thing those two functions do) within the HeaderOutlineTitle function. But I left that alone, as that is your primary heading pattern analysis which you are the most comfortable with. And it also doesn't appear to adversely impact the speed that much.

    Since this works in about 10 seconds on your 30 page document, I would imagine you shouldn't have this run much longer than a minute on any longer document.

    As a general note-- *always* use Option Explicit at the top of your modules. It saves you from having VBA try to determine the type of a variable on the fly. You didn't have a Dim y as Integer in your HeaderOutlineTitle function, which means it will be a Variant type-- that can be costly to the memory process.

    And lastly-- please note that since I went away from your "line by line" approach, and instead use the Sentences collection, there will be two major problems that you will need to handle.
    1) Sentences collection is really tricky. It truncates at times you wouldn't expect, and doesn't truncate at other times. Dr. Rev. Martin Luther King Jr. died in the 20th century. That "sentence" looks to Word like it is 4 sentences.
    2) The sentences collection also continues on using any white space. So in some cases, the text returned will include a lot of empty paragraphs. So I've simply placed in the class a quick and dirty method of removing any paragraph marks from the myInfo class. There are numerous ways to approach this, but this is one way.

    Hope this helps. Please post the code you got to, if you're interested in any suggestions for your approach. The main takeaway from this approach is that the use of collections is far superior to your inclination to use arrays. Not only can you put custom objects (the classes) into collections, being able to use the .Range.Start property as your main algorithm for document location speeds things up considerably, because you don't have to a) go through the document line by line and b) Longs are much less memory intensive than a string. When you magnify that by string arrays with 1000 elements, you can definitely slow yourself down.

    Cheers,
    - Jason aka Frosty
    Last edited by Frosty; 06-07-2012 at 03:14 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •