That's why I left the header identification algorithm alone, since I figured that was the thing you were most likely to change. The only thing I can add is that you really should get rid of your IsDigit/IsDigits functionality. It's redundant with IsNumeric and may very well be a bottleneck. If nothing else, it's code to wade through which you don't need to wade through.
At this point, the biggest difference in approach I see in your NewSample code is that you were using the paradigm of searching backwards for the preceding heading, rather than building an additional collection of headings and then comparing to find the nearest one.
I think the 2nd collection of headings is going to be better, since it means you won't be back-tracking so much. Why go through the same paragraphs multiple times, only to find out that they aren't useful? Especially on a longer document. If you're going to need to loop through each paragraph in order to analyze different heading patterns, you might as well only do it once. Also, give yourself some short-circuits (i.e., if the Len of the paragraph text is <= 2 or something, don't do anything). The most likely spot for corruption in a document is section breaks... might as well skip those.
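Here's a rough sketch of the single-pass idea, not a drop-in replacement. The names BuildHeadings, NearestHeadingBefore and LooksLikeHeading are made up for illustration, and the outline-level test is only a stand-in for whatever header identification logic you end up with. Treating Chr(12) as the section break marker is also an assumption you should check against your own documents.

```vba
Option Explicit

' Stand-in heading test (an assumption): swap in your own header identification logic.
Private Function LooksLikeHeading(ByVal para As Paragraph) As Boolean
    LooksLikeHeading = (para.OutlineLevel <> wdOutlineLevelBodyText)
End Function

' Walks the paragraphs once and returns the Ranges of candidate headings,
' in document order.
Public Function BuildHeadings(ByVal doc As Document) As Collection
    Dim headings As Collection
    Dim para As Paragraph
    Dim txt As String

    Set headings = New Collection
    For Each para In doc.Paragraphs
        txt = para.Range.Text
        If Len(txt) <= 2 Then
            ' Short-circuit: too short to be a heading, do nothing
        ElseIf InStr(txt, Chr$(12)) > 0 Then
            ' Assumed section/page break character: skip these paragraphs
        ElseIf LooksLikeHeading(para) Then
            headings.Add para.Range
        End If
    Next para

    Set BuildHeadings = headings
End Function

' Returns the last heading that starts at or before pos,
' or Nothing if no heading precedes that position.
Public Function NearestHeadingBefore(ByVal headings As Collection, _
                                     ByVal pos As Long) As Range
    Dim h As Range
    Dim result As Range

    For Each h In headings
        If h.Start > pos Then Exit For   ' headings are in document order
        Set result = h
    Next h

    Set NearestHeadingBefore = result
End Function
```

You'd build the collection once with something like Set hs = BuildHeadings(ActiveDocument), then call NearestHeadingBefore(hs, foundRange.Start) for each sentence your Find loop turns up (foundRange being whatever Range you already have for the hit). Check the result with Is Nothing before using it.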
I think building the collection of headings will still be the bottleneck (since the primary collection of sentences containing search terms uses the Find object) -- but you can further refine if it seems slow. But at the very least you won't have to change any of the methodology of your main routine-- you simply have to try to build the headings collection faster.
Apart from that, you can look at various techniques for optimizing string functions (the statement If Len(myString) = 0 is technically faster than If myString = "" -- although I prefer the latter statement for readability).
VBA contains a Like operator, although it is limited... and wildcard searching can be very, very powerful. But this structure (two collections, one with the sentences, one with the headings) is very scalable. If you later need to add to it (search terms, criteria for the headers), it's not that difficult.
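To give a feel for what Like can and can't do, here's a small sketch. The helper names LooksNumbered and LooksLettered are hypothetical, and the patterns are just guesses at what your headings might look like.

```vba
' Hypothetical helper: True when txt begins like a numbered heading,
' e.g. "1. Scope" or "2.3 Terms".
Public Function LooksNumbered(ByVal txt As String) As Boolean
    ' "#" matches exactly one digit; "*" matches any run of characters.
    ' Like has no "one or more digits" quantifier, so two-digit numbers
    ' need their own pattern.
    LooksNumbered = (txt Like "#.*") Or (txt Like "##.*")
End Function

' Hypothetical helper: True when txt begins like a lettered heading,
' e.g. "A. Definitions".
Public Function LooksLettered(ByVal txt As String) As Boolean
    ' [A-Z] is case-sensitive under the default Option Compare Binary.
    LooksLettered = (txt Like "[A-Z]. *")
End Function
```

Having to spell out one pattern per digit count is the kind of limitation I mean. If your heading criteria get more involved than this, the Find object's wildcard search is probably the better tool.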