-
1 Attachment(s)
Attached is the updated version of the macro. It's now running the 200 page document in under 30 seconds, so I'm happy performance-wise. I've also added some additional features and planned several more.
With regard to the code clean-up... I understand that some of my code is considered poor practice. Certainly I could stand to use better variable/function names. But some of my practices are because it's more readable for me--and I'm usually the person who needs to read my code, since I'm not a developer. I'm much better off getting basic logic functions on one line so that I can read comments instead of code (I did correct the comment on that one obscure line so it wasn't inaccurate).
That said, I understand that for people other than me it can be hard to decipher, and I'll try to submit code that's more easily readable from now. I appreciate the tips.
Lastly: The IsDigit/s function(s) are subtly different from IsNumeric: it doesn't count parenthesis. This eliminates hundreds of false positives in the header routine.
Thank you (so much!) for all your work and analysis and advice. I can't tell you how much I appreciate it, and I've learned a great deal in the bargain.
Thanks again,
Dan
-
PS. I love the use of Select Case you suggested. I'm used to the C# version which is somewhat less... flexible. I'll be sure to use it in the future.
-
Ahh, I see the value of your IsDigits/IsDigit functions then. I would suggest commenting that specific functionality, and how it differs... and I think you can simplify(?) to a single function, and still retain the short-circuiting you used in IsDigits. In fact, you can probably gain marginal improvement by further filtering with IsNumeric (no need to 10 characters and find the 11th is a letter if the whole string doesn't first pass the IsNumeric test). I realize you would probably change the structure to start with a single statement a la
If Len(sText) > 0 OR IsNumeric(sText) = True Then
But I leave my flavor because it's easier (for me) to add any additional branches or troubleshoot any flaws in my conceit. For your purposes, there is the possibility of a "bad" return if something was font-formatted poorly but appeared as a digit (which can happen in scanned documents-- the ASC could be one character, but the font formatting could be in Symbol font or wingdings-- and appear as a digit-- theoretically this could happen, anyway).
[vba]
'returns true ONLY if all characters passed by the string are actual digits
'short-circuits logic test if 0-length string or fails initial IsNumeric test
Public Function IsOnlyDigits(sText As String) As Boolean
Dim i As Integer
Dim bRet As Boolean
'if an empty string, return false
If Len(sText) = 0 Then
bRet = False
'if it isn't numeric, then false
ElseIf IsNumeric(sText) = False Then
bRet = False
'if we've gotten here, make sure no IsNumeric(sText) = True statements
'contain any special characters, such as $123.45, or (123.45)
Else
For i = 1 To Len(sText)
Select Case Mid(sText, i, 1)
Case 0 To 9
'continue on
bRet = True
Case Else
bRet = False
GoTo l_exit
End Select
Next
End If
l_exit:
IsOnlyDigits = bRet
End Function
[/vba]
I understand your points on the single line structure, and don't disagree. Every coder has different styles (even if those differences are subtle ones). My comment was made in case you had simply copied and pasted those lines of code without fully understanding them... but if your main purpose is to make the code more readable for you-- then by all means, stick to that. Readability is very important. But I would humbly suggest that as you get used to actual If...End If blocks of code, rather than single line If statements, it will end up being as easy to read as your single line statements, with the addition of being easier to debug when the inevitable happens: something breaks ;)
You're clearly an experienced coder, based on the way you organize and structure your code. I would simply comment the functionality more accurately, so that others will understand what you're doing.
A couple of other notes...
1. I don't think you need to turn off screen updating, at this point. And in any event, you didn't actually present a choice-- just an OK button.
2. You should note that the sentences collection can break in unexpected ways when dealing with the first sentence of a range (which doesn't necessarily mean the first sentence of a paragraph).
The first link sums up a little bit of the problem with some simple scenarios.
http://www.vbaexpress.com/forum/arch...p/t-34386.html
The gmaxey link is something Greg and I worked on, and is probably far more than you need at the moment (there are some simple solutions in the first link)
http://gregmaxey.mvps.org/word_tip_p...sentences.html
In summary, you may not want to use
Set riInfo.RangeInfo = rngSearch.Sentences(1)
but rather
dim rngTemp As Range
Set rngTemp = rngSearch.Sentences(1)
rngTemp.Start = rngSearch.Start
or something along those lines. Basically, the first sentence may end up as not the first sentence if there are special characters involved near the "sentence ender" -- which can easily be the case if you're dealing with scanned documents.
3. I would break your ExtractRequirements routine apart at least a bit. You've got a "read data" part and a "write data" part-- that could be separated. In general, if you're using Dim statements somewhere other than the top of a procedure, that can be a guide for modularizing further. But this is just shifting deck-chairs somewhat. You've commented very well. It's just a long routine with a lot of different things going on.
4. This could be an alternate way to getting your search terms... although I think ultimately you might want to have a userform for this. MsgBox and InputBox are ultimately somewhat limiting... but at least this allows the user to see duplicates rather than guess and have to do your "no duplicates" methodology? You could also utilize the KeyField parameter in a collection (rather than an array), and trap for the error. This is just more collection tricks, but, assuming myCol was a collection, you will raise an error on the second line of the following two lines
myCol.Add "Hello", "Hello"
myCol.Add "Hello", "Hello"
whereas
myCol.Add "Hello"
myCol.Add "Hello"
would not raise an error.
Adding keyfields to collections allows you to reference by the index field or the keyword... so...
myCol(1) is the only way to return "Hello" text for the second collection, but myCol("Hello") would work for a collection you also used the keyfield. I'm not sure where this would help your current code, but collections are a very powerful part of the VBA object model. I'm constantly finding new ways to use them.
[vba]
Public Function fGetSearchTerms() As Variant
Dim sRet As String
Dim sDefault As String
Dim aryTerms() As String
'start with an extra pipe
sDefault = "shall|must|"
Do
sRet = InputBox("The following terms will be searched." & vbCr & _
"To add another, press the right arrow key and type" & vbCr & vbCr & _
"Search terms are *not* case-sensitive" & vbCr & _
"Press ESC or Cancel to stop adding terms", _
"Add Search Keywords?", _
sDefault)
If sRet = "" Then
Exit Do
ElseIf Right(sRet, 2) = "||" Then
Exit Do
ElseIf Right(sRet, 1) = "|" Then
'allow them to continue, since it might have been a mis-click to hit okay
Else
sDefault = sRet & "|"
End If
Loop Until sRet = ""
'remove any trailing pipe characters
Do Until Right(sDefault, 1) <> "|"
sDefault = Left(sDefault, Len(sDefault) - 1)
Loop
'get the array of search terms, using split
aryTerms = Split(sDefault, "|")
'and return it
fGetSearchTerms = aryTerms
End Function
[/vba] 5. You don't need to set default values at the top of a routine when you've defined the types. So...
dim CharCounter As Integer
CharCounter = 0
is redundant. It's dimensioned as an integer... in the absence of any changes, it will be 0.
6. And one last point-- VBA defaults arrays to being 0-based. However, you can use Option Base 1 at the top of your modules to default to 1 based arrays (so you don't have to Dim sKWArray(1 to 2) As String... so with Option Base 1 on, you only need Dim sKWArray(2) As String. However, the Split function (extremely useful with simple arrays-- displayed above), remains 0-based, even with Option Base 1 on. And don't forget to use Option Explicit. You don't currently have it on in your SharedFunctions module.
Good luck, Dan. Nice working with you!
- Jason aka Frosty
-
Thanks again, and thank you for the parting comments as well. My thoughts:
1. What happened to my vbYesNo parameter! I added it back in, then thought better of it and removed the line. You're right, I don't really need it for performance anyway, and it's just one more cryptic prompt to click through for the user.
2. I just downloaded the Deduced Sentences template you worked on... that is wild. I'm having some trouble getting it to work in 64-bit Word, but I'll fiddle. Besides, I think the problems I'm having are with the ribbon part, which I don't need anyway. Thanks for pointing me in that direction; more accurate sentence-reading is definitely the next step.
4. I've been mulling over whether I should just bite the bullet and re-write this as a real application in C#. But now that the performance is in a good place, I probably won't, and I'll probably leave the UI elements alone. Honestly, allowing addition/subtraction of search terms is only implemented because I hate the idea of hard-coding, but if I removed it I don't think the users would miss it. I'm curious though... why do you have that function declared as a variant rather than a string array?
5. You're seeing my ignorance/inexperience with VBA. In C# EVERYTHING needs to be declared and set, and I don't know what all the default values are (i.e. null, empty, "", 0, etc.) so sometimes I set the value just to be safe. I'll do some research and remove the useless declarations.
6. This is another C# takeaway, where everything is 0 based. I declared LB and UB partly so I remember what the lower bound is :p. I've heard of the Option Base 0/1 but I'd be worried that I'd forget about it. Thanks for noting that about Split, though... that's a little crazy. And I've added Option Explicit now to all my modules. I agree, it's a must-have, especially for someone used to strict typing.
Thanks again for all your help. It was a real privilege working on this with you.
Dan
-
1. Probably went away when you put your message into a constant ;)
2. Yes, the UI isn't something you'd want... but the class itself may be useful. There are samples of how to use the class (forgetting the ribbon stuff) within other modules. But it wasn't necessarily constructed with the kind of use you might put to it, so I'm not sure if it would be a help or a hindrance at this point.
3. What happened to three? ;)
4. In VBA, you can't have arrays as returns of functions. You have to use a Variant. Doesn't make sense to me either. You can pass arrays as parameters (and since you can pass them, you can return them into a parameter). Or you can have a public/module variable that a function populates. But you can't actually have a function return the array. I think I knew why at one point, but I no longer remember.
[vba]
Sub HelloWorld()
Dim i As Integer
Dim sMsg As String
Dim sRet As String
Dim aryMy() As String
aryMy = GetMyArray
For i = 0 To UBound(aryMy)
sRet = aryMy(i)
If sRet = "" Then
Exit For
Else
sMsg = sMsg & " " & sRet
End If
Next
MsgBox sMsg
End Sub
'returned from the function
Public Function GetMyArray() As Variant
Dim aryMy(10) As String
'populate the parameter
GetMyArray2 aryMy
'return it
GetMyArray = aryMy
End Function
'returned from the parameter
Public Function GetMyArray2(aryRet() As String) As Boolean
aryRet(0) = "Hello"
aryRet(1) = "World"
GetMyArray2 = True
End Function
[/vba] 5. It won't be much research... all data types start with a value. And they are basically what you'd expect:
boolean is false
int/long/single/dbl is 0
string is ""
Date is 0, but really #12:00:00 AM#
Variant is Empty
Object is Nothing
Probably the stupidest thing they did in VBA was reverse the convention of what the boolean value is. I believe in other programming languages that 0 is true, and all other values are false, right? In VBA it is the opposite.
-
3. I didn't want to fall into the trip of answering something when I didn't have anything in particular to say just to make a complete numbered list. That way lies long boring conversations. :)
4. Huh. That's annoying.
5. Heh, that's pretty much all the research I needed. Thanks! I swear, dealing with uninstantiated variables and all the myriad forms of null that exist is the most annoying part of programming, particularly when moving between languages. I'm doing a project in Access (my first) right now and I had the most aggravating experience testing text boxes for being either null or blank. I mean, seriously? I understand why, but it's things like that that make me hate VBA. It goes halfway toward trading performance for ease/rapid development, but you're never sure which half! If I could script office in C#, I would.
Your last comment reminded me: using IIF is actually a C# habit as well. I use ternary operators all the time--they make so much more sense to me than 5 lines of If/Then code--and I was looking for the VBA equivalent.
More to the point: in C# (I'm a mere hobbyist in C#, but I certainly feel more comfortable with it than VB, which is new to me within the last few months), Booleans cannot be converted to integers. If I wanted a conversion, I would have to write something like:
bool b = false;
int i = b ? 1 : 0;
\\i now equals 0
But there is no implicit conversion. There is an explicit conversion using the Convert.ToInt32() function, and in that case false=0.
I have a feeling you have a good reason for your distaste of VBA's choice in that regard, but personally I prefer it. For example, I can multiple two arrays (one boolean and one integer) and sum the resulting array to get the sum of, for example, all the items chosen from check boxes. Or all the items that meet some criteria.
The only reason I'd want 0 to be true is if variables weren't type-defined, in which case I would want to be able to explicitly test for true rather than false.
But that's just me.
Dan
-
Hmm, this is one of those times when I get to revisit something I always accepted as truth, but doesn't appear to be true. One of my mentors years ago told me the boolean type in VBA was reversed from previous programming languages. So I had always thought that was true. But it may not be... so disregard that comment.
My main programming experience has been in VBA, so I never had that assumption challenged until now. So thanks!
As for implicit/explicit conversions... VBA does a lot implicitly, although there are a lot of explicit conversion functions in VBA... CBool, CInt, etc...
CBool(0) = False
CBool(-1) = True
CBool(1231245) = True
but...
CInt(True) = -1
Cint(False) = 0
But the conversion functions aren't blackboxed, so
CBool(myObject)
Will return a type mismatch error (which is why I have a bunch of comparable functions which return the default value on any error, i.e.,)
[vba]
Function fCInt(val) As Integer
On Error Goto l_err
fCInt = CInt(val)
l_exit:
Exit Function
l_err:
'blackbox to default value of the type
resume l_exit
End Function
[/vba] This allows
fCInt(ActiveDocument) to return 0... where CInt(ActiveDocument) will return a Type Mismatch error. Of course, that application doesn't make a lot of sense... but can be very useful if you write the fCStr analog, and pass it a Null value in your Access application. Then you get your empty string, without having to code for the Null state on each test.
VBA will coerce a lot of values of different data types, and do implicit conversions for you a lot of times, but fail to do so when just when you start relying on it. Of course, with experience, you start to avoid the pitfalls instinctively, but I think it's pretty rough when you first start learning.
myIntVar = True
is an acceptable line of code and won't cause an error... which doesn't seem like a big deal, but when applied to something like a Range object (which you can set to a string variable, and get the text of the range implicitly converted to a string -- but only if the range isn't nothing), it can cause mysteries to the beginning vba programmer.
I believe in VBA that booleans are technically integers... the help on Data Type Summary indicates it uses the same amount of memory (2 Bytes).
Couple additional things that may help, if you don't yet know about them: Locals Window/Watch Window/Immediate Pane. I find those three items extremely useful in programming. And it will also allow you to check out existing values of items while you program. You may find these a little more useful than the standard way of learning VBA: recording macros and seeing what gets spit out.
In any event, glad to help push the ball forward for you a bit in learning VBA.
-
1 Attachment(s)
Oh, and this is my version of Greg's coding of the sentences class... slightly simpler interface, but (I believe) the core functionality is the same (he and I worked on the class together, after that the majority of it was his baby-- but if you're having issues with the interface, this doesn't have any of that).
Maybe this will help...
-
Thanks Frosty, both for the notes and for the class with the simplified UI. It works without causing the problems I was experiencing with the ribbon UI.
My favorite sentence: "Jason is great, but he's not in a list."
Dan
-
Haha. When I used to teach Word classes, the first thing I had students type was "Jason is great" -- it was always good for a laugh and a bit of an icebreaker... then we would start underlining/bolding "Jason" and "great" as an entry into the concepts of font formatting.
So that has stuck with me when I need short sample text in the same way that "Hello World" is the standard first text anyone every programs to display to an end-user. ;)
I haven't looked at the sentence class in 6 months, and have yet to apply it in a real-world application, so there may be some warts to discover. Please chime in if you discover some.
Cheers!
Jason aka Frosty