Sleeper: Find Acronym in Hebrew text



Dancho
01-30-2012, 11:20 PM
I'm using a macro to find acronyms in English text in Word. The macro was written by Lene Fredborg, DocTools - thedoctools.com.
Now I would like to modify it so that Hebrew acronyms will be found as well.
First, a Hebrew acronym looks something like ab"c, whereas an English acronym is ABC. Second, I do not quite understand the following line in the code:



'Use wildcard search to find strings consisting of 3 or more uppercase letters
'Set the search conditions
'NOTE: If you want to find acronyms with e.g. 2 or more letters,
'change 3 to 2 in the line below
.Text = "<[A-Z]{3" & strListSep & "}>"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = True
.MatchWildcards = True
'Perform the search
Do While .Execute
'Continue while found
strAcronym = oRange

What does "<[A-Z]{3" & strListSep & "}>" mean?
How can I change it for a Hebrew acronym: text of any length with (") inside it?

Talis
02-01-2012, 10:20 AM
"<[A-Z]{3" & strListSep & "}>"

< means start of word and > means end of word.
[A-Z] means any uppercase letter in the range A-Z.
{3" & strListSep & "}, which becomes {3,} or {3;}, means at least three of the items in the preceding square brackets. The string variable strListSep is used to allow for whether a comma or a semicolon is the list separator, as explained by Lene Fredborg:

'Find the list separator from international settings
'May be a comma or semicolon depending on the country
strListSep = Application.International(wdListSeparator)
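
To see what the search string actually looks like on your machine, here is a minimal sketch (the Sub name and the strPattern variable are my own, not part of Lene Fredborg's macro) that builds the pattern the same way and prints it to the Immediate window:

```vba
Sub ShowAcronymPattern()
    ' Sketch only: prints the wildcard pattern that the macro builds.
    ' ShowAcronymPattern and strPattern are illustrative names.
    Dim strListSep As String
    Dim strPattern As String

    ' List separator from Word's international settings ("," or ";")
    strListSep = Application.International(wdListSeparator)

    ' With a comma as separator this yields <[A-Z]{3,}>
    ' With a semicolon as separator this yields <[A-Z]{3;}>
    strPattern = "<[A-Z]{3" & strListSep & "}>"

    Debug.Print strPattern
End Sub
```

Run it from the VBA editor and look at the Immediate window (Ctrl+G) to see which of the two forms you get.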

For anyone interested in the full macro, it can be found here:

http://www.thedoctools.com/downloads/basACRONYMS_Extract.shtml

For Hebrew acronyms you could try <[a-z]@"[a-z]> which will find one or more lowercase letters, a double quote, then a single lowercase letter; however, having looked up Hebrew acronyms on Wikipedia, I read the following:
Hebrew typography uses a special punctuation mark called Gershayim (״) to denote acronyms, placing the sign between the second-last and last letters of the non-inflected form of the acronym (e.g. "Report", singular: "דו״ח"; plural: "דו״חות")[1]; initialisms are denoted using the punctuation mark Geresh (׳) by placing the sign after the last letter of the initialism (e.g. "Ms.": "גב׳").[2] However, in practice, single and double quotes are often used instead of the special punctuation marks, with the single quote used both in acronyms and initialisms.
This is way beyond my intellectual capacity and I suspect the search string I've offered will fail.
To test the search string, open a Word document containing a few Hebrew acronyms and put the string into the Find box. Check "Use wildcards" and see if it locates what you want.
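
If you would rather try it from VBA, here is a rough sketch under several assumptions: that Word's wildcard ranges accept the Hebrew letters alef (U+05D0) to tav (U+05EA), that the mark is either a straight double quote or Gershayim (U+05F4), and that exactly one letter follows the mark. I have left out the < and > word anchors because I am not sure how they behave around a quotation mark. The Sub name and the strHeb, strMark and strPattern variables are my own. I have not tested this on real Hebrew text.

```vba
Sub FindHebrewAcronyms()
    ' Sketch only: the pattern below is an assumption, not a tested solution.
    Dim oRange As Range
    Dim strHeb As String
    Dim strMark As String
    Dim strPattern As String

    ' Hebrew letters alef (U+05D0) to tav (U+05EA) as a wildcard range
    strHeb = "[" & ChrW(&H5D0) & "-" & ChrW(&H5EA) & "]"

    ' Either a straight double quote, Chr(34), or Gershayim (U+05F4)
    strMark = "[" & Chr(34) & ChrW(&H5F4) & "]"

    ' One or more letters, the mark, then a single letter.
    ' Inflected forms with more letters after the mark are only matched
    ' up to the first letter that follows the mark.
    strPattern = strHeb & "@" & strMark & strHeb

    Set oRange = ActiveDocument.Range
    With oRange.Find
        .ClearFormatting
        .Text = strPattern
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
        ' List each match in the Immediate window
        Do While .Execute
            Debug.Print oRange.Text
        Loop
    End With
End Sub
```

Note that inside a VBA string literal a double quote has to be written as "" or built with Chr(34), as above; a bare " ends the string and the line will not compile.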
Good luck!

Dancho
02-02-2012, 07:04 AM
Thanks for the answer.
I'm familiar with regular expressions in general, but not with VBA syntax.
I still do not understand what {3,} or {3;} means.
Anyway, your suggestion <[a-z]@"[a-z]> is not syntactically correct. How can I make it correct with the same semantics: any string with (") inside it?