PDA

View Full Version : [SOLVED] Abbreviate the next succeeding instances of scientific names



swaggerbox
12-22-2019, 06:58 PM
I have a string (in a textbox) that contains several instances of a scientific name (inside SN tags). What I would like is that for each succeeding instance(s) of a scientific name, the name is abbreviated. For example, for the first instance of a scientific name (e.g. Escherichia Coli), it is fully expanded, e.g. <SN>Escherichia Coli</SN> and for the next succeeding instance, it is abbreviated. Genus or the first word of the scientific name is only the first letter followed by the period and then the specie name, e.g. <SN>E. Coli</SN>. It is not only Escherichia Coli but for all scientific names enclosed in <SN> tags. Any help would be appreciated. I have attached a representative sample of the WB.

macropod
12-22-2019, 10:35 PM
Does this have to be done in a textbox in Excel?

swaggerbox
12-23-2019, 12:07 AM
the answer is a yes, Paul but I am open to other options as well but would prefer a textbox.

macropod
12-23-2019, 12:17 AM
Easily done in Word if you're happy to do it there. For example:

Sub Demo()
Application.ScreenUpdating = False
Dim Rng As Range, StrTxt As String
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "(\<SN\>[A-Z])[!.]@( *\</SN\>)"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchWildcards = True
.Execute
End With
Do While .Find.Found
Set Rng = .Duplicate
StrTxt = Split(Split(.Text, ">")(1), "<")(0)
.Text = "<SN>" & StrTxt & " (" & Left(StrTxt, 1) & "." & _
Right(StrTxt, Len(StrTxt) - InStr(StrTxt, " ") + 1) & ")</SN>"
StrTxt = "(\<SN\>" & Left(StrTxt, 1) & ")" & _
Mid(StrTxt, 2, InStr(StrTxt, " ") - 2) & "(" & _
Right(StrTxt, Len(StrTxt) - InStr(StrTxt, " ") + 1) & "\</SN\>)"
With Rng
.End = ActiveDocument.Range.End
With .Find
.Text = StrTxt
.Replacement.Text = "\1.\2"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End With
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
Application.ScreenUpdating = True
End Sub
As it seems you're wedded to Excel, you could automate Word from Excel, copy the textbox content (not the textbox) into a new Word document, then run the above (suitably modified for automation) before copying the result back to the Excel textbox, closing the document w/o saving and quitting Word.

swaggerbox
12-23-2019, 12:24 AM
wow i will probably just do that and call the word macro from excel. thanks a lot Paul.

macropod
12-23-2019, 12:39 AM
As coded, the macro turns

<SN>Escherichia coli</SN>, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>Escherichia coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>Escherichia coli</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>Escherichia coli</SN>. Some versions of <SN>Escherichia coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>Escherichia coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>Escherichia coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.
into

<SN>Escherichia coli (E. coli)</SN>, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>E. coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>E. coli</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>E. coli</SN>. Some versions of <SN>E. coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>E. coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>E. coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.
Note the parenthetic insertion into the first term.

swaggerbox
12-23-2019, 02:04 AM
hi paul i dont really need the parenthetic insertion during the first instance. can you remove it?

macropod
12-23-2019, 02:56 AM
Simply comment-out or delete:



.Text = "<SN>" & StrTxt & " (" & Left(StrTxt, 1) & "." & _
Right(StrTxt, Len(StrTxt) - InStr(StrTxt, " ") + 1) & ")</SN>"

Artik
12-23-2019, 03:21 AM
Without going outside, in warm slippers and sitting in front of the computer. :winking2:
The code should be placed in the Sheet1 module. After entering the text (or pasting) into TextBox1, press ESC or click the mouse outside of TextBox1.

Private Sub TextBox1_LostFocus()
Dim lPos1 As Long
Dim lPos2 As Long
Dim lPos3 As Long
Dim strToChange As String
Dim strNew As String
Dim strLeft As String
Dim strTxt As String

strTxt = Me.TextBox1.Text

lPos1 = InStr(1, strTxt, "<SN>", vbTextCompare)

If lPos1 > 0 Then
lPos2 = InStr(1, strTxt, "</SN>", vbTextCompare)
strToChange = Mid(strTxt, lPos1 + Len("<SN>"), lPos2 - lPos1 - Len("<SN>"))

lPos3 = InStr(1, strToChange, Space(1), vbTextCompare)
strNew = Left(strToChange, 1) & "." & Mid(strToChange, lPos3)

strToChange = "<SN>" & strToChange & "</SN>"
strNew = "<SN>" & strNew & "</SN>"
strLeft = Left(strTxt, lPos2 + Len("</SN>"))

With Me.TextBox2
.Value = strLeft & Replace(strTxt, strToChange, strNew, Len(strLeft) + 1, , vbTextCompare)
.Activate
.SelStart = 0
.SelLength = Len(.Text)
End With

End If

End Sub
Artik

snb
12-23-2019, 05:15 AM
Use:


Sub M_snb()
sn = Split(Replace(Sheet1.TextBox1.Text, "/", ""), "<SN>")
c00 = "<SN>" & Left(sn(1), 1) & ". " & Split(sn(1))(1) & "</SN>"

For j = 3 To UBound(sn) Step 2
sn(j) = c00
Next

MsgBox Join(sn, "")
End Sub

swaggerbox
12-23-2019, 08:03 AM
thanks Paul.

Artik/snb,
These are great alernatives. With these I dont have to go outside excel. thnks alot

snb
12-23-2019, 09:27 AM
This code is sufficient:


Sub M_snb()
c00 = Sheet1.TextBox1.Text
c01 = Split(Split(c00, ">")(1), "<")(0)

MsgBox Left(c00, InStr(c00, "</") - 1) & Replace(c00, c01, Left(c01, 1) & ". " & Split(c01)(1), InStr(c00, "</"))
End Sub

p45cal
12-23-2019, 12:03 PM
It is not only Escherichia Coli but for all scientific names enclosed in <SN> tags.I thought I'd address this because most suggestions haven't.




I have attached a representative sample of the WB.It's not especially representative because there's only one scientific name!



In the attached I've had a go with 2 macros:
Sub blah()
Set Dict = CreateObject("scripting.dictionary")
TheString = Sheet1.TextBox1.Text
Debug.Print TheString
SplitString = Split(TheString, "<SN>")
For Each PartString In SplitString
Posn = InStr(PartString, "</SN>")
If Posn > 0 Then
TagContents = "<SN>" & Left(PartString, Posn - 1) & "</SN>"
If Not Dict.exists(TagContents) Then
AbbreviatedTag = Split(Application.Trim(TagContents))
AbbreviatedTag(0) = Left(AbbreviatedTag(0), 5) & "."
AbbreviatedTag = Join(AbbreviatedTag)
Dict.Add TagContents, AbbreviatedTag
myCount = Len(TheString) - Len(Application.Substitute(TheString, TagContents, AbbreviatedTag)) / (Len(TagContents) - Len(AbbreviatedTag))
For i = 2 To myCount
TheString = Application.Substitute(TheString, TagContents, AbbreviatedTag, 2)
Next i
End If
End If
Next PartString
Sheet1.TextBox2.Text = TheString
End Suband
Sub blah2()
Set Dict = CreateObject("scripting.dictionary")
TheString = Sheet1.TextBox1.Text
Debug.Print TheString
SearchFrom = 1
Do
Posn1 = InStr(SearchFrom, TheString, "<SN>")
Posn2 = InStr(Posn1, TheString, "</SN>")
TagContents = Application.Trim(Mid(TheString, Posn1, Posn2 - Posn1))
If Dict.exists(TagContents) Then
LeftBit = Left(TheString, Posn1 - 1)
RightBit = Mid(TheString, Posn2)
TheString = LeftBit & Dict(TagContents) & RightBit
SearchFrom = Len(LeftBit & Dict(TagContents)) + 6
Else
AbbreviatedTag = Split(TagContents)
AbbreviatedTag(0) = Left(AbbreviatedTag(0), 5) & "."
AbbreviatedTag = Join(AbbreviatedTag)
Dict.Add TagContents, AbbreviatedTag
SearchFrom = Posn2 + 5
End If
Loop Until InStr(SearchFrom, TheString, "<SN>") = 0
Sheet1.TextBox2.Text = TheString
End Sub
If there's no space within a tag one of them puts 2 dots after the first letter, the other a single dot, so one needs some attention.

macropod
12-23-2019, 01:03 PM
It is not only Escherichia Coli but for all scientific names enclosed in <SN> tags.I thought I'd address this because most suggestions haven't.
The code I posted does...

This:

Genus or the first word of the scientific name is only the first letter followed by the period and then the specie name, e.g. <SN>E. Coli</SN>
clearly indicates there is always a Genus and a Species.

snb
12-23-2019, 03:48 PM
The same applies to
http://www.vbaexpress.com/forum/showthread.php?66497-Abbreviate-the-next-succeeding-instances-of-scientific-names&p=397798&viewfull=1#post397798
and
http://www.vbaexpress.com/forum/showthread.php?66497-Abbreviate-the-next-succeeding-instances-of-scientific-names&p=397791&viewfull=1#post397791

p45cal
12-23-2019, 06:11 PM
snb, might you have changed one or both of them since you first posted?
Anyway, starting with:
<SN>Escherichia coli</SN>, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>Escherichia coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>Blinking unicorn</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>Blinking unicorn</SN>. Some versions of <SN>Escherichia coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>Escherichia coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>Escherichia coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.

The OP is expecting:
<SN>Escherichia coli</SN>, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>E. coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>Blinking unicorn</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>B. unicorn</SN>. Some versions of <SN>E. coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>E. coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>E. coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.


your code in msg#10 gives:
Escherichia coli, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>E. coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>E. coli</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>E. coli</SN>. Some versions of <SN>E. coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>E. coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>E. coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.

Problems in blue, tags have disappeared, no B. unicorn at all (abbreviated or not), they've become E. coli.

your code in msg#12 gives:
<SN>Escherichia coli</SN>, is a type of bacteria that normally lives in your intestines. It’s also found in the gut of some animals. Most types of <SN>E. coli</SN> are harmless and even help keep your digestive tract healthy. But some strains can cause diarrhea if you eat contaminated food or drink fouled water. While many of us associate <SN>Blinking unicorn</SN> with food poisoning, you can also get pneumonia and urinary tract infections from different types of the bacteria. In fact, 75% to 95% of urinary tract infections are caused by <SN>Blinking unicorn</SN>. Some versions of <SN>E. coli</SN> make you sick by making a toxin called Shiga. This toxin damages the lining of your intestine. The strains of <SN>E. coli</SN> that make the toxin are sometimes called STEC, which is short for “Shiga toxin-producing <SN>E. coli</SN>.” One especially bad strain, O157:H7, can make you very sick. It causes abdominal cramps, vomiting, and bloody diarrhea. It is the leading cause of acute kidney failure in children.

Blue problem, no abbreviation B. Unicorn.

Paul_Hossler
12-23-2019, 08:02 PM
"Blinking Unicorn" ??? :rofl2::rofl2::rofl2::rofl2::rofl2::rofl2:

snb
12-24-2019, 01:22 AM
To remove all unicorns:


Sub M_snb()
c00 = Sheet1.TextBox1.Text

For Each it In Filter(Split(c00, ">"), "/")
c01 = Left(it, 1) & ". " & Split(it)(1)
If InStr(c00, c01) = 0 Then c00 = Replace(Replace(c00, it, c01), c01, it, , 1)
Next

MsgBox c00
End Sub

p45cal
12-24-2019, 05:08 AM
Finally!
Fingers crossed there aren't different tags with repeating contents.

swaggerbox
12-27-2019, 04:38 AM
Great work guys, notably p45cal. Didn't think this was too complicated.

snb
01-03-2020, 01:16 AM
Was this contribution http://www.vbaexpress.com/forum/showthread.php?66497-Abbreviate-the-next-succeeding-instances-of-scientific-names&p=397826&viewfull=1#post397826 invisible ?