PDA

View Full Version : [SOLVED:] Extract Numeric Substring From longer String



BrI
04-25-2017, 05:09 PM
I have a rather long string, actually a short paragraph, from which I want to extract a numeric string and put it into a new string.

The length of the numeric string will vary but between 7 and 12 digits and it will be the only numeric string of this length within the longer main string.

This will be a recurring scenario that I need to regularly run.

I've been looking through various string manipulation methods but don't see a solution.

Note, although I posted this in Excel thread I want to work with the string in VBA, not interact with a worksheet to resolve the string.

Any assistance?

Paul_Hossler
04-25-2017, 06:37 PM
I'd try something like this




Option Explicit

Sub test()
Dim s As String
s = "asfadsfasdf12345678qwerqwerqwer"
MsgBox ExtractNumbers(s)

s = "12345678qwerqwerqwer"
MsgBox ExtractNumbers(s)
s = "asfadsfasdf12345678"
MsgBox ExtractNumbers(s)
s = "asfadsfasdf123qwerqwerqwer"
MsgBox ExtractNumbers(s)
s = "asfadsfasdf12345678qwer987654321qwerqwer"
MsgBox ExtractNumbers(s)

End Sub


Function ExtractNumbers(s As String) As String ' could be a Long
Dim s1 As String
Dim i As Long

s1 = s

'remove letters at left
Do While Not Left(s1, 1) Like "[0-9]" And Len(s1) > 0
s1 = Right(s1, Len(s1) - 1)
Loop

'remove letters at right
Do While Not Right(s1, 1) Like "[0-9]" And Len(s1) > 0
s1 = Left(s1, Len(s1) - 1)
Loop

If Len(s1) >= 7 And Len(s1) <= 12 And IsNumeric(s1) Then
ExtractNumbers = s1
End If
End Function

BrI
04-25-2017, 07:26 PM
Thank you for that.

I tried an approximation of the string I will be using (below) but got a blank message box. Trying to extract the "50994121"


s = "Form 4 Formule 4 CERTIFICATE OF OWNERSHIP CERTIFICAT ENREGISTRE Land Act,S.N. 1981, c. L-1.1, s.63 Loi sur l'enregistrement foncier, L.N.B. de 1981, chap. L-11, art.33 wer 50994121 Owner Propriétaire"

mikerickson
04-25-2017, 07:33 PM
There are several numerals in the string that in post #3.
What value do you want returned from that string?

BrI
04-25-2017, 07:42 PM
Trying to extract "50994121"
Will always want to extract the number having between 7 and 12 digits

BrI
04-25-2017, 08:04 PM
Another approach might be to find the number between key words as they will always be as below:

"art.33 wer 50994121 Owner Propriétaire"

Or maybe Regex, I have been reading but am just not familiar with it yet.

rlv
04-25-2017, 11:48 PM
There is probably a clever way to do it using regular expressions, but you can also parse the string as a collection of words:



Sub Test()
Dim I As Long, J As Long, WCnt As Long
Dim S As String, WordStr As String, ResultStr As String

S = "Form 4 Formule 4 CERTIFICATE OF OWNERSHIP CERTIFICAT ENREGISTRE Land Act,S.N. 1981, c. L-1.1, s.63 Loi sur l'enregistrement foncier, L.N.B. de 1981, chap. L-11, art.33 wer 50994121 Owner Propriétaire"

ResultStr = ""
WCnt = WordCount(S, " ")
For I = 1 To WCnt
WordStr = ExtractWord(S, " ", I)
J = Len(WordStr)
If J >= 7 And J <= 12 And IsNumeric(WordStr) Then
ResultStr = WordStr
End If
Next I

If ResultStr <> "" Then
MsgBox "Result = " & ResultStr, , "Test Result"
Else
MsgBox "String not found", , "Test Result"
End If
End Sub

Function ExtractWord(ByVal AnyString As String, ByVal WordDelimiters As String, ByVal WordNumber As Long) As String
Dim SA() As String
Dim ResultWord As String

NormalizeDelimiters AnyString, WordDelimiters 'reduce multiple word delimiters to a single delimiter
AnyString = CollapseString(AnyString, WordDelimiters)
SA = Split(AnyString, WordDelimiters)
If WordNumber > 0 And WordNumber <= UBound(SA) + 1 Then
ResultWord = SA(WordNumber - 1)
Else
ResultWord = vbNullString
End If
ExtractWord = ResultWord
End Function


Function WordCount(ByVal AnyString As String, ByVal WordDelimiters As String) As Long
Dim SA() As String

NormalizeDelimiters AnyString, WordDelimiters 'reduce multiple word delimiters to a single delimiter
AnyString = CollapseString(AnyString, WordDelimiters)
SA = Split(AnyString, WordDelimiters)
WordCount = UBound(SA) + 1
End Function


'Replace multiple word delmiters with one word delimiter
Sub NormalizeDelimiters(ByRef AnyString As String, ByRef WordDelimiters As String)
Dim Delimiter As String
Dim SLen As Long
Dim I As Long

SLen = Len(WordDelimiters)
If SLen > 1 Then
If InStr(WordDelimiters, Chr(30)) > 0 Then 'preference for space char
Delimiter = Chr(30)
Else
Delimiter = Left(WordDelimiters, 1)
End If
For I = 1 To SLen
AnyString = VBA.Replace(AnyString, Mid(WordDelimiters, I, 1), Delimiter)
Next I
WordDelimiters = Delimiter
End If
End Sub


'Remove multiple delimiters in any string
Function CollapseString(ByVal AnyString As String, ByVal Delimiter As String) As String
Dim MultiDelims As String
Dim DCnt As Long 'Delimiter count
Dim SPos As Long

For DCnt = 20 To 2 Step -2
MultiDelims = String(DCnt, Delimiter)
SPos = InStr(AnyString, MultiDelims)
If SPos > 0 Then
AnyString = Replace(AnyString, MultiDelims, Delimiter)
End If
Next
CollapseString = AnyString
End Function

p45cal
04-26-2017, 02:30 AM
Another approach might be to find the number between key words as they will always be as below:

"art.33 wer 50994121 Owner Propriétaire"
In that case try:
Sub test()
Dim s As String
s = "Form 4 Formule 4 CERTIFICATE OF OWNERSHIP CERTIFICAT ENREGISTRE Land Act,S.N. 1981, c. L-1.1, s.63 Loi sur l'enregistrement foncier, L.N.B. de 1981, chap. L-11, art.33 wer 50994121 Owner Propriétaire"
MsgBox GetMyNumber(s)
End Sub

Function GetMyNumber(s)
GetMyNumber = Application.Trim(Split(Split(s, " wer ")(1), " Owner ")(0))
End Function





Or maybe Regex, I have been reading but am just not familiar with it yet.…another minefield!

BrI
04-26-2017, 04:28 AM
Really appreciate the responses - both are working.

p45cal - Your function is very concise - which is great of course, and I get the general idea of how it works. But could you briefly explain what is happening, especially with respect to the indexes (1) and (0). Thanks

rlv
04-26-2017, 06:17 AM
At the risk of speaking for p45cal, the Split function returns an array of strings. (0) and (1) refer to specific strings in the zero-based array. Think of it like this:



String1 = Split(s, " wer ")(1)
String2 = Split(String1, " Owner ")(0)
GetMyNumber = Application.Trim(String2)

Paul_Hossler
04-26-2017, 06:34 AM
Thank you for that.

I tried an approximation of the string I will be using (below) but got a blank message box. Trying to extract the "50994121"


s = "Form 4 Formule 4 CERTIFICATE OF OWNERSHIP CERTIFICAT ENREGISTRE Land Act,S.N. 1981, c. L-1.1, s.63 Loi sur l'enregistrement foncier, L.N.B. de 1981, chap. L-11, art.33 wer 50994121 Owner Propriétaire"


You didn't say that the number string would be 'stand alone'

The example string helps

If that's always true, then something like this is simpler




Option Explicit
Sub test()
Dim s As String

s = "Form 4 Formule 4 CERTIFICATE OF OWNERSHIP CERTIFICAT ENREGISTRE Land Act,S.N. 1981, c. L-1.1, s.63 Loi sur l'enregistrement foncier, L.N.B. de 1981, chap. L-11, art.33 wer 50994121 Owner Propriétaire"

MsgBox ExtractNumbers(s)

End Sub


Function ExtractNumbers(s As String) As String ' could be a Long
Dim v As Variant

Dim i As Long

v = Split(s, " ")

For i = LBound(v) To UBound(v)
If Len(v(i)) >= 7 And Len(v(i)) <= 12 And IsNumeric(v(i)) Then
ExtractNumbers = v(i)
Exit Function
End If
Next I

ExtractNumbers = vbNullString
End Function

BrI
04-26-2017, 06:54 AM
Thanks to everyone, solved a number of ways - good to see the options. And explained as well -- great!!

mdmackillop
04-26-2017, 02:33 PM
Never answered with a regexp before. This should work if the string is a "word"

Sub test()
Dim strPattern As String
strPattern = "\b[0-9]{7,12}\b"
MsgBox RegxFunc([A1], strPattern)
End Sub


Function RegxFunc(strInput As String, regexPattern As String) As String
Dim regEx As New RegExp
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = regexPattern
End With


If regEx.test(strInput) Then
Set matches = regEx.Execute(strInput)
RegxFunc = matches(0).Value
Else
RegxFunc = "not matched"
End If
End Function