PDA

View Full Version : [SOLVED] Extract numbers from texts and do substraction



Dreamer
08-02-2005, 05:18 PM
Hi,

If I'm given the following lines...


18-19 Years,
20-24 Years,
25-29 Years,
30-34 Years,
35-39 Years,
40-44 Years,
45-49 Years,

I want to let VBA to read the numbers only,
18-19, 20-24,25-29....45-49
find the mid-point for each group,
that's 18.5, 22, 27, ..47.

Any suggestions?
Thanks in advance.

xCav8r
08-02-2005, 05:35 PM
There are a number of ways you can approach this. I'd divide it up into two parts. First, get the numbers out. Second, perform the calculation.

When getting the numbers out, if the format in which you'll be given this information will never change, then you can easily use left and mid to get out the years. If you want to make something that will be more adaptable, you might consider looping through the characters in the string checking where numeric characters start and end.

For the calculation, just use the Average function.

Justinlabenne
08-02-2005, 06:42 PM
Not sure how you wanted these number returned and if the format is the same every time:
IE: 18-19 Years,

but this assumes a lot, that your text is in column A, and it can be put into columns B,C,D


Option Explicit

Sub Extract()
Dim x As Long
x = Cells(Rows.Count, 1).End(xlUp).Row
Dim r As Range
For Each r In Range("A1:A" & x)
With r.Offset(, 1)
.Formula = Left$(r, 2)
.Value = .Value
End With
With r.Offset(, 2)
.Formula = Mid$(r, 4, 2)
.Value = .Value
End With
With r.Offset(, 3)
.Formula = WorksheetFunction.Average(r.Offset(, 1), r.Offset(, 2))
.Value = .Value
End With
Next r
End Sub

brettdj
08-02-2005, 07:26 PM
If your data set was long I'd use a variant array to perform the maniupulation and then dump to the next row. For now I've stuck with a simple For loop through the range

This code also assumes that the data starts in A1 and it dumps the average to B. If the XX-XX pattern is not fully met then no average is returned



Option Explicit

Sub gETmID()
Dim Myrange As Range, C As Range
Dim RegEx As Object
Set Myrange = Range([a1], [a65536].End(xlUp))
Set RegEx = CreateObject("vbscript.regexp")
With RegEx
.Pattern = "(\d+)-(\d+).*"
.Global = True
End With
For Each C In Myrange
If RegEx.test(C) = True Then C.Offset(0, 1) = Application.Average(RegEx.Replace(C, "$1"), RegEx.Replace(C, "$2"))
Next
Set RegEx = Nothing
End Sub

Cheers

Dave

Dreamer
08-02-2005, 08:59 PM
Hi,

Thanks to all!

I hv a little problem here..
When I try following lines with dollar value,

500,000-550,000,
450,000-500,000,
400,000-450,000,
350,000-400,000,
300,000-350,000,
275,000-300,000,
250,000-275,000,
225,000-250,000,
200,000-225,000,

The result comes out as:
500275
450250
400225
350200
300175
275150
250137.5
225125
200112.5

How to solve it if the dollar value contains "," and I can't delete it?

Also, when I deal with following lines,
US 500000-550000,
US 450000-500000,
US 400000-450000,
US 350000-400000,
US 300000-350000,
US 275000-300000,
US 250000-275000,

It doesn't work if I change this ..


.Pattern = "*.(\d+)-(\d+).*"

Please advise, thank you so much...:)

xCav8r
08-02-2005, 09:20 PM
Well, ya edited your previous post, so this is based on the last values. Stick this in a standard module, and then you can within a cell do something like this:


= GetAverageFromString(A1)


Option Explicit

Function ExtractNumbersFromString(InputString As String) As Variant
'Returns a single dimension array with all the numbers found in a string
'does not support decimals
Dim avarNumbers() As Variant
Dim strThisCharacter As String
Dim strNextCharacter As String
Dim intNumericStart As Integer
Dim intNumericEnd As Integer
Dim intCharacterCounter As Integer
Dim intStringLength As Integer
Dim blnNumberFound As Boolean
Dim intArraySize As Integer
Dim sngCurrentNumber As Single
intStringLength = Len(InputString)
blnNumberFound = False
intNumericStart = 0
intNumericEnd = 0
intArraySize = 0
For intCharacterCounter = 1 To intStringLength
Select Case intCharacterCounter
Case 1
strThisCharacter = Mid(InputString, intCharacterCounter, 1)
strNextCharacter = IIf(intStringLength > 1, Mid(InputString, intCharacterCounter + 1, 1), "")
Case intStringLength
strThisCharacter = Mid(InputString, intCharacterCounter, 1)
strNextCharacter = ""
Case Else
strThisCharacter = Mid(InputString, intCharacterCounter, 1)
strNextCharacter = Mid(InputString, intCharacterCounter + 1, 1)
End Select
Select Case blnNumberFound
Case True
If IsNumeric(strNextCharacter) = False Then intNumericEnd = intCharacterCounter
Case False
If IsNumeric(strThisCharacter) = True Then
intNumericStart = intCharacterCounter
If IsNumeric(strNextCharacter) = False Then intNumericEnd = intCharacterCounter
End If
End Select
If intNumericStart <> 0 And intNumericEnd <> 0 Then
If intArraySize = 0 Then
ReDim Preserve avarNumbers(intArraySize)
sngCurrentNumber = Mid(InputString, intNumericStart, intNumericEnd - intNumericStart + 1)
avarNumbers(intArraySize) = sngCurrentNumber
intArraySize = intArraySize + 1
Else
ReDim Preserve avarNumbers(intArraySize)
sngCurrentNumber = Mid(InputString, intNumericStart, intNumericEnd - intNumericStart + 1)
avarNumbers(intArraySize) = sngCurrentNumber
End If
blnNumberFound = False
intNumericStart = 0
intNumericEnd = 0
ElseIf intNumericStart = 0 And intNumericEnd = 0 Then
blnNumberFound = False
ElseIf intNumericStart <> 0 And intNumericEnd = 0 Then
blnNumberFound = True
End If
Next intCharacterCounter
ExtractNumbersFromString = avarNumbers
End Function

Function GetAverageFromString(InputString As String) As Single
Dim avarNumbers As Variant
Dim intArraySize As Integer
Dim sngSum As Single
Dim intCounter As Integer
InputString = Replace(InputString, ",", "", 1, -1, vbTextCompare)
avarNumbers = ExtractNumbersFromString(InputString)
sngSum = 0
intArraySize = UBound(avarNumbers)
For intCounter = 0 To intArraySize
sngSum = sngSum + avarNumbers(intCounter)
Next intCounter
GetAverageFromString = sngSum / (intArraySize + 1)
End Function

Sub test()
Debug.Print GetAverageFromString("US 500000-550000,US 450000-500000,US" _
& "400000-450000,US 350000-400000,US 300000-350000,US 275000-300000,US 250000-275000,")
Debug.Print GetAverageFromString("500,000-550,000,450,000-500,000," _
& " 400,000-450,000,350,000-400,000,300,000-350,000,275,000-300,000,250,000-275,000,225,000-250,000,200,000-225,000,")
End Sub


Results are...

387500
362500

xCav8r
08-02-2005, 09:24 PM
Actually, that's not gonna work since I scrub out all the commas. Can you get rid of the commas on your end? It's a problem that they're used both to delimit ### and numbers themselves. Without knowing the exact formatting, I don't know how to tell when a comma is being used for which purpose. If you give me all the formatting possibilities, then hopefully I can code for it. If, however, commas will be used to delineate three digits and to separate numbers (with no spacing or other consistent formatting), then I don't know what to do.

Can you provide exhaustive examples of the types of strings that you need to deal with?

Dreamer
08-03-2005, 12:01 AM
Hi,

I can get rid of the comma at the end of the string but have to keep the dollar signs..
There are 2 formats most often use: With years/With $ amount

For dollar values, two types 1)US 500000-550000 & 2) 500,000-550,000 are commonly used.

Bob Phillips
08-03-2005, 03:21 AM
How about a worksheet formula


=IF(LEFT(A1,2)="US",(MID(A1,FIND(" ",A1),FIND("-",A1)-FIND(" ",A1))+RIGHT(A1,LEN(A1)-FIND("-",A1))),(LEFT(A10,FIND("-",A10)-1)+RIGHT(A10,LEN(A10)-FIND("-",A10))))/2

xCav8r
08-03-2005, 07:14 AM
Hi,

I can get rid of the comma at the end of the string but have to keep the dollar signs..
There are 2 formats most often use: With years/With $ amount

For dollar values, two types 1)US 500000-550000 & 2) 500,000-550,000 are commonly used.

What I wrote will pull every number out of a string. If you've gotten rid of the commas at the end of the numbers, then it will work without problems (so I think).

Here's how it works: If a number is touching another number, I assume it's a digit of larger number. If not, it's a single digit number. So if you plugged in the last line you wrote, it would spit out:

1
500000
550000
2
500000
550000
Then it would average these numbers. Obviously, that wouldn't work because 1 and 2 aren't supposed to be in there. I can filter things like that out, but only when I know that anything numeric that's touching a ) should be disregarded. I just use that as an example. As it stands now, I don't know all the ways in which numbers might be in your strings, so I don't know how to filter out undesired numbers.

Dreamer
08-04-2005, 07:53 AM
Hi,

Thanks a lot! So impressive..
I think it really helps me a lot... Thank you :)

Bob Phillips
08-04-2005, 08:20 AM
Hi,

Thanks a lot! So impressive..
I think it really helps me a lot... Thank you :)

which one?

xCav8r
08-10-2005, 09:12 PM
Good question, xld. Dreamer, for people searching this forum with a similar problem, it would be helpful to know which solution you implemented. I assume it was mine. Muhahahaa!