PDA

View Full Version : Solved: Extracting 5 or 6 digit numbers from a cell



davidboutche
10-23-2009, 03:54 AM
The problem:
I have one column of data containing words and variable lengths of sets of numbers.

What I want:
I want to extract all the five or six digit numbers from that cell and place them individually into the proceeding columns

Then I want to count the total number of times each of those numbers appear.
Ultimately creating a top ten list of numbers featuring in the text.

There are other numbers in the text, but they relate to times and dates so all I want is the 5 and 6 digit numbers.

I think I should be using the instring() and len() functions to do this but I'm really not sure how to use them that well.

I've attached a sample book if anyone can help please.

David

Bob Phillips
10-23-2009, 04:19 AM
That looks a bit tricky, what are the rules? The second line has two 6 digit numbers.

mdmackillop
10-23-2009, 05:36 AM
Before we start analysing, test this to extract your numbers.

Option Explicit
Sub LongNos()
Dim i As Long, txt, y As String, cel As Range, x As Long


For Each cel In Selection
i = 1
txt = Split(cel)
For x = 0 To UBound(txt)
y = txt(x)
If IsNumeric(y) And (Len(y) = 5 Or Len(y) = 6) Then
cel.Offset(, i) = y
i = i + 1
End If
Next
Next
End Sub

davidboutche
10-23-2009, 06:02 AM
yes that worked super. Although I had to highlight all cells i think to make it work on all cells i think. But I believe this can be got around using the an offset?

Next task is to count the repetitions?

mbarron
10-23-2009, 06:14 AM
Try this:

Sub Get5or6()
Dim oCol As Long, i As Integer, rData As Range, aData() As String
Dim j As Long, k As Long
oCol = 1

For Each rData In Selection
aData = Split(rData, " ")
For i = 0 To UBound(aData)
If IsNumeric(aData(i)) And (Len(aData(i)) = 5 Or Len(aData(i)) = 6) Then
Cells(oCol, 2) = aData(i)
oCol = oCol + 1
End If
Next
Next


For j = 1 To oCol - 1
Cells(j, 3) = Application.WorksheetFunction.CountIf(Range(Cells(1, 2), Cells(oCol - 1, 2)), Cells(j, 2))
Next

Range(Cells(1, 2), Cells(oCol - 1, 3)).Sort Key1:=Range("C1"), Order1:=xlDescending, Key2:=Range("B1"), _
Order2:=xlAscending, Header:=xlGuess, OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom, DataOption1:=xlSortNormal, DataOption2:=xlSortNormal

For k = oCol - 1 To 2 Step -1
If Cells(k - 1, 2) = Cells(k, 2) Then
Range(Cells(k, 2), Cells(k, 3)).Delete shift:=xlUp
End If
Next


End Sub

p45cal
10-23-2009, 08:17 AM
Seems we all had a go at this one. See attached and here's the code:
Sub blah()
'make sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If (Len(word) = 5 Or Len(word) = 6) And IsNumeric(word) Then
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(After:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
'the rest is optional:
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, _
Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox msg
End SubIt adds a new sheet with the results, there's a button to click in the attached.
It does rely on the data being delimited by spaces.

I take it that you don't really need the bit in bold here:"all the five or six digit numbers from that cell and place them individually into the proceeding columns"

davidboutche
10-23-2009, 09:02 AM
barron when i try your sub with my sheet, i get the error

Named Argument Not Found
When I debug it, it is highlighting
DataOption1:=]

Any ideas there? I'm using excel 2000 if that makes any difference.

The other sub written by p45cal (http://www.vbaexpress.com/forum/member.php?u=3494) vbmenu_register("postmenu_197606", true); - I tried it on the attached workbook and although it does seem to work it returns the error

application - defined or object - defined error

When debugged it highlights
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal

Felt very close but didn't quite work. I did try the sub on my own sheet with the real data but it returned a different error. Just for info, my own sheet contains 63000 rows.


I take it that you don't really need the bit in bold here:"all the five or six digit numbers from that cell and place them individually into the proceeding columns"

Quite right, that was only to break down the process of what i was trying to achieve.

p45cal
10-23-2009, 09:08 AM
I think if you were to delete everything (including the comma) after
xlTopToBottom
in that sorting multiline, it might work in xl2000.

It was changed in xl2002, and sorting's changed again in xl2007.

davidboutche
10-23-2009, 01:41 PM
worked that time without the xltop.....
works with your data fine
works with most of my data fine
if I cut a load of cells from my sheet into yours it works well, if i put all the data in falls over, i think from memory the error was type mismatch. Is there some data in my sheet that isn't in the right format maybe? it could be difficult finding it through 63000 rows.

I'm not with that data till next week but I'm very curious in the mean time as to what the problem might be.

If anyone could explain to me what goes on in the above code that would be really helpful too. I don't just want to copy someone else's code without knowing how to use it myself next time.

p45cal
10-23-2009, 03:56 PM
re:"worked that time without the xltop....."
actually, xltoptobottom should remain leaving:
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottomyes?

I've just remembered something.. in xl2000/2002, there was a real gotcha bug; if there's any conditional formatting on the sheet that's being processed, that can interfere with the split function, so remove all conditional formatting. You can find cells with conditional formatting by pressing F5 on the keyboard, clicking the Special.. button, choosing Conditional formats and All, click OK. Then if there are cells selected, go to the dropdown menus, choose Format|Conditional formatting..., click the Delete.. button, check all the 3 conditions, click OK.

Try again.

davidboutche
10-23-2009, 04:20 PM
Sorry, yes, that was what I did.... i was writing in a hurry before leaving work.

davidboutche
10-28-2009, 07:53 AM
Using the following code it seems to be doing it's job really well although it does seem to pick out the occassional 4 digit number, although that's not a big problem.

Sub blah()
'makes sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If (Len(word) = 5 Or Len(word) = 6) And IsNumeric(word) Then
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(After:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), _
"once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox msg
End Sub
What it's getting stuck on is the size of the data processed. If I limit it to chunks of 3000 rows it seems fine, but when I give it anything over about 4 or 5000 cells it returns a 'Run Time error 13': Type Mismatch

The debugger highlights:
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
If i hover over mycount it shows 31499 every time the procedure is called.
I am working with 34035 cells of data

Is this possibly memory allocation limit ?

mdmackillop
10-28-2009, 09:39 AM
UsedRange could be excessive. Check for the last used row in column 1 and use that to define the range, rather than the Intersect method

davidboutche
10-28-2009, 09:47 AM
I think I understand what you're saying. The last cell is in column A row 49221..

What part of the code should I try changing to include this?

mdmackillop
10-28-2009, 09:49 AM
You might try this to eliminate 4 character items; also Dim Word as string, to avoid problems with leading 0s. These might be ignored in the code as written.


If (Len(trim(word)) = 5 Or Len(trim(word)) = 6) And IsNumeric(trim(word)) Then

mdmackillop
10-28-2009, 09:49 AM
I think I understand what you're saying. The last cell is in column A row 49221..

What part of the code should I try changing to include this?


For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))

davidboutche
10-28-2009, 10:05 AM
It's a bit of a stab in the dark, but are you stalking about something like this?

For Each cll In ActiveSheet.range(49221, Columns(1))

Sorry, i'm really quite new at this level of vba

mdmackillop
10-28-2009, 10:27 AM
This is typically how it is done


For Each cll In Range(Cells(1, 1), Cells(Rows.Count, 1).End(xlUp))

davidboutche
10-28-2009, 10:37 AM
Im afraid that gave the same error. Runtime error 13 Type mismatch highlighting
NewWs.Range("A1").Resize(mycount) = Application.Transpose(Tally.keys)

p45cal
10-28-2009, 12:58 PM
You might try this to eliminate 4 character items; also Dim Word as string, to avoid problems with leading 0s. These might be ignored in the code as written.


If (Len(trim(word)) = 5 Or Len(trim(word)) = 6) And IsNumeric(trim(word)) Then



I hope there aren't any instances of word having trailing/leading spaces.. the split function uses a space to split the data.If you have multiple spaces in the string being split, you get several "" in the resulting array - and these will fail the len test.

p45cal
10-28-2009, 02:02 PM
[snip]
What it's getting stuck on is the size of the data processed. If I limit it to chunks of 3000 rows it seems fine, but when I give it anything over about 4 or 5000 cells it returns a 'Run Time error 13': Type Mismatch

[snip]
Is this possibly memory allocation limit ?
I doubt it. Here's the code again with a bit of debugging code added to see if there are unexpected characters being counted as numeric - that bit of code can be deleted after it's done its job. It will considerably slow it down. Report back here!
Sub blah()
'make sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
'cll.Select
x = Split(cll.Value, " ")
For Each word In x
If (Len(word) = 5 Or Len(word) = 6) And IsNumeric(word) Then
'temp debug lines follow:
For i = 1 To Len(word)
If InStr("1234567890", Mid(word, i, 1)) = 0 Then
Application.Goto cll
MsgBox "xx" & word & "xx character " & i & " unexpected ('" & Mid(word, i, 1) & "') _
in the contents of the cell:" & vbLf & cll.Value & vbLf * "in row " & cll.Row
Stop 'press F5 on the keyboard to continue after noting why it stopped..
End If
Next i
'end of debug lines.
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(After:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, _
Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox msg
End Sub

mdmackillop
10-28-2009, 02:14 PM
I hope there aren't any instances of word having trailing/leading spaces.. the split function uses a space to split the data.If you have multiple spaces in the string being split, you get several "" in the resulting array - and these will fail the len test.
I agree, but I don't see what else might cause 4 figure results.

GTO
10-28-2009, 11:18 PM
Howdy to all,

Say, while I am certainly not knowledgeable at all about regular expressions, I was thinking that it might be a way not to worry about spaces etc. If I have this right :nervous: , it should only pick up 5 to six digits, using word boundary at each end.

Option Explicit

Sub Numbers_RetAndCnt()
Dim REX As Object '<--- RegExp
Dim DIC As Object '<--- Dictionary
Dim rexMatch As Object '<--- RegExp~Match
Dim rexMatchCollection As Object '<--- RegExp~MatchCollection
Dim aryValues As Variant
Dim aryKeys As Variant
Dim aryItems As Variant
Dim i As Long

Set REX = CreateObject("VBScript.RegExp")
Set DIC = CreateObject("Scripting.Dictionary")
DIC.CompareMode = TextCompare
REX.Global = True
'// Hopefully: a word boundary, followed by 5-6 digits, followed by a word boundary //
REX.Pattern = "\b\d{5,6}\b"

With ThisWorkbook.Worksheets("Sheet1")
aryValues = .Range("A1:A" & .Cells(Rows.Count, 1).End(xlUp).Row).Value
End With

For i = LBound(aryValues, 1) To UBound(aryValues, 1)
If REX.Test(aryValues(i, 1)) Then
Set rexMatchCollection = REX.Execute(aryValues(i, 1))
For Each rexMatch In rexMatchCollection
With DIC
If Not .Exists(rexMatch.Value) Then
.Add rexMatch.Value, 1
Else
.Item(rexMatch.Value) = .Item(rexMatch.Value) + 1
End If
End With
Next
End If
Next

ReDim aryValues(1 To DIC.Count, 1 To 2)
aryKeys = DIC.Keys
aryItems = DIC.Items

For i = 1 To DIC.Count
aryValues(i, 1) = aryKeys(i - 1)
aryValues(i, 2) = aryItems(i - 1)
Next

ThisWorkbook.Worksheets("Sheet1").Range("D1").Resize(DIC.Count, 2).Value = aryValues
End Sub


Hope that helps,

Mark

davidboutche
10-29-2009, 07:01 AM
I've made some progress to this. I've changed to using a machine that runs office 2003. The original code supplied by P45CAL is now working mostly. It is occassionaly picking up 4 digit numbers though.

The code looks like this
Sub blah()
'Dim word As String
'makes sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If (Len(Trim(word)) = 5 Or Len(Trim(word)) = 6) And IsNumeric(Trim(word)) Then
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(After:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, _
Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox msg
End Sub
If I include the 'dim word as string' as suggested by MD, I get the error 'For Each control must be variant or object' stopping on the following line
For Each word In x
The four digit numbers that is it is picking up, do seem to lead with a zero.

The five and six digit numbers i'm trying to work with, normally have a space before and after them. Very occassionally they may be preceded by some letters but if these are not included it's not a problem. It won't affect the greater process.

mdmackillop
10-29-2009, 11:04 AM
Go back to my Post #3 and try that method.

GTO
10-29-2009, 04:35 PM
...The four digit numbers that is it is picking up, do seem to lead with a zero.

The five and six digit numbers i'm trying to work with, normally have a space before and after them. Very occassionally they may be preceded by some letters but if these are not included it's not a problem. It won't affect the greater process.

Greetings David,

Say, could you post a sample workbook with a larger amount of the offending data? Especially to include some of the lines where the four-digit numbers are accidently caught up and some of the lines where five or six-digit numbers are missed because of an alpha preface (if I read your post correctly).

Thanks,

Mark

p45cal
10-29-2009, 06:37 PM
The four digit numbers that is it is picking up, do seem to lead with a zero.

Leading/trailing zeroes do indeed disappear, so 00050 and 000050 would both appear as 50, but this would appear twice with a count for each.

Adding this line:
NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
directly before this existing line:

NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)shows all leading zeroes properly - so hopefully no more numbers with less than 5 digits.

davidboutche
11-02-2009, 03:00 AM
Thanks all for the help. I've added
NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
and now I believe it works only showing five and 6 digit numbers. The error seemed to be my interpretation. I hadn't realised the 5 or 6 digit numbers start with 0 or 00 were shortened to their numberical value.

I've still not managed to get the code to work on machines running office 2000 when dealing with the larger data content but I do have access to 2003 on which the code seems to work perfectly. I should be in a position to test this wednesday when I will mark the thread as solved.

Thanks again

davidboutche
11-05-2009, 05:10 AM
The problem has been solved with the following code with special thanks to P45CAL.

Sub blah()
'makes sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If (Len(Trim(word)) = 5 Or Len(Trim(word)) = 6) And IsNumeric(Trim(word)) Then
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(After:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox msg
End Sub

This small piece of code has saved literally a life time of analitical work.

It works perfectly on officer 2003

Thanks to all involved

davidboutche
11-12-2009, 09:47 AM
Whilst I appreciate this thread is officially soved, the project has progressed slightly and any help would be appreciated.

Rather than looking at a pre determined length of string to add to the tally, I want to look at number plates of cars. I want the expression If Len(Trim(word)) = 7 Then to be replaced with an exppression that looks at any word uses it but only if it contains and number and an alpha character at any place in.

I have a feeling I should be using something like if instring(word)containsalpha and instring(word)contains numeric then

am I going down the right route with this?

mdmackillop
11-12-2009, 10:13 AM
Try

If Selection Like "[A-Z]*[0-9]" Or Selection Like "[0-9]*[A-Z]" Then
MsgBox "Car Reg"
End If

davidboutche
11-12-2009, 10:30 AM
That's great, I changed it slightly toIf ((word Like "[A-Z]*[0-9]" Or word Like "[0-9]*[A-Z]") And (Len(Trim(word)) = 7)) Then

This seems to pull out all 7 digit strings with numbers and characters which gets most number plates in the UK.

It seems to work, is the way I've writted correct?