PDA

View Full Version : Maximum limits



davidboutche
01-22-2010, 02:38 AM
I'm using the following code to extract words from cells of data and put them into a tally with a count for each one and then output them to a new worksheet. On smaller number of cells it works fine but when I use in excess of 40,000 cells it has problems.

I'm guessing this is possibly because it exceeds the maximum count for a tally or the maximum cell for excel 2003.

Sub freewordcruncher()
'makes sure the right sheet is the active sheet
Set Tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If Tally.exists(word) Then
Tally(word) = Tally(word) + 1
Else
Tally.Add word, 1
End If
'End If
Next word
Next cll
myCount = Tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(after:=Worksheets(Worksheets.Count))
'NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
NewWs.Range("A1").Resize(myCount) = Application.Transpose(Tally.Keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(Tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox "Let's look at links, here's your top ten!" & vbLf & msg
End Sub


How can I put a condition in so it will only write to the new worksheet tally items that have a count of say 5 or higher?

Bob Phillips
01-22-2010, 04:13 AM
Can you post a workbook, it is difficult to test the scaenario without one.

davidboutche
01-22-2010, 04:42 AM
Here is the work book. Sadly I can't include the data. Its about 40 to 50,000 cells each containing a sentence. I don't know if you can replicate that with dummy data?

p45cal
01-30-2010, 08:44 AM
With that number of cells then you're likely to exceed the number of rows available in xl2003 to display the results.
Between Next cll and
myCount = tally.Count add this code:
For Each Key In tally.keys
If tally(Key) < 100 Then tally.Remove (Key)
Next Key where the 100 is the count below which you want to eliminate results.

mdmackillop
01-30-2010, 09:28 AM
Here's a sample file with some data.

Use the Error function of the Dictionary object to limit Tally to different values. Use the Item value to keep count as you go.
The latter part of your code can be simplified accordingly.



Sub freewordcruncher()
digits = 16
'makes sure the right sheet is the active sheet
On Error Resume Next
Set tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
tally.Add word, 1
If Err <> 0 Then
tally.Item(word) = tally.Item(word) + 1
End If
'End If
Next word
Next cll
myCount = tally.Count
Set newws = ActiveWorkbook.Sheets.Add(after:=Worksheets(Worksheets.Count))
newws.Activate
'NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
newws.Range("A1").Resize(myCount) = Application.Transpose(tally.Keys)
newws.Range("B1").Resize(myCount) = Application.Transpose(tally.Items)
newws.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
newws.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = newws.Cells(i, 2)
msg = msg & vbLf & newws.Cells(i, 1) & " occurs " & IIf(newws.Cells(i, 2) < 3, Choose(newws.Cells(i, 2), "once", "twice"), newws.Cells(i, 2) & " times")
Next i
MsgBox "Let's look at links, here's your top ten!" & vbLf & msg
End Sub

davidboutche
02-01-2010, 09:20 AM
Both worked great thanks. Now as you can imagine there are a lot of words I'm simply not interested in.. such as 'a' 'and' 'of' 'if' etc. I've got around this by adapting the code to only allow words over 3 digits in length.

Sub freewordcruncher()
digits = 3
'makes sure the right sheet is the active sheet
Set tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If Len(Trim(word)) > digits Then
If tally.exists(word) Then
tally(word) = tally(word) + 1
Else
tally.Add word, 1
End If
End If
Next word
Next cll
For Each Key In tally.keys
If tally(Key) < 100 Then tally.Remove (Key)
Next Key
myCount = tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(after:=Worksheets(Worksheets.Count))
'NewWs.Range("A1").Resize(myCount).NumberFormat = "@"
NewWs.Range("A1").Resize(myCount) = Application.Transpose(tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox "Let's look at links, here's your top ten!" & vbLf & msg
End Sub


What I want to do now is tell it to exlcude from the tally a list of predefined words.

Could I do something like:
If Len(Trim(word)) > digits and <> "note" Then


I'm not sure if i've expressed that properly or not but I think you'll get the idea. But even if I have got it right i imagine it would be very poor coding to keep writing in OR OR OR. Any ideas please?

mdmackillop
02-01-2010, 10:36 AM
Better to maintain a dynamic range of words which can be added to as required.

Dim Ignore as Range
Set Ignore = Range("Ignorelist").Find(word, lookat:=xlWhole)
If Ignore Is Nothing Then
tally.Add word, 1
If Err <> 0 Then
tally.Item(word) = tally.Item(word) + 1
End If
End If

davidboutche
02-02-2010, 07:54 AM
The next thing I'm trying to do is look at post codes. I'm having a little trouble using the conditions though. I've set it to only use words of 6 digits. I've used a function written by pascal to only use words that contain letters and numbers and then I tried to adapt that to another function to look at the first three characters in the string on the pretence that all uk postcodes start letter letter number.

I got it to work identifying the fist character but I'm not sure where I'm going wrong with using the mid function.

Are there any obvious errors below?

Sub postcodecruncherfree()
'makes sure the right sheet is the active sheet
Set tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If Len(word) = 6 And Lookslikepostcode(word) And ContainsLettersAndNumbers(word) Then
If tally.exists(word) Then
tally(word) = tally(word) + 1
Else
tally.Add word, 1
End If
End If
Next word
Next cll
myCount = tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(after:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox "Let's look at links, here's your top ten!" & vbLf & msg
End Sub
Function ContainsLettersAndNumbers(TheString) As Boolean
For i = 1 To Len(TheString)
Z = Asc(Mid(TheString, i, 1))
If Z > 47 And Z < 58 Then ContainsNumber = True
If (Z > 64 And Z < 91) Or (Z > 96 And Z < 123) Then ContainsLetter = True
If ContainsLetter And ContainsNumber Then
ContainsLettersAndNumbers = True
Exit For
End If
Next i
End Function
Function Lookslikepostcode(TheString) As Boolean
startletter = Asc(Left(TheString, 1))
If (startletter > 64 And startletter < 91) Or (startletter > 96 And startletter < 123) Then startpost = True
If Len(TheString) > 2 Then
secondletter = Asc(Mid(TheString, 2, 1))
If (secondletter > 64 And secondletter < 91) Or (secondletter > 96 And secondletter < 123) Then secondpost = True
End If
If startpost = True And secondletter = True Then

Lookslikepostcode = True


End If

End Function

I'm getting a type mismatch error

p45cal
02-02-2010, 08:06 AM
Quick response:
try changing:
If startpost = True And secondletter = True Then to:
If startpost = True And secondpost = True Then or even shorter:If startpost And secondpost Then but two points, (1) it doesn't check the 3rd character for a number (I suppose you've not got that far yet) and (2) I've been looking at regular expressions lately, and they're powerful. If you're going to do a lot of this kind of stuff it may be worth exploring them.

davidboutche
02-02-2010, 08:38 AM
pascal, i thought that what you were suggesting would have solved the obvious mistake, but it hasn't which confuses me.

The code is now
Sub postcodecruncherfree()
'makes sure the right sheet is the active sheet
Set tally = CreateObject("Scripting.Dictionary")
For Each cll In Intersect(ActiveSheet.UsedRange, Columns(1))
x = Split(cll.Value, " ")
For Each word In x
If Len(word) = 6 And Lookslikepostcode(word) And ContainsLettersAndNumbers(word) Then
If tally.exists(word) Then
tally(word) = tally(word) + 1
Else
tally.Add word, 1
End If
End If
Next word
Next cll
myCount = tally.Count
Set NewWs = ActiveWorkbook.Sheets.Add(after:=Worksheets(Worksheets.Count))
NewWs.Range("A1").Resize(myCount) = Application.Transpose(tally.keys)
NewWs.Range("B1").Resize(myCount) = Application.Transpose(tally.Items)
NewWs.UsedRange.Sort Key1:=Range("B1"), Order1:=xlDescending, Header:=xlNo, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
NewWs.Rows(1).Insert
Range("A1:B1") = Array("Number", "Count")
msg = "Results:" & vbLf
For i = 2 To Application.Min(myCount, 10) + 1 ' the 10 here is for the top 10
zz = NewWs.Cells(i, 2)
msg = msg & vbLf & NewWs.Cells(i, 1) & " occurs " & IIf(NewWs.Cells(i, 2) < 3, Choose(NewWs.Cells(i, 2), "once", "twice"), NewWs.Cells(i, 2) & " times")
Next i
MsgBox "Let's look at links, here's your top ten!" & vbLf & msg
End Sub
Function ContainsLettersAndNumbers(TheString) As Boolean
For i = 1 To Len(TheString)
Z = Asc(Mid(TheString, i, 1))
If Z > 47 And Z < 58 Then ContainsNumber = True
If (Z > 64 And Z < 91) Or (Z > 96 And Z < 123) Then ContainsLetter = True
If ContainsLetter And ContainsNumber Then
ContainsLettersAndNumbers = True
Exit For
End If
Next i
End Function
Function Lookslikepostcode(TheString) As Boolean
startletter = Asc(Left(TheString, 1))
If (startletter > 64 And startletter < 91) Or (startletter > 96 And startletter < 123) Then startpost = True
If Len(TheString) > 2 Then
secondletter = Asc(Mid(TheString, 2, 1))
If (secondletter > 64 And secondletter < 91) Or (secondletter > 96 And secondletter < 123) Then secondpost = True
End If
If startpost And secondpost Then

Lookslikepostcode = True

End If

End Function

Im still getting the type mismatch error...

p45cal
02-02-2010, 08:57 AM
Im still getting the type mismatch error...

Which line is giving this (usually highlighted in yellow?)

davidboutche
02-02-2010, 09:26 AM
Which line is giving this (usually highlighted in yellow?)

this one:
NewWs.Range("A1").Resize(myCount) = Application.Transpose(tally.keys)

Although it only seems to happen when the second letter condition is applied?

mdmackillop
02-02-2010, 10:18 AM
There is a RegExp solution here (http://www.mgbrown.com/PermaLink66.aspx). Check our Dave Brett's KB article (http://www.vbaexpress.com/kb/getarticle.php?kb_id=68) for more information

p45cal
02-02-2010, 10:21 AM
When you choose to debug, hover over 'mycount' in the code. What value does it have? I bet it says 0! Have you made sure that the appropriate sheet is active before you run the code?

p45cal
02-02-2010, 10:27 AM
There is a RegExp solution here (http://www.mgbrown.com/PermaLink66.aspx). Check our Dave Brett's KB article (http://www.vbaexpress.com/kb/getarticle.php?kb_id=68) for more information
Also see here (http://www.regxlib.com/REDetails.aspx?regexp_id=260) , but remember that currently you're splitting the contents of each cell with the space delimiter, so any post codes with a space in will be split too! So either the regex will have to cope for postcodes without spaces or we'll have to split the cell contents another way (not something I'd relish doing).

Also see here (http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Format) for the format of UK postcodes - it's more varied than you think, including single letters before the first number...

mdmackillop
02-02-2010, 10:33 AM
If the data was unsorted, you could check each pair of cells in turn ie A1-A2, A2-A3, and so on. String length checks could limit the RegExp checks to valid lengths only.