PDA

View Full Version : [SOLVED:] Split words from text string and count occurrence of each word in a range in excel



anish.ms
02-07-2021, 01:03 PM
Hi,


I am trying to get a vba code which would help count occurrence of each word in a range.


For example:
I have a complete sentence (with various words) in each cell from A2 to A833 and I want to check how many times each unique word has shown up on this list. Each unique word should be pasted from C2 down the row and the count of the words starting D2 down the row.


Greatly appreciate if someone can please help with a VBA code to get this task done.




Thanks in advance.

SamT
02-07-2021, 04:04 PM
Are Dates Words?

Lack of standardization for VBA is an issue... "NEWS PAPER" is 2 words. "MAR 18" is two words, "MAR'18" is one word and neither are dates

"MAINT,SECURITY" is one word, "CABIN & UPS ROOM" is four words

Some don't seem to fit in the same categories... "DRIVER FOOD", "FOOD & FUEL - MR.MURALI & ANAS -MEETING LAW FIRM",and "FOOD - STOCK CHECKING". And then there is "JUICE & SNACKS - MOE CAMP. - MARKETING DEPT."


You need to refine and define the issue better.

Strictly speaking, in VBA,
all Words in sentences are separated by spaces. This makes all punctuation part of or entirely a Word.
a year is required for a String to be a Date, Mar 18 and Mar'18 are not Dates.

I suggest you start by standardizing all dates, (Replace "MAR 18" and "MAR'18" with 01 Mar, 2018", then create a list of a few Words, (Strings,) to count to develop and test some code with. As things proceed, other Strings can be added to the list.

Creating code to perform EXACTLY as requested is trivial, but I think the results would be useless. For examples: The Strings "&" and "-" would be counted a gazillion times, where the Strings "DT:31/3/18(MCT-SHR)" and "DR.SIVAKUMAR(SHR" would be counted only once each. Also, about 4000 to 8000 unique Strings will be found in that list.

macropod
02-07-2021, 05:07 PM
For example, sending the output to a message box:

Sub Demo()
Dim r As Long, i As Long, j As Long
Dim StrTxt As String, StrFnd As String, StrTmp As String, StrFrq As String
StrTxt = " "
With ActiveSheet
For r = 2 To 883
StrTxt = StrTxt & .Range("A" & r).Text & " "
Next
StrTxt = Trim(StrTxt)
Do While InStr(StrTxt, " ") > 0
StrTxt = Replace(StrTxt, " ", " ")
Loop
StrTxt = " " & StrTxt & " "
StrTmp = StrTxt
For i = 1 To UBound(Split(StrTxt, " ")) - 1
StrFnd = Split(StrTxt, " ")(i) & " "
If InStr(StrFrq, vbCr & StrFnd) = 0 Then
j = Len(StrTmp)
Do While InStr(StrTmp, " " & StrFnd) > 0
StrTmp = Replace(StrTmp, " " & StrFnd, " ")
Loop
StrFrq = StrFrq & vbCr & StrFnd & (j - Len(StrTmp)) / Len(StrFnd)
End If
Next
MsgBox StrFrq
End With
End Sub
I'll leave you to invest some effort in handling dates correctly and outputting the results to your worksheet.

anish.ms
02-07-2021, 09:19 PM
Thanks for your time Sam!
I understand the limitations in taking out meaningful words. My expectation is to take out all the words in sentences that are separated by spaces and I don't need the exact dates. The purpose of doing this is to get the most used words and go though them. Mainly to look for words such as fines, penalties, fuel, discount and gift, etc. and verify those transactions.
I think it is better to create a list of words/ special characters to be excluded such as for, paid, & and -, etc.
I was looking for the output as coded by Paul Edstein, but I need it to be copied in a separate sheet (words and count in separate columns)

anish.ms
02-07-2021, 09:22 PM
Thanks Paul Edstein!
Let me try whether I will succeed in outputting the results in a separate worksheet.
I also need to incorporate list of words/symbols to be excluded and consider words without case sensitive. for example "PAID and "paid" should be considered as same .

anish.ms
02-08-2021, 02:39 AM
I tried but couldn't succeed

macropod
02-08-2021, 02:59 AM
You could change:

StrTxt = " " & StrTxt & " "
to:

StrTxt = " " & lCase(StrTxt) & " "
or:

StrTxt = " " & uCase(StrTxt) & " "
As for whatever problems you're having outputting the results to your worksheet, show us your code. This is hardly rocket science and I'm not inclined to do such simple work for you. There are plenty of threads here showing how you might do that.

snb
02-08-2021, 03:15 AM
You might consider to use VBA, provided the list of words consists of only 1 column.


Sub M_snb()
sn = Split(LCase(Trim(Join(Application.Transpose(Sheet1.UsedRange.Columns(1))))) )
MsgBox sn(0) & vbLf & sn(UBound(sn)) & vbLf & UBound(sn)
End Sub

NB. Minimize worksheet interaction in VBA.

SamT
02-08-2021, 03:47 AM
This SHOULD do what you need... with a little reiteration and editing by you. My brain is fried and I can't see the little syntax error I made. The darn thing compiles, but won't run properly on my sh***y box.


Option Explicit


Sub Listem()

Dim WordCounts As Object
Set WordCounts = New Scripting.Dictionary

Dim ListSht As Worksheet
Set ListSht = Sheets("Sheet1") 'Edit Sheet1 to suit

Dim SpaceSkips As Variant 'replace item with space (" ")
SpaceSkips = Array("'", ".", ",", "dt:") 'extend as needed
Dim DelSkips As Variant 'Delete Items from listing
DelSkips = Array("MAR'18 ", "MAR 18", Chr(38)) 'extend as needed. Chr(38) = "&"

Dim ItemsArr As Variant

Dim i As Long, j As Long

Dim ListRng As Variant
With ListSht
ListRng = WorksheetFunction.Transpose(Intersect(.UsedRange, .Columns(1)).Value)
End With

'skip the header
For j = 2 To UBound(ListRng)
' special Replacements
ListRng(j) = Replace(ListRng(j), "DR.", "DR_")
ListRng(j) = Replace(ListRng(j), "DR.", "DR_")

'add specials as needed

For i = 0 To UBound(SpaceSkips)
ListRng(j) = Replace(ListRng(j), SpaceSkips(i), " ")
Next i: Next j

For j = 2 To UBound(ListRng)
For i = 0 To UBound(DelSkips)
ListRng(j) = Replace(ListRng(j), DelSkips(i), "")
Next i: Next j

'If the two Skips Arrays have been extended and the Specials are complete, The List is ready to be counted
'Then create the count lists
For j = 2 To UBound(ListRng)
ItemsArr = Split(ListRng(j), " ")
For i = 0 To UBound(ItemsArr)
'Skip Dates
If Not IsDate(Trim(ItemsArr(i))) Then
If WordCounts.Exists(Trim(ItemsArr(i))) Then
WordCounts.Item(Trim(ItemsArr(i))) = WordCounts.Item(Trim(ItemsArr(i))) + 1

Else:
WordCounts.Add ItemsArr(i), 1
End If
End If
Next i
Next j


'That should make WordCounts two arrays of unique words and their counts

With Sheets("Sheet2") 'Edit as needed
.Range("A1").Resize(WordCounts.Count, 1) = WordCounts.Keys
.Range("B1").Resize(WordCounts.Count, 1) = WordCounts.Items
End With

WordCounts.RemoveAll 'Used to clear the dictionary.

End Sub

snb
02-08-2021, 04:49 AM
Sub M_snb()
sn = Split(LCase(Application.Trim(Join(Application.Transpose(Sheet1.UsedRange.Co lumns(1))))))
sp = sn

For j = 0 To UBound(sn)
sp(j) = UBound(Filter(sn, sn(j))) + 1
Next

Sheet1.Cells(1, 6).Resize(UBound(sn) + 1, 2) = Application.Transpose(Array(sn, sp))
End Sub

anish.ms
02-08-2021, 05:02 AM
Thanks Sam!
I'm a newbee in VBA and dictionary is completely a new subject for me.
I was getting error with

Set WordCounts = New Scripting.Dictionary
and I changed it to

Set WordCounts = CreateObject("Scripting.Dictionary")
The code is working after the above change but the result in Sheet 2 is "PAID" and "52" in from row 1 to 866
I'm unable to figure it out :think:. Could you please help

anish.ms
02-08-2021, 05:12 AM
Thanks snb!
The code is working fine but I need to remove the duplicates. for example, the word "paid" is showing 52 times


Sub M_snb() sn = Split(LCase(Application.Trim(Join(Application.Transpose(Sheet1.UsedRange.Co lumns(1))))))
sp = sn

For j = 0 To UBound(sn)
sp(j) = UBound(Filter(sn, sn(j))) + 1
Next

Sheet1.Cells(1, 6).Resize(UBound(sn) + 1, 2) = Application.Transpose(Array(sn, sp)) End Sub

snb
02-08-2021, 05:19 AM
Please analyse the code !
Are you familiar with 'sorting' ?
Change only 1 line of code to get what you want.

anish.ms
02-08-2021, 05:25 AM
Thanks Paul Edstein!
I was trying to use the array method to store the result and list based on the my understating and based on the codes I learned from Paul Hossler and SamT from their support to my previous posts.

ReDim aryword (2 To lastRow)
I applied the uCase based on one of the previous code supported by Paul Hossler for highlighting search values irrespective of the case.
Its a long way for me to go to have a good understanding of VBA codes. I'm in mid of a VBA course, but it is taking some time to complete due to tight work timelines.
Thanks again for your time and response

SamT
02-08-2021, 09:59 AM
Of course I figured out the issue the instant my head hit the pillow, but I'm no Hemingway to get out of bed for that.

This version worked fine. took one sec to finish that list.

Option Explicit


Sub Listem_2()

Dim WordCounts As Object
Set WordCounts = CreateObject("Scripting.Dictionary")
Dim ListSht As Worksheet
Set ListSht = Sheets("Sheet1") 'Edit Sheet1 to suit

Dim SpaceSkips As Variant 'replace item with space (" ")
SpaceSkips = Array("'", ".", ",", "dt:", " - ") 'extend as needed
Dim DelSkips As Variant 'Delete Items from listing
DelSkips = Array("MAR'18 ", "MAR 18", Chr(38)) 'extend as needed. Chr(38) = "&"

Dim ItemsArr As Variant

Dim i As Long, j As Long

Dim ListRng As Variant
With ListSht
ListRng = WorksheetFunction.Transpose(Intersect(.UsedRange, .Columns(1)).Value)
End With

'skip the header
For j = 2 To UBound(ListRng)
' special Replacements
ListRng(j) = Replace(ListRng(j), "DR.", "DR_")
ListRng(j) = Replace(ListRng(j), "DR_ ", "DR_")

'add specials as needed

For i = 0 To UBound(SpaceSkips)
ListRng(j) = Replace(ListRng(j), SpaceSkips(i), " ")
Next i: Next j

For j = 2 To UBound(ListRng)
For i = 0 To UBound(DelSkips)
ListRng(j) = Replace(ListRng(j), DelSkips(i), "")
Next i: Next j

'If the two Skips Arrays have been extended and the Specials are complete, The List is ready to be counted
'Then create the count lists
For j = 2 To UBound(ListRng)
ItemsArr = Split(ListRng(j), " ")
For i = 0 To UBound(ItemsArr)
'Skip Dates and blanks
If Not IsDate(Trim(ItemsArr(i))) and not Len(Trim(ItemsArr(i))) > 0 Then
If WordCounts.Exists(Trim(ItemsArr(i))) Then
WordCounts.Item(Trim(ItemsArr(i))) = WordCounts.Item(Trim(ItemsArr(i))) + 1

Else:
WordCounts.Add Trim(ItemsArr(i)), 1
End If
End If
DoEvents

Next i
Next j


'That should make WordCounts two arrays of unique words and their counts

With Sheets("Sheet2") 'Edit as needed
.Range("A:B").CLearContents
.Range("A1").Resize(WordCounts.Count, 1) = WorksheetFunction.Transpose(WordCounts.Keys)
.Range("B1").Resize(WordCounts.Count, 1) = WorksheetFunction.Transpose(WordCounts.Items)
End With

WordCounts.RemoveAll 'Used to clear the dictionary.

End Sub

anish.ms
02-08-2021, 09:59 AM
I modified as below

Sub M_snb()

Dim sn() As String, sp() As String
Dim j As Long, r As Long
sn = Split(LCase(Application.Trim(Join(Application.Transpose(Sheet1.UsedRange.Co lumns(1))))))
sp = sn

For j = 0 To UBound(sn)
sp(j) = UBound(Filter(sn, sn(j))) + 1
Next

Sheet1.Cells(1, 6).Resize(UBound(sn) + 1, 2) = Application.Transpose(Array(sn, sp))
r = Sheet1.UsedRange.Rows.Count
With ActiveSheet.Range("F1:G" & r)
.Sort Key1:=Range("G1"), Order1:=xlDescending, Header:=xlNo
.RemoveDuplicates Columns:=1, Header:=xlNo
End With
End Sub

anish.ms
02-08-2021, 10:44 AM
Thanks a Ton Sam!
It is awesome to have the option of exclusions :thumb

SamT
02-08-2021, 10:45 AM
I just modified my last posted code a bit to clean up the results a bit. For example, all "DR" are now closely followed by "_Dr's Name" and extra spaces should not be counted

snb
02-08-2021, 10:46 AM
If you only want to filter words that occur only once, you will have to remove the others.
Do not declare any variable unnecessarily


Sub M_snb()
sn = Split(LCase(Application.Trim(Join(Application.Transpose(Sheet1.UsedRange.Co lumns(1))))))

For j = 0 To UBound(sn)
if UBound(Filter(sn, sn(j)))>0 then sn(j) = "~"
Next
sn=filter(sn,"~",0)

Sheet1.Cells(1, 6).Resize(UBound(sn) + 1) = Application.Transpose(sn)
End Sub

anish.ms
02-08-2021, 11:45 AM
thank you

anish.ms
02-08-2021, 11:48 AM
Thanks a lot!
I made a small correction in the code
FROM

If Not IsDate(Trim(ItemsArr(i))) and not Len(Trim(ItemsArr(i))) > 0 Then
TO

If Not IsDate(Trim(ItemsArr(i))) And Len(Trim(ItemsArr(i))) > 0 Then

Instead of defining exclusions within the code, can I refer it to a range as given below-
DelSkips = Array(ListSht.Range("C1:C20"))