PDA

View Full Version : [SOLVED:] Sentiment analysis



Programmer_n
08-02-2016, 03:59 AM
This is an extrapolation work based on an internet tutorial on how to use VBA to array,range and find.

Attempt is to create a sentiment analysis in Word using VBA.

Nothing fancy, plain simple.It is a simple code to range,array,find, compute and message positive, negative and neutral score in Message box.

There are two arrays containing good words and bad words, the code polls a number for occurrence of word in the list and score is calculated by Number of times positive word occurs -Number of times negative word occurs. In case both value turns out to be equal it cancels out each other it gives a score of 0.


Sub Sentiment_analysis()
'A basic Word macro coded by Programmer_n
Dim i1, i2 As Long
Dim Sentposval, Sentnegvalue, score As Long
Dim oRng As Range
Set oRng = ActiveDocument.Range

Dim T1
T1 = Array(" good ", " very good ", " Best ")
For i1 = 0 To UBound(T1)
With oRng.Find
.Text = T1(i1): .Format = True: .MatchCase = False: .MatchWholeWord = True: .MatchWildcards = False: .MatchSoundsLike = False: .MatchAllWordForms = False
Do While .Execute(Forward:=True) = True
Sentposval = 1 + Sentposval
Loop
End With
Next

Dim T2
T2 = Array(" bad ", " very bad ", " worst ")
For i2 = 0 To UBound(T2)
With oRng.Find
.Text = T2(i2): .Format = True: .MatchCase = False: .MatchWholeWord = True: .MatchWildcards = False: .MatchSoundsLike = False: .MatchAllWordForms = False
Do While .Execute(Forward:=True) = True
Sentnegvalue = 1 + Sentnegvalue
Loop
End With
Next

score = Sentposval - Sentnegvalue
MsgBox ("The Sentiment score is " & score)

End Sub


Now the issues.

1. The words in array should hit an exact match, words like good? bad? good. Verygood. won't be counted or else I need to hard code it, inside the array taking into consideration the punctuation marks.
2. I need the range of sentiment to be within -1 and 1. -1 being extreme negative and +1 being extreme positive and in-between ranges, say .5 is somewhat positive -.5 is somewhat negative.
In current scenario if the code finds good 4 times and bad 2 times it gives a score of 2. If it finds good 100 times and bad 98 times still 2. Not pleasing right.

Please help.

gmayor
08-02-2016, 04:46 AM
I wouldn't do it quite like that, and I would especially avoid the spaces around the words in the arrays as the search will ignore the punctuation and the capitalisation.

Sub Sentiment_analysis()
Dim oRng As Range
Dim T1 As Variant
Dim i As Long
Dim sentVal As Long
Dim sentScore As Long: sentScore = 0

T1 = Array("good", "very good", "best")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
sentScore = sentScore + 1
Loop
End With
Next i

T1 = Array("bad", "very bad", "worst")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
sentScore = sentScore - 1
Loop
End With
Next i
MsgBox ("The Sentiment score is " & sentScore)
End Sub

What is more difficult to understand is how you propose setting a score value between -1 & +1 when you have no idea what the totals will be.

Programmer_n
08-02-2016, 05:19 AM
I wouldn't do it quite like that, and I would especially avoid the spaces around the words in the arrays as the search will ignore the punctuation and the capitalisation.

Sub Sentiment_analysis()
Dim oRng As Range
Dim T1 As Variant
Dim i As Long
Dim sentVal As Long
Dim sentScore As Long: sentScore = 0

T1 = Array("good", "very good", "best")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
sentScore = sentScore + 1
Loop
End With
Next i

T1 = Array("bad", "very bad", "worst")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
sentScore = sentScore - 1
Loop
End With
Next i
MsgBox ("The Sentiment score is " & sentScore)
End Sub

What is more difficult to understand is how you propose setting a score value between -1 & +1 when you have no idea what the totals will be.

You are right with your understanding. Currently I am focusing on paragraph level, instead of ActiveDocument, selection.paragraph, that may restrict the problem in hand.

Regarding the -1 to 1, I am working on an approach of range bound, cumulative paragraph average. Will turn up in a day or two.

Thanks for the help.

gmaxey
08-02-2016, 08:22 AM
If you look for "good" and "very good" in the manner used thus far you are going to get inaccurate results.

Consider this

He is good but she is very good!

When you run the code the sentiment returned is 3 (not 2). You need to look for the longest strings first and then not count sub-strings found again in those longer strings.


Sub Sentiment_analysis()
Dim oCol As New Collection
Dim lngIndex As Long
Dim bAlreadyCounted As Boolean
Dim oRng As Range
Dim T1 As Variant
Dim i As Long
Dim sentVal As Long
Dim sentScore As Long: sentScore = 0

T1 = Array("very good", "good", "best")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
bAlreadyCounted = False
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore + 1
oCol.Add oRng
End If
Loop
End With
Next i

T1 = Array("very bad", "bad", "worst")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore - 1
oCol.Add oRng
End If
Loop
End With
Next i
MsgBox ("The Sentiment score is " & sentScore)
End Sub

Programmer_n
08-02-2016, 05:25 PM
If you look for "good" and "very good" in the manner used thus far you are going to get inaccurate results.

Consider this

He is good but she is very good!

When you run the code the sentiment returned is 3 (not 2). You need to look for the longest strings first and then not count sub-strings found again in those longer strings.


Sub Sentiment_analysis()
Dim oCol As New Collection
Dim lngIndex As Long
Dim bAlreadyCounted As Boolean
Dim oRng As Range
Dim T1 As Variant
Dim i As Long
Dim sentVal As Long
Dim sentScore As Long: sentScore = 0

T1 = Array("very good", "good", "best")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
bAlreadyCounted = False
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore + 1
oCol.Add oRng
End If
Loop
End With
Next i

T1 = Array("very bad", "bad", "worst")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore - 1
oCol.Add oRng
End If
Loop
End With
Next i
MsgBox ("The Sentiment score is " & sentScore)
End Sub


Valid point. Thanks.

SamT
08-02-2016, 06:28 PM
Convert integer to fuzzy quantity with
Fuzzy = Result/(10^Len(Cstr(Result))

Programmer_n
08-02-2016, 07:02 PM
Convert integer to fuzzy quantity with
Fuzzy = Result/(10^Len(Cstr(Result))

Agree to the point. That is the way to go. The code is turning out to be a nice attempt, with help of the forum. :clap:

Programmer_n
08-05-2016, 05:25 PM
Agree to the point. That is the way to go. The code is turning out to be a nice attempt, with help of the forum. :clap:

I have seen an approach on Internet, wherein a person has implemented Sentiment analysis using VBA in excel.

The code groups positive and negative sentiment under Sent_Collection and declares it as New collection. A function is used to check the presence of word in the item list if not returns 0 as Sent_Value and another to calculate the score .

For example: If the function checks a sentence and finds the word "good" which is in the item list, so it returns Sent_Value = 1

.Add item:=1,Key:="good"

Say if the word "good" occurs 3 times. Sent_Value is found to be= 1/3 rounded to two decimal.

A previous loop statement calculates sentiment score as
Sent_Score = Sent_Score + Sent_Collection.item(Sent_Value)

As for negative words .Add item:=-1 is used. Because of this approach value stays within range of -1 to 1. In this approach, negative words and positive words have their own unique value domain,so their occurrence don't cancel out each other.

Paul_Hossler
08-06-2016, 06:59 AM
Convert integer to fuzzy quantity with
Fuzzy = Result/(10^Len(Cstr(Result))

A great many of my quantities are fuzzy, but this is the first I've seen a formula

Please expand / explain what they are and why you'd use them

Paul_Hossler
08-14-2016, 08:46 AM
Little different processing, but I weighted the VG and B and the VB and W with 2x and 3x since I didn't think it was right to have 10 'Good' and 10 'Worst' net out to 0

I also took a shot at a total composite 'score' number




Option Explicit
Sub Sentiment_analysis_1()
Dim oRng As Range
Dim asPos As Variant, aiPos(0 To 2) As Long
Dim asNeg As Variant, aiNeg(0 To 2) As Long
Dim i As Long
Dim dblScore As Double, dblPos As Double, dblNeg As Double

asPos = Array("good", "very good", "best")
asNeg = Array("bad", "very bad", "worst")

'count positives
For i = LBound(asPos) To UBound(asPos)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=asPos(i), _
MatchWholeWord:=True, _
MatchCase:=False)
aiPos(i) = aiPos(i) + 1
Loop
End With
Next I

'count negatives
For i = LBound(asNeg) To UBound(asNeg)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=asNeg(i), _
MatchWholeWord:=True, _
MatchCase:=False)
aiNeg(i) = aiNeg(i) + 1
Loop
End With
Next i

'adjust for 'good' and 'bad' being double counted
aiPos(0) = aiPos(0) - aiPos(1)
aiNeg(0) = aiNeg(0) - aiNeg(1)


'add weighting and total up
For i = LBound(asPos) To UBound(asPos)
aiPos(i) = (i + 1) * aiPos(i)
dblPos = dblPos + aiPos(i)
Next i
For i = LBound(asNeg) To UBound(asNeg)
aiNeg(i) = (i + 1) * aiNeg(i)
dblNeg = dblNeg + aiNeg(i)
Next i


dblScore = (dblPos - dblNeg) / (dblPos + dblNeg)

MsgBox "G / VG / B = " & aiPos(0) & " / " & aiPos(1) & " / " & aiPos(2) & " Total = " & (aiPos(0) + aiPos(1) + aiPos(2))
MsgBox "B / VB / W = " & aiNeg(0) & " / " & aiNeg(1) & " / " & aiNeg(2) & " Total = " & (aiNeg(0) + aiNeg(1) + aiNeg(2))
MsgBox "The Sentiment score is " & dblScore
End Sub

Paul_Hossler
08-14-2016, 11:04 AM
This is a more modular version




Option Explicit

Enum Severity
Low = 1
Medium = 2
High = 3
End Enum

Sub Sentiment_analysis_2()
Dim aiPos(Low To High) As Long, aiNeg(Low To High) As Long
Dim i As Severity
Dim dblScore As Double, dblPos As Double, dblNeg As Double

aiPos(Low) = WordCount("good")
aiPos(Medium) = WordCount("very good")
aiPos(High) = WordCount("best")

aiNeg(Low) = WordCount("bad")
aiNeg(Medium) = WordCount("very bad")
aiNeg(High) = WordCount("worst")

'adjust for 'good' and 'bad' being double counted
aiPos(Low) = aiPos(Low) - aiPos(Medium)
aiNeg(Low) = aiNeg(Low) - aiNeg(Medium)


'add weighting and total up
For i = Low To High
dblPos = dblPos + CLng(i) * aiPos(i)
Next i
For i = Low To High
dblNeg = dblNeg + CLng(i) * aiNeg(i)
Next i


dblScore = (dblPos - dblNeg) / (dblPos + dblNeg)

MsgBox "G / VG / B = " & aiPos(Low) & " / " & aiPos(Medium) & " / " & aiPos(High) & " Weighted Total = " & dblPos
MsgBox "B / VB / W = " & aiNeg(Low) & " / " & aiNeg(Medium) & " / " & aiNeg(High) & " Weighted Total = " & dblNeg
MsgBox "The Sentiment score is " & dblScore
End Sub

Private Function WordCount(S As String) As Long
Dim N As Long
With ActiveDocument.Range.Find
Do While .Execute(FindText:=S, _
MatchWholeWord:=True, _
MatchCase:=False)
N = N + 1
Loop
End With
WordCount = N
End Function

Programmer_n
08-16-2016, 04:54 AM
A great many of my quantities are fuzzy, but this is the first I've seen a formula

Please expand / explain what they are and why you'd use them

When you meant fuzzy I took it as a mention on decimal digits and continued with the comment.

Thanks for the effort.

Paul_Hossler
08-16-2016, 09:33 AM
When you meant fuzzy I took it as a mention on decimal digits and continued with the comment.

Thanks for the effort.

Actually, I was asking SamT what it / they were


My only contribution was Sub Sentiment_analysis_2() approach

Programmer_n
08-18-2016, 05:31 AM
Valid point. Thanks.

Maxey,In your code, rather than giving it a score, how to word it as "overall good" and "overall bad"

gmaxey
08-18-2016, 05:46 AM
My Company Commander in boot camp called be Maxey. If you don't mind, here it Greg.

I'm not sure what you mean. I suppose instead of:

MsgBox ("The Sentiment score is " & sentScore)

I would use something like:

Select Case True
Case sentscore => X: MsgBox "Overall good"
Case Else: Msgbox "Overall bad"
End Select

Programmer_n
08-18-2016, 05:46 AM
If you look for "good" and "very good" in the manner used thus far you are going to get inaccurate results.

Consider this

He is good but she is very good!

When you run the code the sentiment returned is 3 (not 2). You need to look for the longest strings first and then not count sub-strings found again in those longer strings.


Sub Sentiment_analysis()
Dim oCol As New Collection
Dim lngIndex As Long
Dim bAlreadyCounted As Boolean
Dim oRng As Range
Dim T1 As Variant
Dim i As Long
Dim sentVal As Long
Dim sentScore As Long: sentScore = 0

T1 = Array("very good", "good", "best")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
bAlreadyCounted = False
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore + 1
oCol.Add oRng
End If
Loop
End With
Next i

T1 = Array("very bad", "bad", "worst")
For i = 0 To UBound(T1)
Set oRng = ActiveDocument.Range
With oRng.Find
Do While .Execute(FindText:=T1(i), _
MatchWholeWord:=True, _
MatchCase:=False)
For lngIndex = 1 To oCol.Count
If oRng.InRange(oCol.Item(lngIndex)) Then
bAlreadyCounted = True
Exit For
End If
Next
If Not bAlreadyCounted Then
sentScore = sentScore - 1
oCol.Add oRng
End If
Loop
End With
Next i
MsgBox ("The Sentiment score is " & sentScore)
End Sub


Greg, In this code, Instead of sentScore in messagebox, is it possible to declare a string and classify it as "overall good" and "Overall bad", If the net occurrence happens to be above 0 overall good and net occurrence below zero overall bad.

To give an example, if good occurs 10 times and bad 12 times net score is -2 so the Message box reads "Overall bad".

Programmer_n
08-18-2016, 05:50 AM
My Company Commander in boot camp called be Maxey. If you don't mind, here it Greg.

I'm not sure what you mean. I suppose instead of:

MsgBox ("The Sentiment score is " & sentScore)

I would use something like:

Select Case True
Case sentscore => X: MsgBox "Overall good"
Case Else: Msgbox "Overall bad"
End Select

Greg, that solved the issue. Thanks.

gmaxey
08-18-2016, 05:51 AM
I don't think you need to declare as string

Select Case sentScore
Case Sentscore => 0: MsgBox "Overall good"
Case sentScre = 0: Msgbox "Meh"
Case Else: Msgbox "Overall bad"
End Select

Programmer_n
08-18-2016, 05:59 AM
I don't think you need to declare as string

Select Case sentScore
Case Sentscore => 0: MsgBox "Overall good"
Case sentScre = 0: Msgbox "Meh"
Case Else: Msgbox "Overall bad"
End Select

I have run into a situation; wherein, i need to show two string variables in the msgbox based on the sentscore1 being more than 0 or below 0 that yields string 1 or string 2, sentscore2 being more than 0 or below 0 yields string3 or string 4 respectively. More like code based classification. Is it possible to group strings in Msgbox that are declared in case.

MsgBox ("The Sentiment score is " & string1 & string4 )

Programmer_n
08-18-2016, 06:18 AM
I have run into a situation; wherein, i need to show two string variables in the msgbox based on the sentscore1 being more than 0 or below 0 that yields string 1 or string 2, sentscore2 being more than 0 or below 0 yields string3 or string 4 respectively. More like code based classification. Is it possible to group strings in Msgbox that are declared in case.

MsgBox ("The Sentiment score is " & string1 & string4 )

Trying to solve it with this approach,

Select Case sentScore
Case sentScore >= 0: String1 = "G"
Case sentScore = 0: String1 = "N"
Case Else: String1 = "B"
End Select

Select Case sentScore1
Case sentScore1 >= 0: String2 = "G1"
Case sentScore1 = 0: String2 = "N1"
Case Else: String2 = "B1"
End Select
MsgBox ("The Sentiment score is " & String1 & String2)
End Sub

The message box always gives the Sentiment score is BB1 irrespective of change in value.

gmaxey
08-18-2016, 09:14 AM
Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim sentScore
Dim sentScore1
Dim strMsg As String
sentScore = 1
sentScore1 = 0

Select Case sentScore
Case Is >= 0: strMsg = "G"
Case Is = 0: strMsg = "N"
Case Else: strMsg = "B"
End Select
Select Case sentScore1
Case Is >= 0: strMsg = strMsg & "G1"
Case Is = 0: strMsg = strMsg & "N1"
Case Else: strMsg = strMsg & "B1"
End Select
MsgBox "The Sentiment score is " & strMsg
lbl_Exit:
Exit Sub
End Sub

SamT
08-18-2016, 09:15 AM
Fuzzy mathematics (https://en.wikipedia.org/wiki/Fuzzy_mathematics)
My only exposure was in the 70's, so all I know about Fuzzy Math is that all quantities are between 0 and 1, and that it is widely used in industry. I also know that Quantum Mechanics is pretty 'fuzzy.'

arrWords("good", "bad", ugly")
arrWeights(0.9, 1, 1.1)

Find Word, append Word.Weight to arrPolls
arrPolls As Weights (0.9, 1, 1, 1, 1.1, 0.9, 0.9, 1, 1.1, etc)
arrSorted = arrPolls after sorting

Sentiment = average(arrPolls)
Mean= (arrSorted(RoundUp(Ubound/2)) + arrSorted(RoundDown(Ubound/2)))/2

Programmer_n
08-18-2016, 10:37 PM
Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim sentScore
Dim sentScore1
Dim strMsg As String
sentScore = 1
sentScore1 = 0

Select Case sentScore
Case Is >= 0: strMsg = "G"
Case Is = 0: strMsg = "N"
Case Else: strMsg = "B"
End Select
Select Case sentScore1
Case Is >= 0: strMsg = strMsg & "G1"
Case Is = 0: strMsg = strMsg & "N1"
Case Else: strMsg = strMsg & "B1"
End Select
MsgBox "The Sentiment score is " & strMsg
lbl_Exit:
Exit Sub
End Sub



Thanks