PDA

View Full Version : Find Text and Extract



barim
09-19-2018, 12:06 PM
I am trying to compare two columns that contain text separated by spaces. If the word “movies” that is in cell B2 contains anywhere in cell A2 it should be extracted to column C. Number of words in each cell is variable. This is strictly comparison per row, cell by cell, word by word. I am attaching sample workbook and in the highlighted area you can see what I need to achieve. I am open to VBA or formulas.

Thanks.

Paul_Hossler
09-19-2018, 12:34 PM
Easy, but maybe brute force

A user defined function would seem to be the most flexible, although you could read in both columns, process, and generate the third column is performance is a real issue

Had to do some data cleansing so that things like "cat,dog" would be comparable to "dog bird" and return "dog"




Option Explicit


Const Punc As String = ".,?!@#$%^&*()~"

Function Matches(s1 As String, s2 As String) As String
Dim t1 As String, t2 As String, r As String
Dim v1 As Variant, v2 As Variant
Dim i As Long

t1 = LCase(s1)
t2 = LCase(s2)

For i = 1 To Len(Punc)
t1 = Replace(t1, Mid(Punc, i, 1), " ")
t2 = Replace(t2, Mid(Punc, i, 1), " ")
Next i
Do While InStr(t1, " ") > 0
t1 = Replace(t1, " ", " ")
Loop
Do While InStr(t2, " ") > 0
t2 = Replace(t2, " ", " ")
Loop

t2 = t2 & " "
v1 = Split(t1, " ")

For i = LBound(v1) To UBound(v1)
If InStr(t2, v1(i) & " ") > 0 Then
r = r & v1(i) & " "
End If
Next i

Matches = Trim(r)
End Function

barim
09-20-2018, 08:20 AM
Paul, this is great. I really appreciate your help.

I have some exceptions to the rules, and I am not sure if this could be embedded inside the code or we need separate function/macro.
For example, if there are abbreviations like: Intl, corp etc.
If there is "Intl" in cell A and "International" in cell B or vise versa. It should be counted as a match.
I hope this is not too complicated.
Thanks again. :hi:

Paul_Hossler
09-20-2018, 02:43 PM
I'd have an 'Exceptions' 2 col range with the From-To pairs




Option Explicit


Const Punc As String = ".,?!@#$%^&*()~"

'Exceptions = replace col 1 with col 2
Function Matches(s1 As String, s2 As String, Optional Exceptions As Range = Nothing) As Variant
Dim v1 As Variant, v2 As Variant
Dim t2 As String, r As String
Dim i As Long

'default error return
Matches = CVErr(xlErrNA)
On Error GoTo NiceExit

If Not Exceptions Is Nothing Then
If Exceptions.Columns.Count <> 2 Then Exit Function
If Application.WorksheetFunction.CountA(Exceptions) = 0 Then Exit Function
End If

v1 = pvtProcessString(s1, Punc, Exceptions)
v2 = pvtProcessString(s2, Punc, Exceptions)


'make t2 into a string with no double spaces, no puncuation, and words replaced
t2 = Join(v2) & " "
'build a new string with any v1(x) enteries that are in t2
For i = LBound(v1) To UBound(v1)
If InStr(t2, v1(i) & " ") > 0 Then
r = r & v1(i) & " "
End If
Next i

'trim it and get out
Matches = Trim(r)
NiceExit:
End Function

Private Function pvtProcessString(s As String, p As String, Optional x As Range = Nothing) As Variant
Dim t As String
Dim i As Long, j As Long
Dim v As Variant, x1 As Variant

'make lower case
t = LCase(s)

'remove common puncuation
For i = 1 To Len(p)
t = Replace(t, Mid(p, i, 1), " ")
Next i
'remove double spaces
Do While InStr(t, " ") > 0
t = Replace(t, " ", " ")
Loop

'break into word arrays
v = Split(t, " ")

'if there are exceptions, replace the Exception(1) word with the Exception(2) word
If Not x Is Nothing Then
x1 = x.Value

For i = LBound(v) To UBound(v)
For j = LBound(x1, 1) To UBound(x1, 1)
If v(i) = x(j, 1) Then v(i) = x(j, 2)
Next j
Next i
End If


pvtProcessString = v
End Function