PDA

View Full Version : How to search substring against range and if found remove it?



joshman1088
09-06-2016, 08:00 AM
I have column E which contains what I call substrings. These substrings are separated by semicolons ";" which is what identifies them as unique entities within cells. In almost every instance in my workbook, every single one of these substrings will be unique from every other substring in every other cell in the workbook. However, sometimes there will be duplicates and they need to be removed. I have marked which cells contain duplicates that need to be removed by marking the associated O cell as true. For instance, in the attached sheet E2: (spaces " " inserted after the "//" and in the "http" to get around restrictions)

ht tp:// ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg;ht tp:// ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image1.jpg;ht tp:// 72-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image2.jpg

Contains the same substring as found in D13

+ht tp:// ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg


In the above pair, O2 is marked true which means the duplicates in E2 need to be removed and D13 left unchanged.

Also, E17 contains duplicates of E8. In that case, O17 is "true" which indicates e17 duplicates be removed and E8 left as it is.


So the logic I desire is when you see "TRUE" in O, search the E value in that row against every cell in columns D and E and if duplicate substrings are found in E and D, remove only duplicate substrings from the E cell which has an associated O value of true, else do nothing. So we will only ever be removing duplicates from column E even though we are also searching against column D and E.


Additional information:
1 The amount of substrings in each cell in E:E are not guaranteed to be the same. Cells in E:E may sometimes be blank but will never exceed 20 substrings. but D will always have 1 per cell.
2)Substrings will ALWAYS start with what you see up UNTIL the word "bob".
3) The number of rows may be quite large, maybe up to 50,000 but I'm not sure exactly.


Is there a macro we can write to accomplish this? I haven't had much results on other forums or googling it myself. Thanks!



Workbook: (please note spaces)

ht tps:// drive.google.com/file/d/0B1TMLUGTVwb9aXIxcVdibkRiRk0/view?usp=sharing

joshman1088
09-06-2016, 11:20 AM
If it helps your thinking process, I received this on mrexcel, but it didn't work, it seems to remove anything in the E cell where O is marked true. Problem is the guy never responded so I'm looking around for answers. Also, the below code only searches against range E:E whereas I now need D:E


Option Explicit


Sub SearchInRange()


Dim Data As Variant
Dim Dict As Object
Dim Key As Variant
Dim Item As Variant
Dim n As Long
Dim NewData As String
Dim Rng As Range
Dim RngBeg As Range
Dim RngEnd As Range
Dim Term As Variant
Dim Terms As Variant
Dim Wks As Worksheet

Set Wks = ThisWorkbook.ActiveSheet

Set RngBeg = Wks.Range("E2")
Set RngEnd = Wks.Cells(Rows.Count, "E").End(xlUp)

If RngEnd.Row < RngBeg.Row Then Exit Sub

Set Dict = CreateObject("Scripting.Dictionary")
Dict.CompareMode = vbTextCompare

' Copy column "E" values into a 2D array.
Data = RngBeg.Resize(RowSize:=RngEnd.Row - RngBeg.Row + 1).Value

' Save unique values in the Dictionary object.
For Each Item In Data
If Not IsEmpty(Item) Then
Terms = Split(Item, ":")
For Each Term In Terms
If Not Dict.Exists(Term) Then Dict.Add Term, True
Next Term
End If
Next Item

Set RngBeg = Wks.Range("O2")
Set RngEnd = Wks.Cells(Rows.Count, "O").End(xlUp)
Set Rng = Wks.Range(RngBeg, RngEnd)

' Remove duplicate values in Data array when cell in column "O" is TRUE.
For n = 1 To Rng.Rows.Count
If Rng.Cells(n, 1) = True Then
NewData = ""
Terms = Split(Data(n, 1), ":")
For Each Term In Terms
If Not Dict.Exists(Term) Then
If NewData = "" Then
NewData = Term
Else
NewData = NewData & ":" & Term
End If
End If
Next Term
Data(n, 1) = NewData
End If
Next n

' Write the Data array values back to the column "E".
Set Rng = Wks.Range("E2").Resize(UBound(Data, 1), 1)
Rng.Value = Data

End Sub

SamT
09-06-2016, 12:35 PM
Substrings will ALWAYS start with what you see
WHAT??? Of course what we see always starts with what we see. In fact, it always ends with what we see, and everything in between is always what we see. Bob can't tell us what to not see..

Unless we be doing LSD. :D


UNTIL the word "bob".
What does Bob have to do with things?

Ignoring what Bob has to say for the moment, What I think you are saying is:
For each O=True Row
For each SubString In Cell "E"
If SubString is Found anywhere else in Column D or E, Delete that substring from Cell "E".

OK, what does Bob say about that?

Final question: If E has only one Substring and it is a duplicate, What happens?

joshman1088
09-06-2016, 12:45 PM
WHAT??? Of course what we see always starts with what we see. In fact, it always ends with what we see, and everything in between is always what we see. Bob can't tell us what to not see..


Final question: If E has only one Substring and it is a duplicate, What happens?

I mean substrings in E will always start with the file path ht tp:// ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/
And strings in D will always be the same as above but with a + in front of them: +ht tp:// ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/
I mentioned that because people often ask if cell values will always look the same or not.

Yes, if E only has one value and it is a duplicate then please delete it.

Thanks! Sorry for the confusion. It's hard to work things correctly every time.

SamT
09-06-2016, 02:42 PM
Option Explicit

Sub DeleteDuplicatesIn_E()


Dim Found As Range
Dim CriteriaRng As Range
Dim CriteriaCel As Range
Dim FirstCriteriaAddress As String
Dim StrCel As Range
Dim Term As Variant
Dim Terms As Variant
Dim Wks As Worksheet

Set Wks = ThisWorkbook.ActiveSheet
With Wks

Set CriteriaRng = Intersect(Range("O:O"), .UsedRange)
Set CriteriaCel = CriteriaRng.Cells(1)
If CriteriaCel <> "TRUE" Then _
Set CriteriaCel = CriteriaRng.Find(What:="TRUE", After:=CriteriaCel, SearchDirection:=xlNext)
FirstCriteriaAddress = CriteriaCel.Address

Do
Set StrCel = .Cells(CriteriaCel.Row, "E")
If StrCel = "" Then GoTo NextCriteriaCel

Terms = Split(StrCel, ":")

If Not IsArray(Terms) Then
Set Found = Range("D:E").Find(What:=Terms, After:=StrCel, SearchDirection:=xlNext)
If Not Found.Address = StrCel.Address Then StrCel = ""
Else
For Each Term In Terms
Set Found = Range("D:E").Find(What:=Term, After:=StrCel, SearchDirection:=xlNext)
If Not Found.Address = StrCel.Address Then Replace StrCel, Term, ""
Next Term
'Code here to remove excess Colons from StrCel
End If

NextCriteriaCel:
Set CriteriaCel = CriteriaRng.Find(What:="", After:=CriteriaCel, _
SearchDirection:=xlNext)
Loop While CriteriaCel.Address <> FirstCriteriaAddress
End With
End Sub

joshman1088
09-06-2016, 08:12 PM
How long did that take you to run? I hit run and it never stops, I left it hours ago and it's still going. I tried on two separate computers as well. Why is this the case? Regardless, thank you, I'm dying to see it if works.

SamT
09-06-2016, 11:12 PM
You didn't test it on a small set of data first? :crying: :funnyashe

I don't even know if it works. All I did was compile it to check for obvious errors.

Screenupdating is not disabled, so: 1) you should be able to see the progression; 2) it is much slower than with Screenupdating diabled.

Press and hold Ctrl+F12, (Pause Break,) for at least ten seconds to stop execution.

joshman1088
09-07-2016, 06:45 AM
You didn't test it on a small set of data first? :crying: :funnyashe

I don't even know if it works. All I did was compile it to check for obvious errors.

Screenupdating is not disabled, so: 1) you should be able to see the progression; 2) it is much slower than with Screenupdating diabled.

Press and hold Ctrl+F12, (Pause Break,) for at least ten seconds to stop execution.

I did check it on a small dataset. I used the test file I uploaded, something like 19 rows? Do you have any insight into this? This behavior doesn't seem normal. I can run other formulas that search every cell against every other range and it happens pretty quickly. I'm also trying it on 7 rows right now and it just seems to lock up.

Paul_Hossler
09-07-2016, 07:19 AM
Is this correct:

Situation1 --

E2 contains these 3 'substrings'

h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg;
h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image1.jpg;
h ttp://72-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image2.jpg


D13 contains 1

+h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg


which matches (after deleting the +) the first substring in E2

So O2 = True, and you want to delete the first substring from E2 since it's in D13


Situation2 --

Likewise, one of the substrings in E17 is also in E2

so O17 = True and you want to delete the duplicated substring in E17

joshman1088
09-07-2016, 07:28 AM
Is this correct:

Situation1 --

E2 contains these 3 'substrings'

h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg;
h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image1.jpg;
h ttp://72-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image2.jpg


D13 contains 1

+h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg


which matches (after deleting the +) the first substring in E2

So O2 = True, and you want to delete the first substring from E2 since it's in D13


Situation2 --

Likewise, one of the substrings in E17 is also in E2

so O17 = True and you want to delete the duplicated substring in E17

Yes that is entirely correct.
But in addition to this, I have just realized the problem is bigger than I realized at first. I need the search to include columns B, C, D and E, which will be searched against B:B, C:C, D: D and E:E and altered only if the O value is true, as before. I regret that this will increase computation time.

So if B2,C2 and D2 contain
+h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg

and E283 contains
h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318CK1_main.jpg;
h ttp://ec2-54-172-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image1.jpg;
h ttp://72-158-94.compute-1.amazonaws.com/assets/images/bob/VG2318C_image2.jpg

And O2 is true, then remove the entire cell contents of B2,C2 and D2. Or just remove the string, whichever is easier. There will only ever be a single string in b,c,d so if it's a duplicate and removed the cell becomes empty. Thanks for all your help, this is a pivotal issue.

Paul_Hossler
09-07-2016, 08:42 AM
Not sure I understood all that

What I have is

1. A Sub that removes duplicates in a single column (B, C, D)
2. A Sub that takes the de-duped single column (B, C, D) and removes substrings in a multiple substring column (E)
3. Logic that takes each piece of a multi-substring column (E) and removes that substring from the rest of that column (E)

Do you really need 'True' in column O? That will get into more complicated and longer running code



Option Explicit

Sub DeDup()
Dim rData As Range, rCell As Range, rNext As Range, rLast As Range
Dim s As String
Dim v As Variant
Dim i As Long

Application.ScreenUpdating = False

Set rData = Worksheets("Test_ph").Cells(1, 1).CurrentRegion

Call CheckForDupsInSingleValueColumn(rData.Columns(2))
Call CheckForDupsInSingleValueColumn(rData.Columns(3))
Call CheckForDupsInSingleValueColumn(rData.Columns(4))

Call CheckForDupsSingleAgaintMultiple(rData.Columns(2), rData.Columns(5))
Call CheckForDupsSingleAgaintMultiple(rData.Columns(3), rData.Columns(5))
Call CheckForDupsSingleAgaintMultiple(rData.Columns(4), rData.Columns(5))

'check for dups in col E and remove
With rData.Columns(5)
For Each rCell In .Cells
Set rNext = .Cells(rCell.Row + 1, 1)
Set rLast = rNext.End(xlDown)

v = Split(rCell.Value, ";")

For i = LBound(v) To UBound(v)
Call Range(rNext, rLast).Replace(v(i) & ";", vbNullString, xlPart)
Call Range(rNext, rLast).Replace(v(i), vbNullString, xlPart) ' no ; on last one
Next I
Next
End With

Application.ScreenUpdating = True

End Sub

Private Sub CheckForDupsInSingleValueColumn(rSingleCol As Range)
Dim rCell As Range, rNext As Range, rLast As Range

With rSingleCol
For Each rCell In .Cells
If Application.WorksheetFunction.CountIf(.Cells, rCell.Value) > 1 Then
Set rNext = .Cells(rCell.Row + 1, 1)
Set rLast = rNext.End(xlDown)
Call Range(rNext, rLast).Replace(rCell.Value, vbNullString, xlWhole)
End If
Next
End With
End Sub

Private Sub CheckForDupsSingleAgaintMultiple(rSingleCol As Range, rMultiple As Range)
Dim rCell As Range
Dim s As String
With rSingleCol
For Each rCell In .Cells
If Left(rCell.Value, 1) = "+" Then
s = Right(rCell.Value, Len(rCell.Value) - 1)
Else
s = rCell.Value
End If

If Len(s) > 1 Then
Call rMultiple.Replace(s & ";", vbNullString, xlPart)
Call rMultiple.Replace(s, vbNullString, xlPart) ' no ; on last one
End If
Next
End With
End Sub

joshman1088
09-07-2016, 09:26 AM
Not sure I understood all that

What I have is

1. A Sub that removes duplicates in a single column (B, C, D)
2. A Sub that takes the de-duped single column (B, C, D) and removes substrings in a multiple substring column (E)
3. Logic that takes each piece of a multi-substring column (E) and removes that substring from the rest of that column (E)

Do you really need 'True' in column O? That will get into more complicated and longer running code



Paul, I too have questions for you. First, let me say that the reason we need "true" is we need to tell excel WHICH cells to target for duplicate deletion. Right now it appears to target the lowest of the pair? So if B2,C2,D2 and B3,C3,D3 have duplicates it removes the duplicates in row 3. What I need is for it to remove row 2 duplicates if O2 is "true", or row 3 duplicates if O3 is "true. That would be most helpful.

To address what you have:

1. This seems to work other than: A) we need to remove only duplicates where O is true (or some other method if you think there is a better way). B) I need this to search against B,C,D and E. Right now it seems to only search against B,C,D. Or atleast it didn't remove duplicates from B,C or D when duplicates with in an E cell only.

You may view the below worksheet for an example of where I want B2,C2 and D2 removed because they contain duplicate of string in E3

https://drive.google.com/file/d/0B1TMLUGTVwb9c1VKWE9SWEpOdjA/view?usp=sharing

2. When I try running it opens dialog asking for macro name, with the first option being "dedup". I don't see the other names "CheckForDupsInSingleValueColumn" etc, so I just hit "run" and nothing happens. Why is this?
3. Same as 2.

I appreciate your help immensely. It seems to be close? to working so that's very good. If we can get this working it will be of tremendous value.

SamT
09-07-2016, 09:51 AM
used the test file I uploaded, something like 19 rows?
I didn't count that as a test. 1900 rows, better.

How many rows did you use it on?
Any idea how far the code got before you terminated it?

I did see a logic error in the last Set CriteriaCel = CriteriaRng.Find(What:="",. . . That would have added a tremendous overhead.

And I added SpeedCode, and SearchOrder:= xlByColumns in two locations.




Sub DeleteDuplicatesIn_E()


Dim Found As Range
Dim CriteriaRng As Range
Dim CriteriaCel As Range
Dim FirstCriteriaAddress As String
Dim StrCel As Range
Dim Term As Variant
Dim Terms As Variant
Dim Wks As Worksheet

SpeedCodeTrue

Set Wks = ThisWorkbook.ActiveSheet
With Wks

Set CriteriaRng = Intersect(Range("O:O"), .UsedRange)
Set CriteriaCel = CriteriaRng.Cells(1)
If CriteriaCel <> "TRUE" Then _
Set CriteriaCel = CriteriaRng.Find(What:="TRUE", After:=CriteriaCel, SearchDirection:=xlNext)
FirstCriteriaAddress = CriteriaCel.Address

Do
'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 4
'****************
Set StrCel = .Cells(CriteriaCel.Row, "E")
If StrCel = "" Then GoTo NextCriteriaCel

Terms = Split(StrCel, ":")

If Not IsArray(Terms) Then
Set Found = Range("D:E").Find(What:=Terms, After:=StrCel, SearchDirection:=xlNext, SearchOrder:= xlByColumns)
If Not Found.Address = StrCel.Address Then
StrCel = ""

'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 3
'****************

End If
Else
For Each Term In Terms
Set Found = Range("D:E").Find(What:=Term, After:=StrCel, SearchDirection:=xlNext, SearchOrder:= xlByColumns)
If Not Found.Address = StrCel.Address Then
Replace StrCel, Term, ""

'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 3
'****************

End If
Next Term
'Code here to remove excess Colons from StrCel
End If

NextCriteriaCel:
Set CriteriaCel = CriteriaRng.Find(What:="TRUE", After:=CriteriaCel, _
SearchDirection:=xlNext)
Loop While CriteriaCel.Address <> FirstCriteriaAddress
End With
End Sub


Public Sub SpeedCode(Optional Faster As Boolean)
Static CalcSetting As XlCalculation
If CalcSetting = 0 Then CalcSetting = Application.Calculation

If Faster Then
With Application
CalcSetting = .Calculation
.ScreenUpdating = False
.EnableEvents = False
.Calculation = xlCalculationAutomatic
End With
Else
With Application
.ScreenUpdating = True
.EnableEvents = True
.Calculation = CalcSetting
End With
CalcSetting = 0
End If

SpeedCode False
End Sub
The For testing purposes only code: Each checked TRUE cell in Column O is colored, IF duplicates are found, it is colored Red.

Make these adjustments as desired

For testing only the first thousand rows:
Change; Set CriteriaRng = Intersect(Range("O:O"), .UsedRange)
To
Set CriteriaRng = Range("O1:O1000")

For testing only the first ten thousand rows:
To
Set CriteriaRng = Range("O1:O10000")
Note that those adjustments still search the entire sheet but only for the TRUEs found in the limited set of Column O rows.

Personally, I would comment out the first SpeedCode line and set the Criteria Range to only the first two TRUEs, and I would Step thru the code using F8 to manually verify the deleting process working.

Then I would test the first 100 rows. If the 100 row test takes longer than 1 second, let us know before you continue.




To search B,C,D, and E:
2 Instances Set Found = Range("D:E") to Set Found = Range("B:E")

joshman1088
09-07-2016, 10:12 AM
Sam, why would a 19 row test not be good? Yes, I originally tested on 19 rows and I have no idea how far it got, it just says "running".

I ran your new code (first code box) by commenting out the speedcode true and it did highlight the "true"s as red. However, I checked and it does so regardless of whether there are any duplicates or not. It did not, however, delete anything. Was it supposed to? I stepped through with f8 and same results.

I checked 100 rows and it only took a couple milliseconds.

SamT
09-07-2016, 11:15 AM
Change
Replace StrCel, Term, ""
To
StrCel = Replace(StrCel, Term, "")

And in SpeedCode, remove the last line "SpeedCode = False" :dunno how that got in there

joshman1088
09-07-2016, 11:23 AM
Sam, I did that and it just turns the TRUE green.

Paul_Hossler
09-07-2016, 11:33 AM
Paul, I too have questions for you. First, let me say that the reason we need "true" is we need to tell excel WHICH cells to target for duplicate deletion. Right now it appears to target the lowest of the pair? So if B2,C2,D2 and B3,C3,D3 have duplicates it removes the duplicates in row 3. What I need is for it to remove row 2 duplicates if O2 is "true", or row 3 duplicates if O3 is "true. That would be most helpful.


It does delete (replace with "" actually) the 2nd, 3rd, ... instances

1. How does the TRUE get into O2 or O3? Good luck if you're doing it manually

2. Does it matter?

3. Do you want to delete the entire row, or just the substring that is duplicated?

joshman1088
09-07-2016, 11:43 AM
It does delete (replace with "" actually) the 2nd, 3rd, ... instances

1. How does the TRUE get into O2 or O3? Good luck if you're doing it manually

2. Does it matter?

3. Do you want to delete the entire row, or just the substring that is duplicated?


1. I am using =SUMPRODUCT(--ISNUMBER(SEARCH(A2,A:A)))>1 to determine which should be marked true. What is happening in this whole process is that I used

=IFERROR(INDEX($L$1:$L$228,SMALL(IF(ISNUMBER(SEARCH("|"&$B2,"|"&$L$1:$L$228)),
ROW($L$1:$L$228)-ROW($L$1)+1),COLUMNS($E$2:E2))),"")

to look for model numbers in a column against a range of values and return each instance of those values. So model number abc would return image1abc.jpg, image2abc.jpg image3abc.jpg etc. HOWEVER, in the rare instance when one model number includes another, like abcd includes abc, then all abcd results are return to abc as well. The SUMPRODUCT formula tells me when abc is contained within another cell, in this case abcd, and marks as true. In this way we can know that abc will contain the correct results as well as bad ones. And in that way we know to remove the duplicates for abc as opposed to abcd.

2. I don't see macro 2 or 3 doing anything, that's why I asked. I didn't know if the dialog box indicated something wasn't running correctly. I think that happened to me in the past but I can't remember.


3. I just want to delete duplicate strings. I just said "row" as shorthand for "duplicate strings in that row". But I guess shorthand is bad in this instance.

joshman1088
09-07-2016, 11:59 AM
Paul, it only just now occurs to me that dedup is calling the others, so all I have to do is run dedup and that's it, right? Regardless, I still have the issues I posted in 1. as well as I don't see anything being searched against and removed from E:E? So if E2 is DUP of E3 then nothing happens

SamT
09-07-2016, 12:17 PM
My very bad. :crying:
I based that code off the code you posted in #2: Terms = Split(Item, ":")
I replaced the Colons in this code with semicolons and add a check to remove leading semicolons.

Why didn't you catch that use of colons?


Sub DeleteDuplicatesIn_E()


Dim Found As Range
Dim CriteriaRng As Range
Dim CriteriaCel As Range
Dim FirstCriteriaAddress As String
Dim StrCel As Range
Dim Term As Variant
Dim Terms As Variant
Dim Wks As Worksheet

'SpeedCode True '<<<<<<<<<<<<<<<<<<<<<Uncomment after testing

Set Wks = ThisWorkbook.ActiveSheet
With Wks

Set CriteriaRng = Intersect(Range("O:O"), .UsedRange)
Set CriteriaCel = CriteriaRng.Cells(1)
If CriteriaCel <> "TRUE" Then _
Set CriteriaCel = CriteriaRng.Find(What:="TRUE", After:=CriteriaCel, SearchDirection:=xlNext)
FirstCriteriaAddress = CriteriaCel.Address

Do
'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 4
'****************
Set StrCel = .Cells(CriteriaCel.Row, "E")
If StrCel = "" Then GoTo NextCriteriaCel

Terms = Split(StrCel, ";")

If Not IsArray(Terms) Then
Set Found = Range("D:E").Find(What:=Terms, After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found.Address = StrCel.Address Then
StrCel = ""

'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 3
'****************

End If
Else
For Each Term In Terms
Set Found = Range("D:E").Find(What:=Term, After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found.Address = StrCel.Address Then
StrCel = Replace(StrCel, Term, "")
StrCel = Replace(StrCel, ";;", ";")
If Right(StrCel, 1) = ";" Then StrCel = Left(StrCel, Len(StrCel) - 1)
If Left(StrCel, 1) = ";" Then StrCel = Mid(StrCel, 2)

'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 3
'****************

End If
Next Term

End If

NextCriteriaCel:
Set CriteriaCel = CriteriaRng.Find(What:="TRUE", After:=CriteriaCel, _
SearchDirection:=xlNext)
Loop While CriteriaCel.Address <> FirstCriteriaAddress
End With
SpeedCode
End Sub

joshman1088
09-07-2016, 12:21 PM
Sam, I'm getting a runtime error 91, object variable or with block variable not set

ON:

If Not Found.Address = StrCel.Address Then

?
Good call anticipating the leading semicolons, I thought about that and you save me a step if you can remove them

joshman1088
09-07-2016, 12:23 PM
To all, I can circumvent the "TRUE" issue by sorting Z-A prior to this process, so you don't have to take that into effect..... which just occurred to me....you just have to delete the duplicates after the first instances..

Paul_Hossler
09-07-2016, 12:26 PM
Sam, I'm getting a runtime error 91, object variable or with block variable not set

ON:

If Not Found.Address = StrCel.Address Then

?
Good call anticipating the leading semicolons, I thought about that and you save me a step if you can remove them




If Found.Address <> StrCel.Address Then

Paul_Hossler
09-07-2016, 12:31 PM
Paul, it only just now occurs to me that dedup is calling the others, so all I have to do is run dedup and that's it, right? Regardless, I still have the issues I posted in 1. as well as I don't see anything being searched against and removed from E:E? So if E2 is DUP of E3 then nothing happens

Just run de-dup

Col E is 'special' since it can have multiple semicolon strings

This bit of code is in dedup



'check for dups in col E and remove
With rData.Columns(5)
For Each rCell In .Cells
Set rNext = .Cells(rCell.Row + 1, 1)
Set rLast = rNext.End(xlDown)

v = Split(rCell.Value, ";")

For i = LBound(v) To UBound(v)
Call Range(rNext, rLast).Replace(v(i) & ";", vbNullString, xlPart)
Call Range(rNext, rLast).Replace(v(i), vbNullString, xlPart) ' no ; on last one
Next I
Next
End With


What it does is take each col E string and splits it into individual substrings at the semicolon

Then it takes each of the substrings and goes through the rest of col E, replacing that substring if it can

SamT
09-07-2016, 12:52 PM
Sam, I'm getting a runtime error 91, object variable or with block variable not set

ON:

If Not Found.Address = StrCel.Address Then
Heh! I thought about just that before opened this thread

It has to do with the String already being deleted from StrCel.

Replace this section with these additions

For Each Term In Terms
Set Found = Range("D:E").Find(What:=Term, After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found Is Nothing Then '<<<<<<<<<<<<<<<<<<<<<
If Not Found.Address = StrCel.Address Then
StrCel = Replace(StrCel, Term, "")
StrCel = Replace(StrCel, ";;", ";")
If Right(StrCel, 1) = ";" Then StrCel = Left(StrCel, Len(StrCel) - 1)
If Left(StrCel, 1) = ";" Then StrCel = Mid(StrCel, 2)

'****************For test puroposes only:
CriteriaCel.Interior.ColorIndex = 3
'****************

End If
End If'<<<<<<<<<<<<<<<<<<<<<<<<<<<
Next Term

SamT
09-07-2016, 12:54 PM
To all, I can circumvent the "TRUE" issue by sorting Z-A prior to this process, so you don't have to take that into effect..... which just occurred to me....you just have to delete the duplicates after the first instances..

Very similar code. The difference will be a loss of speed. If it takes a while to add the TRUEs, then it would be worthwhile. The choice is yours.

SamT
09-07-2016, 01:05 PM
Sort first. Uses columns B to E

Sub NoTRUEs_DeleteDuplicatesIn_E()

Dim Found As Range
Dim StrCel As Range
Dim Term As Variant
Dim Terms As Variant

SpeedCode True

With ThisWorkbook.ActiveSheet
For Each StrCel In Intersect(.UsedRange, .Range("E:E"))
If StrCel = "" Then GoTo NextStrCel

Terms = Split(StrCel, ";")

If Not IsArray(Terms) Then
Set Found = Range("B:E").Find(What:=Terms, After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found.Address = StrCel.Address Then StrCel = ""

Else
For Each Term In Terms
Set Found = Range("B:E").Find(What:=Term, After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found Is Nothing Then
If Not Found.Address = StrCel.Address Then
StrCel = Replace(StrCel, Term, "")
StrCel = Replace(StrCel, ";;", ";")
If Right(StrCel, 1) = ";" Then StrCel = Left(StrCel, Len(StrCel) - 1)
If Left(StrCel, 1) = ";" Then StrCel = Mid(StrCel, 2)
End If
End If
Next Term
End If

NextStrCel:
Next
End With
SpeedCode
End Sub

joshman1088
09-07-2016, 01:07 PM
Sam, now I'm getting a "next without for" on the last End If.

Paul, ok I see now. It seems to be working as described. Not sure where my brain was.

Now in reality after running more tests it occurs to me what I need is to match all B:E against B:E. So simply if any duplicate in b(x) c(x) d(x) or e(x) is found in any other cell further down the sheet, delete the duplicate instances regardless of match. In that manner if B2 has a duplicate in and E4 cell then the string in E4 will be deleted. And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted. That is not currently happening.

Sorry for both not understanding and not foreseeing this. Not only is the coding new to me but the excel work needing to be automated is new to me as well.

If we can amend that I think that will do it

SamT
09-07-2016, 01:10 PM
BTW, If you interrupt the code, run this to reset the Application

Sub reset()
SpeedCode
End Sub

SamT
09-07-2016, 01:32 PM
Sam, now I'm getting a "next without for" on the last End If.
In VBA, in the Tools >> Options Menu, check all the boxes on the Editor tab in the "Code Settings" Frame.
Run the code and when it breaks, hover the mouse over StrCel.Address. What is the address? All I have to work with is the original 19 rows.

Can you upload 1000 rows? Use the Go Advanced button and below the Advanced Editor will be a Manage Attachments button.


And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted
WHAT!

Go thru the image column and deleted everything in it, (after the "+",) that is found anywhere
Go thru the small_image column and deleted everything in it, (after the "+",) that is found anywhere
Go thru the Thumbnail column and deleted everything in it, (after the "+",) that is found anywhere
Finally, go thru the media_gallery column and deleted everything in it that is found anywhere.

You really need to upload a new sample sheet, because the old one doesn't have all that.

joshman1088
09-07-2016, 01:48 PM
Can you upload 1000 rows? Use the Go Advanced button and below the Advanced Editor will be a Manage Attachments button.

Got a little lost on the debug, here is an actual sheet of over 1000 rows. In reality, after all this, my actually needs are as posted before:

Now in reality after running more tests it occurs to me what I need is to match all B:E against B:E. So simply if any duplicate in b(x) c(x) d(x) or e(x) is found in any other cell further down the sheet, delete the duplicate instances regardless of match. In that manner if B2 has a duplicate in and E4 cell then the string in E4 will be deleted. And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted. That is not currently happening.



1704617046

Again, thanks for all your help. In the attachment, E453 has duplicates as found in B,C,D452. The string in E453 should be removed.

SamT
09-07-2016, 01:55 PM
Sub Delete_All_DuplicatesIn_BtoE()

Dim Found As Range
Dim StrCel As Range
Dim Term As Variant
Dim Terms As Variant
Dim C As Long

SpeedCode True


For C = 2 To 5
With ThisWorkbook.ActiveSheet
For Each StrCel In Intersect(.UsedRange, .Columns(C)).Offset(1)
If StrCel = "" Then GoTo NextStrCel

Terms = Split(StrCel, ";")

If Not IsArray(Terms) Then 'only 1 string in cell
Set Found = Range("D:E").Find(What:=Mid(Terms, 2), After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found.Address = StrCel.Address Then StrCel = ""

Else
For Each Term In Terms
Set Found = Range("B:E").Find(What:=Mid(Term, 2), After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found Is Nothing Then
If Not Found.Address = StrCel.Address Then
StrCel = Replace(StrCel, Term, "")
If Len(StrCel) < 3 Then 'Nothing but semicolon(s)
StrCel = ""
Exit For
End If
StrCel = Replace(StrCel, ";;", ";")
If Right(StrCel, 1) = ";" Then StrCel = Left(StrCel, Len(StrCel) - 1)
If Left(StrCel, 1) = ";" Then StrCel = Mid(StrCel, 2)
End If
End If
Next Term
End If

NextStrCel:
Next StrCel
End With
Next C
SpeedCode
End Sub

joshman1088
09-07-2016, 02:02 PM
Sam, that seems to have removed B and C columns entirely (but left row 20).

SamT
09-07-2016, 02:04 PM
And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted

In the attachment, E453 has duplicates as found in B,C,D452. The string in E453 should be removed.
That first statement contradicts the second. There will be no duplicates in B,C, or D by the time the code gets to E

My last posted code ran thru your last upload in less than 10 seconds. But used the logic in the first quoted statement above.

joshman1088
09-07-2016, 02:15 PM
That first statement contradicts the second. There will be no duplicates in B,C, or D by the time the code gets to E

My last posted code ran thru your last upload in less than 10 seconds. But used the logic in the first quoted statement above.

Funny, it took me about a minute. Yes let me clarify, I am using the logic posted by paul where only duplicates below the original are removed. The first quote

"And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted"

Should read:

Any B,C,D string that is a duplicate of any string in E ABOVE, then they should be removed. So if B5,C5,D5 are duplicates of E2 then remove said B,C,D strings.

I just meant to clarify that any cell in B:E can be used to delete duplicates in any other as long as we are deleting the duplicate which is in a lower row as the first instance. I'm getting tired, maybe I should stop before I keep writing things incorrectly. But I think this should accurately explain it?

SamT
09-07-2016, 02:19 PM
They were duplicates.

Row 20: the ~ does strange things. Excel Edit>>Find menu can't use it either

SamT
09-07-2016, 02:21 PM
"And if any strings in cells in B,C or D are duplicated from an E cell then the duplicates in B,C,D will be deleted"

Should read:

Any B,C,D string that is a duplicate of any string in E ABOVE, then they should be removed. So if B5,C5,D5 are duplicates of E2 then remove said B,C,D strings.

I just meant to clarify that any cell in B:E can be used to delete duplicates in any other as long as we are deleting the duplicate which is in a lower row as the first instance.
Sort each column descending.

Yeah. I'll see you later. My brain is mush, and I have to go finish the turkey pen.

Paul_Hossler
09-07-2016, 05:16 PM
I'm still using .Replace instead of .Find

Your 'Over 1000' goes through in 2-3 seconds but this is based on my understanding of the requirements

1. Remove dups in B:D
2. If any cell in B:D is a substring in E, delete that substring
3. If any substring is duplicated in E, delete the dups


The 1000_ph worksheet is the output of the macro



Option Explicit

Sub DeDup()
Dim rData As Range, rCell As Range, rNext As Range, rLast As Range, rColumnsBD As Range, rColumnE As Range
Dim s As String
Dim v As Variant
Dim i As Long
Application.ScreenUpdating = False
Set rData = Worksheets("firstoutputtest_ph").Cells(1, 1).CurrentRegion
Set rColumnsBD = Intersect(rData, rData.Parent.Range("B:D"))
Set rColumnE = Intersect(rData, rData.Parent.Range("E:E"))
'tildas need special treatment for replace
Call rData.Replace("~~", Chr(160), xlPart)
'removes dups from A:D
Call CheckForDups_1(rColumnsBD)

'removes text in A:D from E
Call CheckForDups_2(rColumnsBD, rColumnE)
'removes dup substrings from E
Call CheckForDups_3(rColumnE)
'tildas need special treatment for replace
Call rData.Replace(Chr(160), "~", xlPart)

Application.ScreenUpdating = True
End Sub


Private Sub CheckForDups_1(riN As Range)
Dim r As Long, c As Long

With riN
For r = 2 To .Rows.Count
For c = 1 To .Columns.Count - 1
If Len(.Cells(r, c).Value) > 0 Then
If Application.WorksheetFunction.CountIf(.Rows(r), .Cells(r, c).Value) > 1 Then
Call Range(.Cells(r, c + 1), .Cells(r, .Columns.Count)).Replace(.Cells(r, c).Value, vbNullString, xlWhole)
If r <> .Rows.Count Then
Call .Rows(r + 1).Replace(.Cells(r, c).Value, vbNullString, xlWhole)
End If
End If
End If
Next c
Next r
End With
End Sub


Private Sub CheckForDups_2(rSingle As Range, rMultiple As Range)
Dim rValues As Range, rCell As Range
Dim r As Long, c As Long
Dim s As String

On Error Resume Next
Set rValues = rSingle.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

If rValues Is Nothing Then Exit Sub

For Each rCell In rValues.Cells
If Left(rCell.Value, 1) = "+" Then
s = Right(rCell.Value, Len(rCell.Value) - 1)
Else
s = rCell.Value
End If

If Len(s) > 1 Then
Call rMultiple.Replace(s & ";", vbNullString, xlPart)
Call rMultiple.Replace(s, vbNullString, xlPart)
End If
Next
End Sub


Private Sub CheckForDups_3(rMultiple As Range)
Dim rNext As Range, rLast As Range, rCell As Range
Dim v As Variant
Dim i As Long

With rMultiple
For Each rCell In .Cells
Set rNext = .Cells(rCell.Row + 1, 1)
Set rLast = rNext.End(xlDown)
v = Split(rCell.Value, ";")
For i = LBound(v) To UBound(v)
Call Range(rNext, rLast).Replace(v(i) & ";", vbNullString, xlPart)
Call Range(rNext, rLast).Replace(v(i), vbNullString, xlPart) ' no ; on last one
Next I
Next
End With
End Sub

Paul_Hossler
09-07-2016, 05:19 PM
They were duplicates.

Row 20: the ~ does strange things. Excel Edit>>Find menu can't use it either

You have to use ~* or ~~ or ~? in the FindWhat since * and ~ and ? are wildcards




Microsoft Excel uses the tilde (~) as a marker to indicate that the next character is a literal. When you use the Find and Replace dialog box to find or replace a character such as a tilde (~), an asterisk (*), or a question mark (?), you must add a tilde (~) before the character in the Find what box.

Note If you want to find or replace a tilde in a worksheet, you must type a double tilde (~~) in the Find what box.

SamT
09-08-2016, 07:46 AM
To remove from E all duplicates found anywhere
To remove from D all duplicates found in B, C, and D
To remove all duplicates from C found in B and C
To remove all Duplicates from B found in B

11 seconds on over 1000 rows in Excel XP

Option Explicit

Sub NoDupesFrom_E2B()

Dim Found As Range
Dim StrCel As Range
Dim SearchCols As Range
Dim Term As Variant
Dim Terms As Variant
Dim C As Long

SpeedCode True
ActiveSheet.UsedRange.Replace "~~", Chr(160), xlPart 'Thanks, Paul

For C = 5 To 2 Step -1
With ThisWorkbook.ActiveSheet
Set SearchCols = Intersect(.UsedRange, .Range("B:E").Resize(, C - 1)).Offset(1)

For Each StrCel In Intersect(SearchCols, .Columns(C))

If StrCel = "" Then GoTo NextStrCel
Terms = Split(StrCel, ";")

If Not IsArray(Terms) Then 'only 1 string in cell
Set Found = Range("D:E").Find(What:=Mid(Terms, 2), After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found.Address = StrCel.Address Then StrCel = ""

Else
For Each Term In Terms
Set Found = SearchCols.Find(What:=Mid(Term, 2), After:=StrCel, SearchDirection:=xlNext, SearchOrder:=xlByColumns)
If Not Found Is Nothing Then
If Not Found.Address = StrCel.Address Then
StrCel = Replace(StrCel, Term, "")
If Len(StrCel) < 3 Then 'Nothing but semicolon(s)
StrCel = ""
Exit For
End If
StrCel = Replace(StrCel, ";;", ";")
If Right(StrCel, 1) = ";" Then StrCel = Left(StrCel, Len(StrCel) - 1)
If Left(StrCel, 1) = ";" Then StrCel = Mid(StrCel, 2)
End If
End If
Next Term
End If

NextStrCel:
Next StrCel
End With
Next C
ActiveSheet.UsedRange.Replace Chr(160), "~", xlPart
SpeedCode
End Sub


Public Sub SpeedCode(Optional Faster As Boolean)
Static CalcSetting As XlCalculation
If CalcSetting = 0 Then CalcSetting = Application.Calculation

If Faster Then
With Application
CalcSetting = .Calculation
.ScreenUpdating = False
.EnableEvents = False
.Calculation = xlCalculationAutomatic
End With
Else
With Application
.ScreenUpdating = True
.EnableEvents = True
.Calculation = CalcSetting
End With
CalcSetting = 0
End If
End Sub

SamT
09-08-2016, 08:00 AM
Note that removes dupes from top down, so sort in reverse order

joshman1088
09-09-2016, 08:20 AM
Thank you gentlemen, my brain was fried yesterday and I'm trying to focus on other tasks today, I will give it a closer look this weekend