Consulting

Results 1 to 20 of 20

Thread: Excel 2016 VBA remove duplicates by multiple AND/OR criteria

  1. #1
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location

    Excel 2016 VBA remove duplicates by multiple AND/OR criteria

    I’m just starting with VBA, but after spending a day looking at remove duplicates/values VBA tutorials I still cannot come up with a way to do this task.

    I need to remove duplicates from a list based on several criteria.

    First, identify duplicate record numbers (column A). Then delete the entire row of the duplicate(s) (there may be several instances of the same duplicate) except were the values at column S are different AND both (all) values belong to a predefined set, OR the values at column X are different (and the value of the duplicate is any value).

    If the value of the duplicate is from set 1, and the value of its counterpart(s) is not from set 1, it is the latter that has to be deleted.

    Not trying to delete both (all) duplicates! 1 value should remain.

    Where the Location value is a town in England (London, Leeds), duplicates should be kept. Where the value in the Cost column is different, duplicates should be kept. If one of the duplicates has a Location value that is a town in England (London, Leeds), and the others do not, the others are the ones to be deleted.

    A S X
    No Location Cost
    2 Glasgow 200
    2 London 200
    2 Leeds 200
    2 Glasgow 100

    returns

    A S X
    No Location Cost
    2 London 200
    2 Leeds 200
    2 Glasgow 100

    Thanks in advance for any suggestions!

  2. #2
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Can you post a workbbok showing before and after and any clarification. Go Advanced/Manage Attachments
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  3. #3
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Let me try again, I have probably confused myself.


    In the example attached Where the Location value is a town in England (London, Leeds), duplicates should be kept. Where the value in the Cost column is different, duplicates should be kept. If one of the duplicates has a Type value =Working, and the others do not, then the Working, and other duplicate do not, than the other duplicates are the ones to be deleted.

    Please see attached workbook for a better illustration.

    Thank you
    Attached Files Attached Files

  4. #4
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Why delete Row 9 instead of 13?
    Do you have a list of England towns in your workbook?
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  5. #5
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Quote Originally Posted by mdmackillop View Post
    Why delete Row 9 instead of 13?
    Do you have a list of England towns in your workbook?
    Row 13 has a different cost value.

    Also, the rows are longer than in the example, but the rest of the values in other columns do not matter for sorting purposes.

    I can either have a list in the workbook itself, or type it into VBA...

  6. #6
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Row 13 has a different cost value.
    They are both different! You need rules. Is it the first, last, greatest or lowest? What if there are 2 @ 200 and 1 @ 100? A good sample of data should encompass your actual case to avoid misunderstanding.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  7. #7
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Yes, sorry, they are, but row 9 is deleted because it 1) duplicates the record number 2) does not have a different location with a town in England 3) duplicates the cost value. Row 13 is kept because it does not satisfy 3).

    First, duplicate record numbers are identified, and all subsequent operations are performed among duplicate record numbers only. Duplicates that are locations outside England are merged. If the duplicates have location values both in England and outside England, those NOT in England are deleted. If duplicates only have location values in England, they are merged if they do NOT have a different cost value. If the duplicates have a different type value, those NOT with working type value are deleted.

    Do these resemble rules?

    Thank you very much for your time

  8. #8
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Duplicates that are locations outside England are merged.
    I don't understand this. Please expand your sample to demonstrate.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  9. #9
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Here I expanded it and tried putting it in some sort of logical structure.

    Again, massive thank you for persistance.

    Please see the file sample workbook.xlsx (32.6 KB), found a mistake in the other one.
    Attached Files Attached Files
    Last edited by Edmon; 07-14-2017 at 06:02 AM. Reason: mistake

  10. #10
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Should Glasgow not show in the result?
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  11. #11
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Yes, it should not show, it is eliminated by rule 1). Belfast does show, however, since rule 1) does not apply to duplicate record number 5 - only rule 2) does.

  12. #12
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Before coding is my interpretation correct? How about Row 18?
    Attached Files Attached Files
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  13. #13
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Please see attached for clarifications, thank you!
    Attached Files Attached Files

  14. #14
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Try this
    Attached Files Attached Files
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  15. #15
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Runs perfectly on Windows, excellent, thank you very much!

    On Mac it Returns error 429 'ActiveX component can't create object'. Debug points to code line Set dic = CreateObject("Scripting.Dictionary")

    Any suggestions for Mac users?

    Also, for learning purposes, which part of the code refers to finding duplicate record numbers?
    Last edited by Edmon; 07-15-2017 at 11:13 AM.

  16. #16
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    This section of code creates range areas stored in a dictionary object. These areas are then passed to the other sub for processing.
    For Each c In r
            If Not dic.exists(c.Value) Then
                dic.Add c.Value, c
            Else
                Set rTemp = dic(c.Value)
                Set rTemp = Union(rTemp, c)
                Set dic(c.Value) = rTemp
            End If
            Set rTemp = dic(c.Value)
            'rTemp.Select
        Next
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  17. #17
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    Can't wrap my head around this bit

     'Test 2
        'Delete repeated occurences over 3 columns
        For i = 1 To r.Rows.Count
            For j = i + 1 To r.Rows.Count
                r(i, 1).Interior.ColorIndex = 6    'debug
                r(j, 1).Interior.ColorIndex = 7    'debug
                If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"
                Range("F:F").Interior.ColorIndex = xlNone    'debug
            Next j
        Next i
    End Sub
    What is the purpose of the .Interior.ColorIndex lines in this?

  18. #18
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    I commonly use colour to show me the cells being processed as the code runs and to demonstrate the affected cells. The lines marked "Debug" can be deleted.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  19. #19
    VBAX Regular
    Joined
    Jul 2017
    Posts
    11
    Location
    According to rule 3: IF all duplicates have the same B (location) value, and the same D (cost) value, only the first record is kept (highest on the list).

    Why is then that the line from the section above is

    If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"and not

    If r(i, 1) & r(i, 2) & r(i, 4) = r(j, 1) & r(j, 2) & r(j, 4) Then r(j, 5) = "x"

    Also, why is the result
    Then r(j, 5) = "x"? What is the meaning of cell address r(j, 5)?


  20. #20
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Runs perfectly on Windows, excellent, thank you very much!
    Has this stopped working?

    Why is then that the line from the section above is
    If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"and not
    If r(i, 1) & r(i, 2) & r(i, 4) = r(j, 1) & r(j, 2) & r(j, 4) Then r(j, 5) = "x"
    Because
    Column E (time) represents other columns that are irrelevant for this sorting.
    Also, why is the result Then r(j, 5) = "x"? What is the meaning of cell address r(j, 5)?
    It provides data for a filter.

    You should step through the code using Watch Values and Debug.Print to send values to the Immediate window. Add comments to the code that detail what each step does. These are the steps I have to follow to understand complex code submitted by posters in order to respond
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •