Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 22

Thread: Identify Matches between Sheet Columns

  1. #1
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location

    Identify Matches between Sheet Columns

    This may sound ridiculously complicated, but here a stab.
    Looking for a practical/proficient process to identify matches between data in two sheet columns. For example lets say I have Sheet 1 with columns A and B and 300,000 rows.


    Ref# Folder Path

    1 C:\Folder 1\Test.png
    ...

    243 C:\Folder 2\Subfolder 1\Testing.jpg

    ... .....

    300000 E:\Folder1\SubFolder2\Test.doc





    I have Sheet 2 with 5 columns and 100 rows


    SN Type Title Date Name

    1-23 A Dogs 1//1/25 Test.doc
    ....

    35-456 A Cats 1/2/24 Test.png

    ...
    100-D-3 B Pigs 1/4/20 Testing.jpg


    I need to create a third sheet that gets the data from Sheet 1 Column A where the data in Sheet 2 Column 5 "Name" is "found" in Sheet 1 Column B.

    So in the example. Sheet 3 would look like this:



    A
    1 35-456 Cats 1/2/24 Test.png
    243 100-D-3 Pigs 1/4/20 Testing.jpg
    300000 1-23 Dogs 1/1/25 Test.doc


    My initial process was like this
    Loop through each row in Sheet 1 (300,000)
    Loop through each row in Sheet 2 (100)
    Check InStr(Sheet1.Cell, Sheet2,Cell) > 1
    Yes we have a match, develop Sheet3 row
    This process takes about a minute to run.


    I revised the process to:
    Create a collection

    Loop through each row in Sheet 2 (100)
    Check if Sheet2.Column 5 is Found in Sheet1
    Yes
    For Each find
    Add row index to collection
    For Each Index in Collection
    Develop Sheet3 row
    This process takes about 12 seconds.


    Either method is working, but the potential exists for the row counts in sheet 1 or sheet 2 to grow considerably and the only result is longer processing time.


    Wondering if there is a better way. I know nothing about Power Query and very little about Excel. Hoping some expert knows a more efficient process to minimize looping to identify all row indexes in sheet 1 that requires processing?


    Thanks







    cross posted at: https://www.msofficeforums.com/excel...t-columns.html
    Greg

    Visit my website: http://gregmaxey.com

  2. #2
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    In the attached, Power Query. A guess.
    Hopefully Sheets 1 to 3 are close to what you have and want.
    Right-click the (currently blank) table on Sheet3 and choose Refresh.

    If you mock up a workbook with a bit more sample information with (some) matches between Sheet1 and Sheet2, together with a third Sheet3 containing expected results and attach it here, we can put together something which will more likely work for you.
    Attached Files Attached Files
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  3. #3
    Administrator VBAX Master georgiboy's Avatar
    Joined
    Mar 2008
    Location
    Kent, England
    Posts
    1,278
    Location
    It doesn't sound complicated, however, 'my' answer may not be what you want to hear...

    I use VBA a lot however, when dealing with larger datasets and matches, I tend to use Power Query.

    My logic for the attached PQ file is:
    1, Put all of your data from both sheet1 and sheet2 into table objects and not just data in a range.
    2, Load table1 into PQ as a connection only and separate the file name from the folder path as a new column. This is done by splitting the 'Folder Path' column based on the rightmost delimiter of '\'.
    3, Load table2 into PQ as a connection only.
    4, Merge table2 with table1 based on the file name.
    5, Expand the merged table and select only the 'Ref#' column.
    6, Export the completed merge as a table to a new sheet.

    PQ is super efficient and has replaced a lot of my use for VBA, IMO it is worth learning some PQ methods. ChatGPT can be helpful when learning how to complete specific tasks in PQ, for example you could ask ChatGPT "How to split a column based on a delimiter with Power Query" or "How to merge two tables using Power Query"

    I have a attached the file for reference just in case you wanted to pick through it.
    Attached Files Attached Files
    Click here for a guide on how to add code tags
    Click here for a guide on how to mark a thread as solved
    Click here for a guide on how to upload a file with your post

    Excel 365, Version 2408, Build 17928.20080

  4. #4
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    Georgoboy\p45Cal

    The PQ.xlsx shows exactly what I am currently doing with VBA (95% of the time). So my first step is to figure out how to do Georgboy's steps 1 through 6 using VBA.

    95% of the time the conditions are:
    a. Return only the first match in Sheet 1 and b. Discard all Sheet 1 rows where match is not found
    The other 5% of the time, there are some conditionals.
    a. For any second or subsequent match, create new row in Sheet 3
    b. For any second or subsequent match, concatenate any second or subsequent match data in existing Sheet 3 row
    c. If a match is not found for any row in Sheet 1, cancel and exit process
    d. If a match for any row in Sheet 1, create a new row in Sheet 3 showing only Sheet 1 column 1 data
    e. If a match is not found for any row in Sheet 2, cancel and exit process

    So, before trying to climb the mountain and learn how to perform georgiboy's steps, do you feel these objectives could also be meet?

    I'll try to put together some sanitized examples and post.

    Thank you so much!!!
    Greg

    Visit my website: http://gregmaxey.com

  5. #5
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    Quote Originally Posted by gmaxey View Post
    So, before trying to climb the mountain and learn how to perform georgiboy's steps, do you feel these objectives could also be meet?
    Yes
    Quote Originally Posted by gmaxey View Post
    I'll try to put together some sanitized examples and post.
    Try to include at least one instance of your a to e points in your sample data and how they should appear on Sheet3.

    Mine and georgiboy's queries were remarkably similar.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  6. #6
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    p45Cal,

    Here is the sample file (with code to create Sheet 3 named Overlay).
    I have included the overlay results for Normal conditions, Match Dups as New Rows, Match Dup Concatenated. The other conditions can be seen by running the code.
    Examples.jpg


    If you are uncomfortable opening the attached and I can send an xlsx version.

    Overlay Tool Sample Data 2.xlsm

    Overlay Tool Sample Data 2.xlsx
    Last edited by gmaxey; 01-29-2025 at 11:06 AM.

  7. #7
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    I get differing results from you for the Overlay (Dup as New Row) sheet. I only get 4 rows with no duplicates:

    2025-01-29_231055.jpg

    I suspect this could be because you've removed a bunch of rows which might have matched but I'm not 100% sure of that. In order to check could you tell me what settings you used in your frmOverlay userform for each of the 9 available settings therein so that I can run it and compare?

    If this is because you've removed those rows could you re-attach a version with the appropriate rows included?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  8. #8
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    Also, there is one row on the REF_SHEET (row 32001) which has folder separators of \ rather than /.
    2025-01-29_234727.jpg
    Is this something we must cater for?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    I have reattached the file with Overlay sheets for:

    1. Normal use case - Match First use Left side/Discard Ref Row Right Side
    2. Duplicate Ref Rows Left Side/Discard Ref Row Right Side
    3. Concatenate with delim ; Left side/Discard Ref Row Right Side
    4. Concatenate with line break Left side/Discard Ref Row Right side

    Either right or left option to Do not create overlay will result in the overlay sheet not be created
    The right option to Copy Control ID from Ref will result in an overlay has all REF rows (resulting file is too large to attach).

    The YoPauli in the RS_SHEET is what creates the Do Not Create Overlay Cancel with the Left bottom option buttong
    The YoAdrian in the NDS_SHeet is what creates the Do Not Create Overlay Cancel with the Right option button.








    Overlay Tool Sample Data 3.xlsm
    Greg

    Visit my website: http://gregmaxey.com

  10. #10
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    Greg

    Visit my website: http://gregmaxey.com

  11. #11
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    It's bedtime here now (early start tomorrow) so it'll possibly be some 24 to 36 hours before I can come back… unless I get a moment.

    ps. there's still only one #266912.1 in the REF_SHEET, doesn't matter, I'll add one myself.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  12. #12
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    Looking at the picture you sent in your last previous post

    My example is the result when:
    Duplicate Duplicate Reference Row (Left side option) and Discard the Ref Row from Overlay Sheet (Right Side options) are applied.

    You see the rose shaded A:2 and A:3 because row 299 of the REF_SHEET matched rows 24 and 25 of the NDS_SHEET. In the normal use case only the first #2669121 is listed.

    Other than yours seems to exclude both double Serial Number 10 10 and 9 9, I don't know why your return is only four rows.

    Hope this helps. Thank you.
    Greg

    Visit my website: http://gregmaxey.com

  13. #13
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    OK I get it now, I'll have a play…
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  14. #14
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    Tell me in words (might be easier then me looking through about 800 lines of code) about the likes of:
    Sentinel Overlay Demo Data.zip//Sentinel Overlay Demo Data/DemoData_0000015_Import.zip//X Emails/Email 3.pdf
    Should I treat it as 3 possible file names (splitting the text at the double forward slashes), then looking for all three of:
    Sentinel Overlay Demo Data.zip
    DemoData_0000015_Import.zip
    Email 3.pdf

    or something else?
    (It really is bedtime now!)
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  15. #15
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    p45cal,

    Yes, it was past my bedtime. Thanks for hanging in here! Well, as I said, it is complicated. In a sense the answer to your question is yes.
    My first approach to achieve the required out put was to loop through every row of the RS_SHEET

    So when we hit row 299 we have: "Sentinel Overlay Demo Data.zip//Sentinel Overlay Demo Data/DemoData_0000015_Import.zip//X Emails/Email 3.pdf"
    Then I looped through every row of NDS_SHEET and checked to see if the data in column 8 was InStr of the last step
    So when we hit row 24, we have "DemoData_0000015_Import.zip"
    If InStr(
    "Sentinel Overlay Demo Data.zip//Sentinel Overlay Demo Data/DemoData_0000015_Import.zip//X Emails/Email 3.pdf", "DemoData_0000015_Import.zip") > 0
    Bingo, we have first match. record it.
    If Use First Match Only then
    get out
    Else
    We continue looping
    We hit row 25 which again is "DemoData_0000015_Import.zip" and again is InStr so we record the line 25 record

    So if you change NDS_SHEET row 25 column 8 from
    "DemoData_0000015_Import.zip" to "Email 3.pdf" and run another overlay, you will see the same result for a normal overlay. If you run it with the Duplicate Reference Row you will see a different result:


    That required a lot of looping. With some actual data with 440,000 RS_SHEET rows it takes about 45 seconds.

    My current approach (what you see) works like this:

    I loop through each data row of the NDS_SHEET (Rows 5 to 26 rows)
    In each loop, I find any RS_SHEET Row index(es) where the NDS_SHEET row index, column 8 data is found and add to a collection
    So when NDS_SHEET row index = 24 the RS_SHEET Row 299 is the collection item
    I then Loop through the collection (In this case only 1 item)
    I add 299 to another DupCollection. If successful ,
    I add the data for row 299 to the Overlay
    When NDS_SHEET row index = 25 RS_SHEET Row 299 is again the only index returned
    I attempt to add it to the DupCollection and it fails
    If Match First Only - Get Out
    If Duplicate Ref Row - Record it

    With the actual data this process takes about 12 seconds.

    I hope that makes sense. Thank you again.






    Example.jpg
    Greg

    Visit my website: http://gregmaxey.com

  16. #16
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    In the attached, on sheet Overlay (Dup as New Row), there's a Power Query query output at cell A14 which seems to be correct. I've cleared a few cells in that table so that you can right-click somewhere in that table and choose Refresh and see how long it takes for data to appear. The first time you do this it may take longer than subsequent refreshes. I'd be interested to know how long the refreshes take.
    It's very basic at the moment and I should be able to add tables for the other sheets too and make it more streamlined, but I'm out of time right now.
    Note that no macros are involved (I should have attached as .xlsx).
    Attached Files Attached Files
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  17. #17
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    p45cal,

    I had a chance to look at your attached file. If I change some data in your table and refresh, it takes between 2.5 and 3 seconds nominally to refresh. You mention that there is no VBA. I think for this approach to work there will have to be some VBA involved. The process will need to work for any number of REF_SHEETS and NDS_SHEETS. Sometimes the number of headings and titles in the REF_SHEET will vary. On top of all of the various conditions that must be accounted for, there is another wrinkle called "Family Relationships" that occasionally must be handled. With some help offered in the linked post, I've managed to get the processing time down to nearly a order of magnitude below what I did have. The attached processes normal conditions in less than 4 seconds (with family relationships it is still under 30 seconds.

    I want you to be award of this so you don't spend more of your time on an approach that might not be suited for this requirement.
    Up to you. If you like a challenge fine, if you wan to leave it be that is fine of course also. Thank you!!

    Overlay Tool Sample Data BM Ver 1.1.xlsm
    Greg

    Visit my website: http://gregmaxey.com

  18. #18
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,835
    Location
    This may sound ridiculously complicated, but here a stab.
    Looking for a practical/proficient process to identify matches between data in two sheet columns. For example lets say I have Sheet 1 with columns A and B and 300,000 rows.


    Ref# Folder Path

    1 C:\Folder 1\Test.png
    ...
    243 C:\Folder 2\Subfolder 1\Testing.jpg
    ... .....
    300000 E:\Folder1\SubFolder2\Test.doc

    I have Sheet 2 with 5 columns and 100 rows

    SN Type Title Date Name

    1-23 A Dogs 1//1/25 Test.doc
    ....
    35-456 A Cats 1/2/24 Test.png
    ...
    100-D-3 B Pigs 1/4/20 Testing.jpg


    I need to create a third sheet that gets the data from Sheet 1 Column A where the data in Sheet 2 Column 5 "Name" is "found" in Sheet 1 Column B.
    It seems to me (FWIW) that the main question was one of decreasing processing speed??

    Using the attachment in #17 can you describe in words and color coding how (I assume) matching between NDS_Sheet and REF_Sheet works to create OVERLAY?

    If we come up with an acceptable 'match algorithm' the rest of the process to integrate into the UI could be a homework assignment
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  19. #19
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,959
    I only mentioned via because I noticed I'd attached a .xlsm file after posting.
    Sure I'm happy to use it if needed.
    I said that the Power Query query I used was quite basic and indeed it was, with further coding it can be quite sophisticated and flexible.
    You say you will have to handle any number of Ref/Nds sheets; do you mean at the same time?
    I have a further question; the data that are in these sheets, do they come as Excel sheets or in some other form (maybe csv or txt files)? I ask this because Power Query can eliminate the step of bringing those files into a worksheet, what's more it would probably be faster.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  20. #20
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,411
    Location
    p45cal,

    As I informed you via PM, I'm going to mark this thread solved as I have found a solution that will more than meet the need.
    Attachment 31883

    I tried to send you a photo in the PM of the typical column names that are used but it got converted to a xml format. So here it is again.

    Attachment 31885

    Thanks for all you time and help with this!
    Greg

    Visit my website: http://gregmaxey.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •