Consulting

Results 1 to 12 of 12

Thread: Help : Compare texts in two columns in the same row and get the similarity percentage

  1. #1
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location

    Help : Compare texts in two columns in the same row and get the similarity percentage

    Hi,
    Request help from the expert members here for the below situation-

    Is there any way to compare texts in two columns in the same row and get the similarity percentage like Fuzzy Lookup does?. My problem with Fuzzy Lookup is, it checks each cell value in the first table with each cell value in the second table and it is taking very long time to complete as I have 10K+ rows data with long text. Also, I only need to check each cell value in first table with the cell value in second table in the same row. I have attached a sample file here and I need to compare the texts in column B with column D.

    Request your support
    Thanks!
    Attached Files Attached Files
    Last edited by anish.ms; 04-06-2021 at 11:24 AM.

  2. #2
    Moderator VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    Location
    A Percent match between two strings
    What

    How about some examples for you to answer
    What is the % Match between:
    1) "A little dog" & "A quick brown Fox... a Dog's back"
    2) "John made $57." & "Sue earned 57 Dollars"

    3) Molestias rem corrupti sint omnis ut alias ex. Maxime mollitia qui et voluptas dolores. Voluptas doloremque corrupti et. Nobis consequatur amet suscipit enim et. Nihil et velit praesentium maiores voluptate vel. Deleniti excepturi expedita consequatur harum saepe."

    &

    Molestias rem corrupti sint omns ut alias ex. Maxime mollitia qui et voluptas dolores. Voluptas doloremqu corrupti et. Nobis consequatur amet suscipit enim et. Nihil et velit praesentium maiores voluptate vel. Delenitu excepturi expedita consequatur harum saepe.
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  3. #3
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hi Sam,
    Below is the similarity I got for your examples using Fuzzy Lookup

    Table1 Table2 Similarity
    A little dog A quick brown Fox... a Dog's back 0.259340659340659
    John made $57. Sue earned 57 Dollars 0.214285714285714
    Molestias rem corrupti sint omnis ut alias ex. Maxime mollitia qui et voluptas dolores. Voluptas doloremque corrupti et. Nobis consequatur amet suscipit enim et. Nihil et velit praesentium maiores voluptate vel. Deleniti excepturi expedita consequatur harum saepe. Molestias rem corrupti sint omns ut alias ex. Maxime mollitia qui et voluptas dolores. Voluptas doloremqu corrupti et. Nobis consequatur amet suscipit enim et. Nihil et velit praesentium maiores voluptate vel. Delenitu excepturi expedita consequatur harum saepe. 0.995553383087073

    I really don't know the method used in Fuzzy Lookup to arrive at these similarity.

  4. #4
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,872
    In Power Query there is a fuzzy match option and, as I discovered, a similarity score too, although on mine, it was only available through editing the M-code (at least I couldn't find anything in the user-interface).
    Originally the M-code was:
    = Table.FuzzyNestedJoin(Table1, {"PDC_DESC"}, Table2, {"LONG_DESCRIPTION"}, "Table2", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.6])
    and needed editing to:
    = Table.FuzzyNestedJoin(Table1, {"PDC_DESC"}, Table2, {"LONG_DESCRIPTION"}, "Table2", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.6, SimilarityColumnName="Similarity score"])
    and then expanded the new column. (Edit the settings in the user interface and you'll lose your manual additions)
    and got:
    2021-04-06_203953.jpg
    See also:
    Fuzzy Matching - Scores
    https://social.technet.microsoft.com...atching-scores
    (see the last msg on the page)

    Table.FuzzyJoin - PowerQuery M | Microsoft Docs
    https://docs.microsoft.com/en-us/pow...able-fuzzyjoin

    The time it takes may depend on the similarity threshold you choose.
    Attached Files Attached Files
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  5. #5
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,724
    Location
    Lots of VBA fuzzy algorithms out there

    Are you just testing between two strings in the same row?

    https://www.mrexcel.com/board/thread...nation.195635/

    My small sample test workbook and yours

    Capture.JPG

    Capture1.jpg
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  6. #6
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks p45cal!

    I'm getting the below message when I go to power query editor. I'm using office 365 with latest update. May be this is available only for Office Insiders.
    cw.jpg

  7. #7
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks a lot Paul!

  8. #8
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,872
    I'm getting the below message when I go to power query editor. I'm using office 365 with latest update. May be this is available only for Office Insiders.
    I'm not an 'insider'. I usually ignore this message and everything's fine.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    I ignored the message and the error I'm getting is
    Expression.Error: '' isn't a valid Table.FuzzyNestedJoin option. Valid options are:
    ConcurrentRequests, Culture, IgnoreCase, IgnoreSpace, NumberOfMatches, Threshold, TransformationTable

    I tried to do it fresh from in my sample file by opening a blank query, but I'm getting the following error
    Expression.Error: The name 'Table1' wasn't recognized.  Make sure it's spelled correctly.
    I tried after renaming the table name

  10. #10
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,872
    A few things to try:
    With the files I sent:
    1. Attached is the same file as I attached in msg#4 but as an xlsx file; I have no idea whether this will make a difference.
    2. Confirm that if the question pops up External connections have been disabled you click the Enable Content button.
    3. Confirm that in the result table at cell H2 in either the attached file or the xlsb file refreshes properly 'out of the box' (Clear a few cells in the result table and then right-click it and choose Refresh - it should, eventually, refill those cells).
    With your file:
    4. Try saving it as an xlsx file (again, I'm not sure this will have any effect at all).
    5. Attach it here with your attempt so far. (If it's too big, reduce the source data tables in size while making sure there are some reasonably similar rows, and at least one exactly similar row.)
    Attached Files Attached Files
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  11. #11
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    1. I tried with the file you shared in #10
    2. Pop up came for External connections and I clicked Enable Content
    3. I deleted similarity score form cells K8 and K9 and tried to refresh. Following is the message that I'm getting
    Screenshot 2021-04-07 225258.jpg

    Screenshot 2021-04-07 225456.jpg

  12. #12
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,872
    Could you try again with this one? There was something a bit odd about the last one (you should be able to right-click on the results table, choose Table, then Edit Query…, and you couldn't on the last one).

    If it's still a problem, save this file on your own system with a new name, close it and attach it here.
    Attached Files Attached Files
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •