Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 31

Thread: Help needed to find matching string from the collection.

  1. #1
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location

    Help needed to find matching string from the collection.

    Dear Experts,

    I have attached a sample file from the total population of 500K rows. In the sample file I need to identify the matching diagnosis codes from sheet 'Cohort' (Sh2) for each line item in sheet 'Returned' (Sh1). I have created a code for the same. But a sample highlighted in cell E7 where the codes are matching from Sh2 has not been picked by the code . I couldn't identify the issue, could anyone please help in correcting the code ?
    Thanks in advance for your time.

    Sub ICD_Check()
    
    
        Dim Sh1    As Worksheet, Sh2 As Worksheet
        Dim Sh1_LR As Long, Sh2_LR As Long
        Dim rng1   As Range, r1 As Range, rng2 As Range, r2 As Range
        Dim icd1_Arr As Variant, icd2_Arr As Variant
        Dim x      As Long, y As Long, m As Long, n As Long
        Dim DelSpaces As Variant
    
    
        DelSpaces = Array(" ", Chr(160))              'Chr(160) = special character space
    
    
        Set Sh1 = Sheets("Returned")
        Set Sh2 = Sheets("Cohort")
        Sh1_LR = Sh1.UsedRange.Rows.Count             'sheet 1 last row
        Sh2_LR = Sh2.UsedRange.Rows.Count             'sheet 2 last row
        
        Sh1.Range("C2:C" & Sh1_LR).ClearContents
        
        Set rng1 = Sh1.Range("A2:A" & Sh1_LR)
        Set rng2 = Sh2.Range("A2:A" & Sh2_LR)
        
        For Each r1 In rng1
            For Each r2 In rng2
                If r1.Value = r2.Value Then
                    icd1_Arr = Split(Sh1.Cells(r1.Row, "B"), ",")
                    icd2_Arr = Split(Sh2.Cells(r2.Row, "B"), ",")
                    
                    For m = 0 To UBound(icd1_Arr)
                        For n = 0 To UBound(DelSpaces)
                            icd1_Arr(m) = Replace(icd1_Arr(m), DelSpaces(n), "")
                        Next n
                    Next m
                    
                    For m = 0 To UBound(icd2_Arr)
                        For n = 0 To UBound(DelSpaces)
                            icd2_Arr(m) = Replace(icd2_Arr(m), DelSpaces(n), "")
                        Next n
                    Next m
                    
                    For x = 0 To UBound(icd1_Arr)
                        For y = 0 To UBound(icd2_Arr)
                            If UCase(icd1_Arr(x)) = UCase(icd2_Arr(y)) Then
                                If Len(Sh1.Cells(r1.Row, "C")) = 0 Then
                                    Sh1.Cells(r1.Row, "C") = icd1_Arr(x)
                                Else
                                    Sh1.Cells(r1.Row, "C") = Sh1.Cells(r1.Row, "C") & ", " & icd1_Arr(x)
                                End If
                            End If
                        Next y
                    Next x
                End If
            Next r2
        Next r1
        
        MsgBox "DONE"
        
    End Sub
    Attached Files Attached Files

  2. #2
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,730
    Location
    I'm not following

    Using just these as an example, can you explain the issue?

    Capture.JPG



    Capture2.JPG
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  3. #3
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hi Paul,

    thanks for your response.
    I found a special character Chr(63) in a few areas in between the ICD codes in Sh2. My code found to be working after removing this special character.
    Still the code can be better and request your help in case of any fine tuning or betterment can be done.
    Thanks in advance.
    Screenshot 2023-08-07 074855.jpg
    Attached Files Attached Files

  4. #4
    VBAX Mentor
    Joined
    Nov 2022
    Location
    The Great Land
    Posts
    337
    Location
    Have you step debugged?

    I copied the Returned sheet and deleted all but one row for respiratory_upper_acute_engagement.

    I find that the below line shows comparison of values with ? character so it tries to compare: "j06.9" = "j06.9?" which fails. So where does that ? character come from?

    If UCase(icd1_Arr(x)) = UCase(icd2_Arr(y)) Then

    The same happens with gerd_only but it doesn't have any matches in this dataset so doesn't matter now. Might be happening with others but didn't test.

    Checking this line shows ? character has already invaded the data:

    icd2_Arr = Split(Sh2.Cells(r2.Row, "B"), ",")

    I did a manual Find/Replace to remove all spaces from Cohort data but still get ? character. I tried do a replace in VBA with Chr(160) since you show that in your code and still get ? character. I did a loop with Asc() to print ASCII code of each character in Cohort string. The ? character listed as number 63 but doing a Replace with Chr(63) did not work either.

    Why does Cohort data have this unknown character that VBA can't process?



    Side note: endometriosis is misspelled as endomentrosis
    Last edited by June7; 08-06-2023 at 10:19 PM.
    How to attach file: Reading and Posting Messages (vbaexpress.com), click Go Advanced below post edit window. To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  5. #5
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hi June7,
    Thanks for your response
    The details in Sh2 were copied form PowerPoint slides. That could be the reason for invisible special character (?).
    I manually copied that invisible character from the cell and did a find and replace with blank.

  6. #6
    VBAX Mentor
    Joined
    Nov 2022
    Location
    The Great Land
    Posts
    337
    Location
    How did you get it from the cell?

    Does that solve the issue?

    I built a VBA function to rebuild string with only alpha-numeric, comma, period characters.

    Another side note.
    Both paeds_acute_lower_respiratory and paeds_acute_upper_respiratory have same TestName of Acutelowerrespiratory infectioninlast12months,age3to18 years.
    How to attach file: Reading and Posting Messages (vbaexpress.com), click Go Advanced below post edit window. To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  7. #7
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    I copied 1 character from right to left. And replaced it with blanks. And it solved the issue.
    Screenshot 2023-08-07 103430.jpg

    I have purposefully kept same rules for both paeds_acute_lower_respiratory and paeds_acute_upper_respiratory
    Thanks,

  8. #8
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Try changing your assignment of DelSpaces to:
    DelSpaces = Array(" ", Chr(160), ChrW(8203))

    Chr(63) didn't always remove the extra character on my machine, but this seemed to.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Have you considered clearing the unwanted characters from the sheet? At the moment, your code leaves the columns B in the sheets untouched.
    Directly after you've set rng1 and rng2 you could search and replace on both columns B:
    For n = 0 To UBound(DelSpaces)
        rng1.Offset(, 1).Replace What:=DelSpaces(n), Replacement:="", LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False, FormulaVersion:=xlReplaceFormula2
        rng2.Offset(, 1).Replace What:=DelSpaces(n), Replacement:="", LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False, FormulaVersion:=xlReplaceFormula2
    Next n
    This means you won't have to look for them later in the code.

    Otherwise, to speed up the code you need to read and write from the sheet as few times as possible.
    At the moment, the code is writing to the same cell multiple times, but worse, it's reading from both sheets thousands of times.
    Ultimately, we could have it so that the read/write operations count is reduced to just 3; once for reading Sh1, once for reading Sh2 and once for writing to column C. Come back if you want this.
    Last edited by p45cal; 08-07-2023 at 08:14 AM.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  10. #10
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Understood
    Thanks p45cal

  11. #11
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Quote Originally Posted by p45cal View Post
    Ultimately, we could have it so that the read/write operations count is reduced to just 3; once for reading Sh1, once for reading Sh2 and once for writing to column C. Come back if you want this.
    This would be faster with 500k rows.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  12. #12
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Yes, I have modified the code in the original file, and it completes in less than 2 minutes which otherwise was taking more than 5 minutes.
    But adding a status bar update is taking more than 10 minutes.

    Option Explicit
    
    
    Sub ICD_Check()
    
    
        Dim Sh1    As Worksheet, Sh2 As Worksheet
        Dim Sh1_LR As Long, Sh2_LR As Long
        Dim rng1   As Range, r1 As Range, rng2 As Range, r2 As Range
        Dim icd1_Arr As Variant, icd2_Arr As Variant
        Dim x      As Long, y As Long, m As Long, n As Long
        Dim DelSpaces As Variant
        Dim t      As Double
    
    
        Application.ScreenUpdating = False
        Application.Calculation = xlManual
        
        t = Now
        DelSpaces = Array(" ", Chr(160), ChrW(8203))  'ChrW(8203) = Chr(63) extra character
        Set Sh1 = Sheets("audit_retail_returned")
        Set Sh2 = Sheets("Cohort v2")
        Sh1_LR = Sh1.UsedRange.Rows.Count
        Sh2_LR = Sh2.UsedRange.Rows.Count
        
        Sh1.Range("X2:X" & Sh1_LR).ClearContents
        
        Set rng1 = Sh1.Range("C2:C" & Sh1_LR)
        Set rng2 = Sh2.Range("B2:B" & Sh2_LR)
        
        For n = 0 To UBound(DelSpaces)
            rng1.Offset(, 20).Replace What:=DelSpaces(n), Replacement:="", LookAt:=xlPart, _
                           SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
                           ReplaceFormat:=False, FormulaVersion:=xlReplaceFormula2
            rng2.Offset(, 1).Replace What:=DelSpaces(n), Replacement:="", LookAt:=xlPart, _
                           SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
                           ReplaceFormat:=False, FormulaVersion:=xlReplaceFormula2
        Next n
        
        For Each r1 In rng1
            For Each r2 In rng2
                If r1.Value = r2.Value Then
                    icd1_Arr = Split(Sh1.Cells(r1.Row, "W"), ",")
                    icd2_Arr = Split(Sh2.Cells(r2.Row, "C"), ",")
                    
                    For x = 0 To UBound(icd1_Arr)
                        For y = 0 To UBound(icd2_Arr)
                            If UCase(Left(icd1_Arr(x), 3)) = UCase(Left(icd2_Arr(y), 3)) Then
                                If Len(Sh1.Cells(r1.Row, "X")) = 0 Then
                                    Sh1.Cells(r1.Row, "X") = icd2_Arr(y)
                                Else
                                    Sh1.Cells(r1.Row, "X") = Sh1.Cells(r1.Row, "X") & ", " & icd2_Arr(y)
                                End If
                            End If
                        Next y
                    Next x
                End If
            Next r2
    '        DoEvents
    '        Application.StatusBar = "The code is working : " & Format(r1.Row / Sh1_LR, "0%")
            
        Next r1
    '    Application.StatusBar = False
        Application.ScreenUpdating = True
        Application.Calculation = xlAutomatic
        MsgBox Format(Now - t, "HH:MM:SS")
        
    End Sub

  13. #13
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Try the following. You probably won't need a progress bar.
    Sub ICD3_Check()
    Dim Sh1 As Worksheet, Sh2 As Worksheet
    Dim Sh1_LR As Long, Sh2_LR As Long, i1 As Long, i2 As Long
    Dim rng1Vals, rng2Vals
    Dim icd1_Arr As Variant, icd2_Arr As Variant
    Dim x As Long, y As Long, m As Long, n As Long
    Dim DelSpaces
    Dim rngResults As Range
    
    DelSpaces = Array(" ", Chr(160), ChrW(8203))     'Chr(160) = special character space
    
    Set Sh1 = Sheets("audit_retail_returned")
    Set Sh2 = Sheets("Cohort v2")
    Sh1_LR = Sh1.UsedRange.Rows.Count                'sheet 1 last row
    Sh2_LR = Sh2.UsedRange.Rows.Count                'sheet 2 last row
        
    Set rngResults = Sh1.Range("X2:X" & Sh1_LR)
    ReDim Results(1 To Sh1_LR - 1, 1 To 1)
    rng1Vals = Sh1.Range("A2:B" & Sh1_LR).Value      'read/write 1
    rng2Vals = Sh2.Range("A2:B" & Sh2_LR).Value      'read/write 2
        
    For i1 = 1 To UBound(rng1Vals)
      For i2 = 1 To UBound(rng2Vals)
        If rng1Vals(i1, 1) = rng2Vals(i2, 1) Then
          icd1_Arr = Split(rng1Vals(i1, 2), ",")
          icd2_Arr = Split(rng2Vals(i2, 2), ",")
                    
          For n = 0 To UBound(DelSpaces)
            For m = 0 To UBound(icd1_Arr)
              icd1_Arr(m) = Replace(icd1_Arr(m), DelSpaces(n), "")
            Next m
            For m = 0 To UBound(icd2_Arr)
              icd2_Arr(m) = Replace(icd2_Arr(m), DelSpaces(n), "")
            Next m
          Next n
                    
          For x = 0 To UBound(icd1_Arr)
            For y = 0 To UBound(icd2_Arr)
              If UCase(icd1_Arr(x)) = UCase(icd2_Arr(y)) Then
                If IsEmpty(Results(i1, 1)) Then
                  Results(i1, 1) = icd1_Arr(x)
                Else
                  Results(i1, 1) = Results(i1, 1) & ", " & icd1_Arr(x)
                End If
              End If
            Next y
          Next x
        End If
      Next i2
    Next i1
    rngResults.Value = Results                       'read/write 3
    MsgBox "DONE"
    End Sub
    It leaves the columns B untouched.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  14. #14
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks p45cal

    In my original file campaignname name is in column C and ICD in column W. Hence, I modified the code to
    rng1Vals = Sh1.Range("C2:W" & Sh1_LR).Value
    and
    icd1_Arr = Split(rng1Vals(i1, 21), ",")
    Is there any way to limit the range to just columns C and W?

  15. #15
    VBAX Mentor
    Joined
    Nov 2022
    Location
    The Great Land
    Posts
    337
    Location
    You mean a non-contiguous range - maybe.

    Try: Range("C2:C" & Sh1_LR & ",W2:W" & Sh1_LR)
    How to attach file: Reading and Posting Messages (vbaexpress.com), click Go Advanced below post edit window. To provide db: copy, remove confidential data, run compact & repair, zip w/Windows Compression.

  16. #16
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Quote Originally Posted by anish.ms View Post
    Is there any way to limit the range to just columns C and W?
    Yes, there is, although it's not just a one or two line tweak.
    Is it causing a memory problem or significant time increase if you keep it as you have it?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  17. #17
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Quote Originally Posted by June7 View Post
    You mean a non-contiguous range - maybe.

    Try: Range("C2:C" & Sh1_LR & ",W2:W" & Sh1_LR)
    Unfortunately June7, when you try to assign the values to a variable, only the first column is assigned.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  18. #18
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    No, it is working fine currently.
    I just asked in conscious of the memory usage and as part of my leaning and for the future reference.
    Thanks p45cal

  19. #19
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    Try this:
    Sub ICD4_Check()
    Dim Sh1 As Worksheet, Sh2 As Worksheet
    Dim Sh1_LR As Long, Sh2_LR As Long, i1 As Long, i2 As Long
    Dim rng1CampaignVals, rng2CampaignVals, rng1ICDVals, rng2ICDVals
    Dim icd1_Arr As Variant, icd2_Arr As Variant
    Dim x As Long, y As Long, m As Long, n As Long
    Dim DelSpaces
    Dim rngResults As Range
    
    DelSpaces = Array(" ", Chr(160), ChrW(8203))     'Chr(160) = special character space
    
    Set Sh1 = Sheets("audit_retail_returned")
    Set Sh2 = Sheets("Cohort v2")
    Sh1_LR = Sh1.UsedRange.Rows.Count                'sheet 1 last row
    Sh2_LR = Sh2.UsedRange.Rows.Count                'sheet 2 last row
        
    Set rngResults = Sh1.Range("X2:X" & Sh1_LR)      'where the results will go
    ReDim Results(1 To Sh1_LR - 1, 1 To 1)
    rng1CampaignVals = Sh1.Range("A2:A" & Sh1_LR).Value 'read/write 1
    rng1ICDVals = Sh1.Range("B2:B" & Sh1_LR).Value      'read/write 2
    rng2CampaignVals = Sh2.Range("A2:A" & Sh2_LR).Value 'read/write 3
    rng2ICDVals = Sh2.Range("B2:B" & Sh2_LR).Value      'read/write 4
        
    For i1 = 1 To UBound(rng1CampaignVals)
      For i2 = 1 To UBound(rng2CampaignVals)
        If rng1CampaignVals(i1, 1) = rng2CampaignVals(i2, 1) Then
          icd1_Arr = Split(rng1ICDVals(i1, 1), ",")
          icd2_Arr = Split(rng2ICDVals(i2, 1), ",")
                    
          For n = 0 To UBound(DelSpaces)
            For m = 0 To UBound(icd1_Arr)
              icd1_Arr(m) = Replace(icd1_Arr(m), DelSpaces(n), "")
            Next m
            For m = 0 To UBound(icd2_Arr)
              icd2_Arr(m) = Replace(icd2_Arr(m), DelSpaces(n), "")
            Next m
          Next n
                    
          For x = 0 To UBound(icd1_Arr)
            For y = 0 To UBound(icd2_Arr)
              If UCase(icd1_Arr(x)) = UCase(icd2_Arr(y)) Then
                If IsEmpty(Results(i1, 1)) Then
                  Results(i1, 1) = icd1_Arr(x)
                Else
                  Results(i1, 1) = Results(i1, 1) & ", " & icd1_Arr(x)
                End If
              End If
            Next y
          Next x
        End If
      Next i2
    Next i1
    rngResults.Value = Results                       'read/write 5
    MsgBox "DONE"
    End Sub
    The lines you'll need to adjust are:
    Set rngResults = Sh1.Range("X2:X" & Sh1_LR)      'where the results will go
    
    rng1CampaignVals = Sh1.Range("A2:A" & Sh1_LR).Value 'read/write 1
    rng1ICDVals = Sh1.Range("B2:B" & Sh1_LR).Value      'read/write 2
    rng2CampaignVals = Sh2.Range("A2:A" & Sh2_LR).Value 'read/write 3
    rng2ICDVals = Sh2.Range("B2:B" & Sh2_LR).Value      'read/write 4
    It should take well under a minute.
    Last edited by p45cal; 08-08-2023 at 12:27 PM.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  20. #20
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,876
    How long is it taking?

    If you think you must show progress then if, directly after the line:
    For i1 = 1 To UBound(rng1CampaignVals)
    you add the line:
    If i1 Mod 10000 = 0 Then Application.StatusBar = i1 & " of " & UBound(rng1CampaignVals)
    then directly before the End Sub line add:
    Application.StatusBar = ""
    you'll get progress updates in the status bar which won't greatly increase processing time.

    Important: Don't use Application.ScreenUpdating = False/True, you risk not being able to see the status bar updating and it won't significantly reduce the time (since there are only 5 read/writes to the sheet).
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •