Consulting

Page 1 of 3 1 2 3 LastLast
Results 1 to 20 of 50

Thread: A Data Cleaning Macro - Clean, Trim and CHAR(160)

  1. #1

    A Data Cleaning Macro - Clean, Trim and CHAR(160)

    Hi All,

    I'm trying to hack together a simple VBA macro that is basically analogous to the excel formula below, but which will work on the selected range of cells.
    =TRIM(CLEAN(SUBSTITUTE([selected range],CHAR(160)," ")))

    I found a similar piece of code for a starting point (not mine, can't find the original post to credit author now either - Sorry!), though I do not need nor want the PROPER function.
    So, I've started hacking away at this piece of original starting code:
    Sub CleanTrimProper()
    Dim r As Range
    Set r = Selection
    r.Value = Evaluate("IF(ROW(" & r.Address & "),PROPER(TRIM(CLEAN(" & r.Address & "))))")
    End Sub
    I have to be honest though, I don't understand how this works:
    " & r.Address & "

    Any suggestions?
    My guesses so far have failed miserably, principally because i have no idea what i'm doing
    r.Value = Evaluate("IF(ROW(" & r.Address & "),TRIM(CLEAN(REPLACE(" & r.Address & ",CHR(160)," * "))))")
    r.Value = Evaluate("IF(ROW(" & r.Address & "),REPLACE(TRIM(CLEAN(" & r.Address & ")),CHR(160),""))")
    r.Value = Evaluate("IF(ROW(" & r.Address & "),TRIM(CLEAN(" & r.Address & ")))").Replace(Chr(160), "")
    Cheers

  2. #2
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    Sub CleanTrimProper() 
        msgbox selection.address
        msgbox "IF(ROW(" & selection.Address & "),PROPER(TRIM(CLEAN(" & selection.Address & "))))"
        selection= Evaluate("IF(ROW(" & selection.Address & "),PROPER(TRIM(CLEAN(" & selection.Address & "))))") 
    End Sub
    Now you can see what string is being entered into the method 'Evaluate'
    Compare that string to the Excel formula you would like to be used. Adapt the textstring accordingly.

    You can also try:
    Sub M_snb()
      Selection=Evaluate("Index(trim(clean(" & selection.address & ")),)")
    End Sub

  3. #3
    Righto, i'll dive into it and get back to you. Thanks.

  4. #4
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    This gave me the desired result:

    Sub M_snb()
        Selection = Evaluate("if(" & Selection.Address & "="""","""",clean(trim(" & Selection.Address & ")))")
    End Sub

  5. #5
    That doesn't remove my arch nemesis, ASCII 160.
    Getting just the clean and trim function to work is fine, but it doesn't seem to work when i use the same methods to pass the data to the replace function....

  6. #6
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    Wouldn't this suffice to remove chr(160) :

    Sub M_snb()
      activesheet.cells.replace chr(160),""
    End Sub
    Did you use ?

    Sub M_snb() 
        Selection = Evaluate("if(" & Selection.Address & "="""","""",substitute(clean(trim(" & Selection.Address & ")),char(160),""""))") 
    End Sub

  7. #7
    That does indeed work, thanks very much!

    I still don't fully understand this though:
    " & Selection.Address & "

    Why does it need to be concatenated with leading and trailing spaces?
    Doesn't Selection.Address just return a string with the cell addresses...?

  8. #8
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    Analogy:

    msgbox "this is number " & 5 & " as you can see"
    
    c00="this text"
    msgbox "Now you can see " & c00

  9. #9
    Yep, copy that. I have no idea why that was confusing me last night, the old brain must have been a bit fried!

    Thanks for the help snb!

  10. #10
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    If you're replacing a NBS, do you want to replace it with a regular old space chr(32)?

    activesheet.cells.replace chr(160),""
    Paul

  11. #11
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    I get some pretty crappy data pasted in from Word or the web or the mainframe (we still use them)

    This is the 'Range Cleaner' that I use, and so far it seems to work ok for cleaning and trimming

    Also added option to UC text and to keep the length within a limit


    Option Explicit
    
    'Use Clean on text imported from other applications that contains characters that may not print with your operating system.
    'For example, you can use Clean to remove some low-level computer code that is frequently at the beginning and end of data
    'files and cannot be printed.
    'Important  The Clean function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code
    '(values 0 through 31) from text. In the Unicode character set, there are additional nonprinting characters
    '(values 127, 129, 141, 143, 144, and 157). By itself, the Clean function does not remove these additional
    'nonprinting characters
    
    Sub Range_Clean(RangeIn As Range, Optional MakeUpperCase As Boolean = True, Optional MaxLen As Long = 255)
        Const csSpace As String = " "
        Dim rArea As Range, rText As Range
        Dim bEvents As Boolean, bScreen As Boolean
        Dim s As String
        
        bEvents = Application.EnableEvents
        bScreen = Application.ScreenUpdating
        
        On Error GoTo NiceExit
        
        Set rText = RangeIn.SpecialCells(xlCellTypeConstants, xlTextValues)
        
        'replace CR, NL, and tab with space, and 127, 129, 141, 143, 144, and 157 and 160
        With rText
            Call .Replace(Chr(9), csSpace, xlPart)
            Call .Replace(vbCrLf, csSpace, xlPart)
            Call .Replace(Chr(13), csSpace, xlPart)
            
            Call .Replace(Chr(127), csSpace, xlPart)
            Call .Replace(Chr(129), csSpace, xlPart)
            Call .Replace(Chr(141), csSpace, xlPart)
            Call .Replace(Chr(143), csSpace, xlPart)
            Call .Replace(Chr(144), csSpace, xlPart)
            Call .Replace(Chr(157), csSpace, xlPart)
            Call .Replace(Chr(160), csSpace, xlPart)
        End With
        
        For Each rArea In rText.Areas
            With rArea
                .Value = Application.Evaluate("=TRIM(" & .Address & ")")
                .Value = Application.Evaluate("=CLEAN(" & .Address & ")")
                .Value = Application.Evaluate("=LEFT(" & .Address & "," & MaxLen & ")")
                .Value = Application.Evaluate("=TRIM(" & .Address & ")")    'in case LEFT leaves space(s) at end
                If MakeUpperCase Then .Value = Application.Evaluate("=UPPER(" & .Address & ")")
            End With
        Next
        
    NiceExit:
        Application.EnableEvents = bEvents
        Application.ScreenUpdating = bScreen
    End Sub
    
    
    Sub test()
        Call Range_Clean(Cells(1, 1).CurrentRegion, False, 6)
    End Sub
    Paul

  12. #12
    Excellent, thanks Paul; I'll have a look at that as well.

    Here's a similar one I found: (source: http://www.excelfox.com/forum/f13/tr...-155/#post1092)

    Function CleanTrim(ByVal S As String, Optional ConvertNonBreakingSpace As Boolean = True) As String
      Dim X As Long, CodesToClean As Variant
      CodesToClean = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, _
                           21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 129, 141, 143, 144, 157)
      If ConvertNonBreakingSpace Then S = Replace(S, Chr(160), " ")
      For X = LBound(CodesToClean) To UBound(CodesToClean)
        If InStr(S, Chr(CodesToClean(X))) Then S = Replace(S, Chr(CodesToClean(X)), "")
      Next
      CleanTrim = WorksheetFunction.Trim(S)
    End Function
    I modified that and ended up with the following function and sub:

    Function CleanTrimExcel(ByVal S As String, Optional ConvertNonBreakingSpace As Boolean = True) As String
        
        Dim X As Long
        Dim CodesToReplace() As Variant
        
        If ConvertNonBreakingSpace Then
            ReDim CodesToReplace(1 To 7)
            CodesToReplace = Array(127, 129, 141, 143, 144, 157, 160)
        Else
            ReDim CodesToReplace(1 To 6)
            CodesToReplace = Array(127, 129, 141, 143, 144, 157)
        End If
               
        For X = LBound(CodesToReplace) To UBound(CodesToReplace)
            If InStr(S, Chr(CodesToReplace(X))) Then S = Replace(S, Chr(CodesToReplace(X)), Chr(0))
        Next
               
        CleanTrimExcel = WorksheetFunction.Trim(WorksheetFunction.Clean(S))
        
    End Function
    Sub CallCleanTrimExcel()
    
    
        Dim arr() As Variant
        Dim m As Double
        Dim n As Double
        
        arr = Selection.Value
        
        For m = LBound(arr, 1) To UBound(arr, 1)
            For n = LBound(arr, 2) To UBound(arr, 2)
                arr(m, n) = CleanTrimExcel(arr(m, n))
            Next n
        Next m
        
        Selection = arr()
    
    
    End Sub

  13. #13
    Actually it looks like you have addressed an issue that just occurred to me, namely keeping formula cells as formulas, whilst cleaning all the cells that hold constants around them.

    Can this be done using the .SpecialCells property on the selection?
    I'm thinking the answer lies in getting to know these two:
    http://msdn.microsoft.com/en-us/libr.../ff196157.aspx
    http://msdn.microsoft.com/en-us/libr.../ff836534.aspx

    The intended outcome would be that so you can select a range that is a mixture of constants and formulas, and then process that range in a way that outputs the cleaned constants and leaves the formulas untouched.
    Last edited by Simple_One; 08-21-2013 at 08:54 PM.

  14. #14
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    Why don't you use Excel's replace instead of VBA's replace ?


    Sub M_snb()
     sn = Array(127, 129, 141, 143, 144, 157, 160)
     
     For j = 0 To UBound(sn)
       Cells(j + 1, 4) = String(5, "a") & Chr(sn(j)) & String(8, "BB")
     Next
     
     For j = 0 To UBound(sn)
        Cells.Replace Chr(sn(j)), " "
     Next
     Cells.Replace "  ", " "
    End Sub

  15. #15
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    Quote Originally Posted by Simple_One View Post
    Actually it looks like you have addressed an issue that just occurred to me, namely keeping formula cells as formulas, whilst cleaning all the cells that hold constants around them.
    .SpecialCells does that, and I limit it to Constants and TextValues.

    This will normally select a range with multiple blocks, so I loop through the .Areas and do it one block at a time for the WS functions

    For the VBA .Replace, it seems to be smart enough to just do the text cells, but if there's a LOT of data, it MIGHT be faster to also do that inside an .Areas loop

    Paul

  16. #16
    VBAX Regular
    Joined
    Feb 2009
    Posts
    9
    Location
    One more:
    Sub TrimALL()
    'David McRitchie 2000-07-03 mod 2000-08-16 join.htm
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim cell As Range
    'Also Treat CHR 0160, as a space (CHR 032)
    Selection.Replace What:=Chr(160), Replacement:=Chr(32), _
    LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
    'Trim in Excel removes extra internal spaces, VBA does not
    On Error Resume Next 'in case no text cells in selection
    For Each cell In Intersect(Selection, _
    Selection.SpecialCells(xlConstants, xlTextValues))
    cell.Value = Application.trim(cell.Value)
    Next cell
    On Error GoTo 0
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
    End Sub

  17. #17
    Thanks AMontes, using the Intersect function is far more elegant than anything I had come up with. Cheers!

  18. #18
    Well, here is what I ended up with: 1 operational sub, and two subs setup to call the operational one.

    One of the callers subs sets it to operate on the selected cells, the other iterates through every worksheet and passes the used range to the function for cleaning.

    In the operational sub (TrimAll):
    Cleaning operations are in the form of an enumerated data type and can be employed bitwise to allow for the selection of different cleaning options. This is done using the addition operators when sending the arguments to the function (as seen in the subs).
    The cleaning argument options allow for use of Excels Trim, Clean and Proper functions. The size operation just uses VBA's Left function in conjunction with an optional Length argument.
    There is an argument that means that only the 'unusual characters' (CodesToReplace Array) are called, and all further cleaning is skipped.
    The replacement character used can be changed, but defaults to a space (Char(32)).
    There are two cases for the function. One outputs more status bar progress information than the other, but probably adds some overhead in doing so. This is the default. Whether or not this really makes a difference is beyond my level of knowledge, but I figured i would add the option to avoid any overhead if people wanted it.

    Notes:
    I'm not a developer, I'm just hacking things together for my purposes. As such, whilst i have tested this code it's probably not very robust.
    If someone who really knows what they are doing would like to polish it, test it rigorously, optimize it or add error checking etc then please do as I have just modded the code and concepts provided by people in this thread.
    If you do, I would ask that:
    1. You post a link to your improved code in this thread though so I can find it!
    2. You add a comment in your code with a link to this thread so that people can see the contributors of the original code and ideas.

    Option Explicit
    
    ' An enumeration to allow for bitwise option selection - uses base 2, aka: binary
    Enum CleanType
        Size = 2 ^ 0
        TextToNum = 2 ^ 1
        Trim = 2 ^ 2
        Clean = 2 ^ 3
        Proper = 2 ^ 4
    End Enum
    
    
    
    
    Public Sub TrimAll(RangeIn As Range, Optional CleaningMode As CleanType = 12, Optional ReplaceOnly As Boolean = False, _
                    Optional Length As Integer = 255, Optional ReplacementCode As Integer = 32, Optional bCompleteStatus As Boolean = True)
         
        Dim Cell As Range, IntersectRng As Range
        Dim Index As Integer
        Dim CurrentProgressValue As Double, NumCells As Double, NumCellRanges As Double
        Dim CodesToReplace() As Variant
        
    
    
        Select Case bCompleteStatus
        
            ' A version that requires more calculation, but provides more feedback to the user through the status bar.
            Case True
                
                ' Reinitialise the progress variable for status bar message
                CurrentProgressValue = 0
                ' Initialise character codes to be replaced
                CodesToReplace() = Array(127, 129, 141, 143, 144, 157, 160)
                
                For Index = LBound(CodesToReplace) To UBound(CodesToReplace)
                           
                    ' Display progress information in status bar
                    Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Replacement in Progress: """ & CurrentProgressValue & " of " & UBound(CodesToReplace) & ": " & Format(CurrentProgressValue / UBound(CodesToReplace), "Percent") & """  -  Macro Is Still Running!"
                           
                    ' Convert all extra codes to the ReplacementCode character for trimming by other functions or other means in Excel
                    RangeIn.Replace What:=Chr(CodesToReplace(Index)), Replacement:=Chr(ReplacementCode), LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
                    On Error Resume Next 'in case no text cells in selection
                    
                    ' Iterate the progress counter
                    CurrentProgressValue = CurrentProgressValue + 1
                    
                Next Index
                
                ' Display progress information in status bar
                NumCells = RangeIn.Cells.count
                If NumCells > 100000 Then
                    Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Calculating which cells are text values out of the " & Format(NumCells, "#,##0") & _
                                                " that were passed for further processing  -  This might take some time, but THE MACRO IS STILL RUNNING!"
                Else
                    Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Calculating which cells are text values for further processing  -  Macro Is Still Running!"
                End If
                
                ' Reinitialise the progress variable for status bar message
                CurrentProgressValue = 0
                Set IntersectRng = Intersect(RangeIn, RangeIn.SpecialCells(xlConstants, xlTextValues))
                NumCellRanges = IntersectRng.Cells.count
                
                ' Allow system to do events to aid stability if the passed cell range is larger than some amount
                If NumCells > 100000 Then
                    DoEvents
                End If
                
                If Not ReplaceOnly Then
                    For Each Cell In IntersectRng
                               
                        ' Stop every X number of cells and hand over control to system to execute commands - to aid in stability and progress count
                        If CurrentProgressValue Mod 50000 = 0 Then
                            ' Allow system to do events, then get VBA to wait for some number of seconds to aid stability
                            DoEvents
                            ' Uncomment the line below to add wait time if required, the current config adds one second of wait time.
                            'Application.Wait (Now() + CDate("00:00:01"))
                        End If
                        
                        ' Display progress information in status bar every so often as determined by calculation.
                        If CurrentProgressValue Mod 5000 = 0 Then
                            Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Cleaning in Progress: """ & CurrentProgressValue & " of " & NumCellRanges & ": " & Format(CurrentProgressValue / NumCellRanges, "Percent") & """  -  Macro Is Still Running!"
                        End If
                        
                        If CleaningMode And Size Then
                            Cell.Value = Left(Cell.Value, Length)
                        End If
                        If CleaningMode And TextToNum Then
                            Cell.Value = Cell.Value * 1
                        End If
                        If CleaningMode And Trim Then
                            Cell.Value = Application.WorksheetFunction.Trim(Cell.Value)
                        End If
                        If CleaningMode And Clean Then
                            Cell.Value = Application.WorksheetFunction.Clean(Cell.Value)
                        End If
                        If CleaningMode And Proper Then
                            Cell.Value = Application.WorksheetFunction.Proper(Cell.Value)
                        End If
                        
                        ' Iterate the progress counter
                        CurrentProgressValue = CurrentProgressValue + 1
                    
                    Next Cell
                End If
                
                
            ' A version that requires minimal calculation to track progress, but provides less feedback to the user from the status bar.
            Case False
                
                ' Initialise character codes to be replaced
                CodesToReplace = Array(127, 129, 141, 143, 144, 157, 160)
                
                ' Display general information in status bar
                Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Replacement in progress  -  Macro Is Still Running!"
                
                For Index = LBound(CodesToReplace) To UBound(CodesToReplace)
                    ' Convert all extra codes to the ReplacementCode character for trimming by other functions or other means in Excel
                    RangeIn.Replace What:=Chr(CodesToReplace(Index)), Replacement:=Chr(ReplacementCode), LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
                    On Error Resume Next 'in case no text cells in selection
                Next Index
                
                ' Allow system to do events to aid stability
                DoEvents
                
                ' Display general information in status bar
                Application.StatusBar = "Currently on worksheet: """ & ActiveSheet.Name & """  -  Cleaning in progress  -  Macro Is Still Running!"
                
                Set IntersectRng = Intersect(RangeIn, RangeIn.SpecialCells(xlConstants, xlTextValues))
                
                If Not ReplaceOnly Then
                    For Each Cell In IntersectRng
                        If CleaningMode And Size Then
                            Cell.Value = Left(Cell.Value, Length)
                        End If
                        If CleaningMode And TextToNum Then
                            Cell.Value = Cell.Value * 1
                        End If
                        If CleaningMode And Trim Then
                            Cell.Value = Application.WorksheetFunction.Trim(Cell.Value)
                        End If
                        If CleaningMode And Clean Then
                            Cell.Value = Application.WorksheetFunction.Clean(Cell.Value)
                        End If
                        If CleaningMode And Proper Then
                            Cell.Value = Application.WorksheetFunction.Proper(Cell.Value)
                        End If
                    Next Cell
                End If
        
        End Select
        
        
    End Sub
    
    
    
    
    Public Sub Call_TrimAll_Selection()
    'uses bitwise (base 2) enumerated CleanType data type, so options can be added together
    
    
        Dim bStatusBar As Boolean
        
        ' store state of status bar, and then show it so progress of function can be seen
        bStatusBar = Application.DisplayStatusBar
        Application.DisplayStatusBar = True
         
        ' prep for long macro to run
        Application.Cursor = xlWait
        Application.Calculation = xlCalculationManual
        Application.ScreenUpdating = False
        
        Call TrimAll(Selection, Clean + Trim)
        
        'Clear the status bar and restore its original state.
        Application.StatusBar = False
        Application.DisplayStatusBar = bStatusBar
        
        ' Restore after long macro
        Application.ScreenUpdating = True
        Application.Cursor = xlDefault
        Application.Calculation = xlCalculationAutomatic
        Application.CalculateFullRebuild
        
    End Sub
    
    
    
    
    Public Sub Call_TrimAll_Workbook()
    'uses bitwise (base 2) enumerated CleanType data type, so options can be added together
         
        Dim Cell As Range, OriginalCell As Range, WSCell As Range
        Dim CodesToReplace() As Integer, Index As Integer, CurrentProgressValue As Integer
        Dim WS As Worksheet, OriginalWS As Worksheet
        Dim bStatusBar As Boolean
        
        ' store state of status bar, and then show it so progress of function can be seen
        bStatusBar = Application.DisplayStatusBar
        Application.DisplayStatusBar = True
         
        ' prep for long macro to run
        Application.Cursor = xlWait
        Application.Calculation = xlCalculationManual
        Application.ScreenUpdating = False
        
        ' Store the original position prior to running
        Set OriginalCell = ActiveCell
        Set OriginalWS = ActiveSheet
        
        For Each WS In Worksheets
            
            WS.Activate
            Set WSCell = ActiveCell
            Range("A1").Select
            Range(Selection, Selection.SpecialCells(xlLastCell)).Select
            
            Call TrimAll(Selection, Clean + Trim)
            
            WSCell.Select
            
            ' Allow system to do events, then get VBA to wait for some number of seconds to aid stability
            DoEvents
            Application.Wait (Now() + CDate("00:00:01"))
            
        Next WS
        
        OriginalWS.Activate
        OriginalCell.Select
            
        'Clear the status bar and restore its original state.
        Application.StatusBar = False
        Application.DisplayStatusBar = bStatusBar
        
        ' Restore after long macro
        Application.ScreenUpdating = True
        Application.Cursor = xlDefault
        Application.Calculation = xlCalculationAutomatic
        Application.CalculateFullRebuild
        
    End Sub
    Last edited by Simple_One; 08-27-2013 at 08:06 PM. Reason: Corrected code.

  19. #19
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,728
    Location
    small comment -- Cell below is typed as a variant, although you would think it should be a Range like IntersectRng

        Dim Cell, IntersectRng As Range

    Have to be wordy

        Dim Cell As Range, IntersectRng As Range
        Dim CodesToReplace() As Long, Index As Long
        Dim CurrentProgressValue As Double, NumCells As Long, NumCellRanges As Double

    You made this a Function but you don't actually return anything to the function, but update RangeIn

    [
    Function TrimAll(RangeIn As Range, Optional CleaningMode As CleanType = 12, Optional ReplaceOnly As Boolean = False, _
        Optional Length As Integer = 255, Optional ReplacementCode As Integer = 32, Optional bCompleteStatus As Boolean = True)

    My preference would be to make it a Sub




    Ppaul

  20. #20
    Ah, thanks Paul, I'll change the dims now.
    I've revisited this whole data cleaning concept a few times in this process, at one stage I had a flag that would set the function to return an array of results so that you could use it within Excel, I just forgot to change it back. I'll edit the above code to a sub. I can always revisit the idea of altering it to work as a function.

    Righto, I think I've corrected it all, also corrected the progress calc (CurrentProgressValue) to work without option base 1, which i had been using.
    Last edited by Simple_One; 08-27-2013 at 08:08 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •