PDA

View Full Version : [SOLVED] Need help removing unwanted commas that sometimes appear in rows in csv files



business1000
01-11-2015, 10:36 PM
I receive hundreds of csv files which I need to merge every week but some of the files have rows with an extra comma in column 15 which causes the rows to extend into a 19th column.
I need a solution that will search all selected csv files(one file at a time) in a folder and for any files which it sees rows which have different numbers of data fields (columns) of both 18 and 19 columns;
the code must then (one row at a time starting with row 2) fix every row that has 19 columns by removing the 15th comma in that row.
So that that all the rows have the correct number of data fields (columns), which is 18.
Then it will save the file back to the same or a different directory which I can select.

Thanks for any assistance that may be provided.

Bob Phillips
01-12-2015, 01:44 AM
Just record a macro that does a find and replace of two commas by one comma.

business1000
01-12-2015, 06:18 AM
Thanks. but the logic needs to be able to look at all selected files in a folder first to determine if there are files that contain both 18 and 19 columns, then
the logic needs to address any rows that have 19 columns and then remove the comma in column 15 of that row.
I do not see how a recorded macro can do that.
I heard from someone outside this site that a csv text stream approach might be a very effective, simple and efficient strategy, but I do know how to start it off.

business1000
01-12-2015, 08:41 AM
I should clarify, this issue is not that there are not 2 commas that are together.
The issue is: on some rows the Logon ID (15th column) is entered in a 'Lastname, Firstname' format. When I open the csv file, I can see that the 'user entered comma' between 'Lastname' and 'Firstname' pushes the 'Firstname' and remaining data fields in the row 1 column to the right. This is why I have to find and remove that extra comma when is appears - and it appears randomly in about 1/3 of all my files I am required to work with.

SamT
01-12-2015, 09:16 AM
The needed logic here is to combine the Column removal from 19 column files with Need to standardize 100s of excel files that have (http://www.vbaexpress.com/forum/showthread.php?51513-Need-to-standardize-100s-of-excel-files-that-have-columns-in-the-wrong-order)

Are there any other issues with these files? Since they have to be open whenever they are being worked on, it makes sense to do all the corrections at once. These two issues will take less than a dozen lines of code, so far. Plus another dozen to iterate thru all files in a folder.

I can combine your threads into one, but it is much easier if one thread is very short, so please reply in this one.

business1000
01-12-2015, 09:49 AM
No other issues.
Just a clarification:
This issue is not that there are not 2 commas that are together in a row.
The issue is: on some rows the Logon ID (15th column) is entered in a 'Lastname, Firstname' format. When I open the csv file, I can see that the 'user entered comma' between 'Lastname' and 'Firstname' pushes the 'Firstname' and remaining data fields in the row 1 column to the right. This is why I have to find and remove that extra comma when is appears - and it appears randomly in about 1/3 of all my files I am required to work with.

Thanks for any assistance that may be provided.

Zack Barresse
01-12-2015, 10:39 AM
Heya,

Try this on a few sample files set aside in a test directory. Two main routines, either select an entire folder, or select specific files (multiselect). Both call the same routine...


Option Explicit

Public Enum FilterType
CSV = 0
XL = 1
End Enum

Private Const FilterCSV As String = "CSV Files (*.csv), *.csv"
Private Const FilterXL As String = "Excel Files (*.xl*), *.xl*"

Dim TargetCount As Long
Dim TargetCountA As Long


Sub CorrectColumnInCSV_Folder()

Dim SelectFolder As FileDialog
Dim FSO As Object
Dim TargetFolder As Object
Dim TargetFile As Object

Set SelectFolder = Application.FileDialog(msoFileDialogFolderPicker)

TargetCount = 0
TargetCountA = 0
SelectFolder.AllowMultiSelect = False
If SelectFolder.Show Then
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TargetFolder = FSO.GetFolder(SelectFolder.SelectedItems.Item(1))
For Each TargetFile In TargetFolder.Files
If UCase(Right(TargetFile.Name, 4)) = ".CSV" Then
TargetCountA = TargetCountA + 1
If ISFILEOPEN(TargetFile.Name) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile.Path, _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
End If
Next TargetFile
End If

MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"

End Sub

Sub CorrectColumnInCSV_File()

Dim TargetFile As Variant
Dim TargetName As String
Dim LoopStep As Long

TargetFile = Application.GetOpenFilename(FilterReturn(CSV), MultiSelect:=True)
If Not IsArray(TargetFile) Then
Exit Sub
End If

TargetCount = 0
TargetCountA = UBound(TargetFile) - LBound(TargetFile) + 1
For LoopStep = LBound(TargetFile) To UBound(TargetFile)
TargetName = Right(TargetFile(LoopStep), Len(TargetFile(LoopStep)) - InStrRev(TargetFile(LoopStep), "\"))
If ISFILEOPEN(TargetName) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile(LoopStep), _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
Next LoopStep

MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"

End Sub


Private Function CorrectAdditionalColumnInCSV( _
ByVal Target As Variant, _
ByVal ColumnCount As Long, _
ByVal MergeColStart As Long, _
Optional ByVal Delimiter As String = ",", _
Optional ByVal ReplaceDelimiter As String = " ") As Long

Dim TempFile As String
Dim TempName As String
Dim TempPath As String
Dim TempExt As String
Dim FileNum1 As Long
Dim FileNum2 As Long
Dim LineText As String
Dim LineOutput As String
Dim LineItems() As String

TempPath = Left(Target, InStrRev(Target, "\"))
TempName = Right(Target, Len(Target) - Len(TempPath))
TempExt = Right(TempName, Len(TempName) - InStrRev(TempName, "."))
TempName = Left(TempName, Len(TempName) - Len(TempExt) - 1) & "(temp write)." & TempExt
TempFile = TempPath & TempName

FileNum1 = FreeFile()
Open Target For Input Access Read As #FileNum1
FileNum2 = FreeFile()
Open TempFile For Output Access Write As #FileNum2

Do While Not EOF(FileNum1)

LineText = ""
Line Input #FileNum1, LineText
LineItems = Split(LineText, Delimiter)

If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
LineOutput = WorksheetFunction.Substitute(LineText, Delimiter, ReplaceDelimiter, MergeColStart)
Else
LineOutput = LineText
End If

Write #FileNum2, LineOutput

Loop

CorrectAdditionalColumnInCSV = 1

ExitWithoutError:
Close #FileNum1
Close #FileNum2

If Dir(TempFile, vbNormal) <> "" Then
If Dir(Target, vbNormal) <> "" Then FileCopy TempFile, Target
Kill TempFile
End If

Exit Function

ExitWithError:
CorrectAdditionalColumnInCSV = 0
Resume ExitWithoutError

End Function


Function ISFILEOPEN(FileName As String) As Boolean
Dim iFilenum As Long
Dim iErr As Long
On Error Resume Next
iFilenum = FreeFile()
Open FileName For Input Lock Read As #iFilenum
Close iFilenum
iErr = Err
Select Case iErr
Case 0: ISFILEOPEN = False
Case 70: ISFILEOPEN = True
Case Else: Error iErr
End Select
On Error GoTo 0
End Function


Private Function FilterReturn(ByVal Value As FilterType) As String
Select Case Value
Case FilterType.CSV: FilterReturn = FilterCSV
Case FilterType.XL: FilterReturn = FilterXL
End Select
End Function

HTH

SamT
01-12-2015, 10:55 AM
That beats my 6 lines. :D

RC = Rows.Count
Do
Rw = Cells(1, 19).End(xlDown).Row
Cells(Rw, 15) = Cells(Rw,15) & " " & Cells(Rw, 16) 'Edit for FirstName LastName
Cells(Rw,16.Delete Shift:=xlLeft
'Always do bottom Row, It might need it and a lot of code to skip otherwise
Loop While Rw < RC

business1000
01-12-2015, 11:13 AM
Your code removes the extra comma from the 15th column when it finds it but when I open the file in Excel after processing, I see that all the data column elements now show up in Col A.
Is there a way to make it so the data shows up in separate columns as depicted by the commas.
Thanks

Zack Barresse
01-12-2015, 11:48 AM
Text to columns, delimited by comma. ?

SamT
01-12-2015, 12:19 PM
@ Zack
The ommision is in
If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
LineOutput = WorksheetFunction.Substitute(LineText, Delimiter, ReplaceDelimiter, MergeColStart)
Else


@ Business1K
Substitute this code in toto for the three lines above. Zack can improve it for you later.

Dim NDX as Long, i As Long
NDX = 0
i = 0
If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
NDX = LBound(lineItems) + 15
LineItems(NDX) = LineItems(NDX) & LineItems(NDX + 1)
For i = 1 to 3
LineItems(NDX + i) = LineItems(NDX + 1 + i)
Next i
LineItems(NDX + 4) = ""
For i = LBound(LineItems) To UBound(LineItems)
LineOutput = LineOutPut & LineItems(i) & ","
Next i
LineOutPut = Left(LineOutPut, Len(LineOutPut - 2))
Else

Zack Barresse
01-12-2015, 12:48 PM
Pretty sure the logic is the same, Sam. ? I'm replacing the nth delimiter with a specified replacement. You're doing the same thing but with two loops and additional variables.

business1000
01-12-2015, 01:20 PM
When I run the code I get a compile error
Variable required - can't assign to this expression

The debugger highlights on the "-2" in this second to last line of code you just modified: LineOutPut = Left(LineOutPut, Len(LineOutPut - 2))

The code I have so far is:


Option Explicit
Public Enum FilterType
CSV = 0
XL = 1
End Enum
Private Const FilterCSV As String = "CSV Files (*.csv), *.csv"
Private Const FilterXL As String = "Excel Files (*.xl*), *.xl*"
Dim TargetCount As Long
Dim TargetCountA As Long

Sub CorrectColumnInCSV_Folder()
Dim SelectFolder As FileDialog
Dim FSO As Object
Dim TargetFolder As Object
Dim TargetFile As Object
Set SelectFolder = Application.FileDialog(msoFileDialogFolderPicker)
TargetCount = 0
TargetCountA = 0
SelectFolder.AllowMultiSelect = False
If SelectFolder.Show Then
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TargetFolder = FSO.GetFolder(SelectFolder.SelectedItems.Item(1))
For Each TargetFile In TargetFolder.Files
If UCase(Right(TargetFile.Name, 4)) = ".CSV" Then
TargetCountA = TargetCountA + 1
If ISFILEOPEN(TargetFile.Name) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile.Path, _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
End If
Next TargetFile
End If
MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"
End Sub
Sub CorrectColumnInCSV_File()
Dim TargetFile As Variant
Dim TargetName As String
Dim LoopStep As Long
TargetFile = Application.GetOpenFilename(FilterReturn(CSV), MultiSelect:=True)
If Not IsArray(TargetFile) Then
Exit Sub
End If
TargetCount = 0
TargetCountA = UBound(TargetFile) - LBound(TargetFile) + 1
For LoopStep = LBound(TargetFile) To UBound(TargetFile)
TargetName = Right(TargetFile(LoopStep), Len(TargetFile(LoopStep)) - InStrRev(TargetFile(LoopStep), "\"))
If ISFILEOPEN(TargetName) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile(LoopStep), _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
Next LoopStep
MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"
End Sub

Private Function CorrectAdditionalColumnInCSV( _
ByVal Target As Variant, _
ByVal ColumnCount As Long, _
ByVal MergeColStart As Long, _
Optional ByVal Delimiter As String = ",", _
Optional ByVal ReplaceDelimiter As String = " ") As Long
Dim TempFile As String
Dim TempName As String
Dim TempPath As String
Dim TempExt As String
Dim FileNum1 As Long
Dim FileNum2 As Long
Dim LineText As String
Dim LineOutput As String
Dim LineItems() As String
TempPath = Left(Target, InStrRev(Target, "\"))
TempName = Right(Target, Len(Target) - Len(TempPath))
TempExt = Right(TempName, Len(TempName) - InStrRev(TempName, "."))
TempName = Left(TempName, Len(TempName) - Len(TempExt) - 1) & "(temp write)." & TempExt
TempFile = TempPath & TempName
FileNum1 = FreeFile()
Open Target For Input Access Read As #FileNum1
FileNum2 = FreeFile()
Open TempFile For Output Access Write As #FileNum2
Do While Not EOF(FileNum1)
LineText = ""
Line Input #FileNum1, LineText
LineItems = Split(LineText, Delimiter)
Dim NDX As Long, i As Long

NDX = 0
i = 0
If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
NDX = LBound(LineItems) + 15
LineItems(NDX) = LineItems(NDX) & LineItems(NDX + 1)
For i = 1 To 3
LineItems(NDX + i) = LineItems(NDX + 1 + i)
Next i
LineItems(NDX + 4) = ""
For i = LBound(LineItems) To UBound(LineItems)
LineOutput = LineOutput & LineItems(i) & ","
Next i
LineOutput = Left(LineOutput, Len(LineOutput - 2))
Else
LineOutput = LineText
End If
Write #FileNum2, LineOutput
Loop
CorrectAdditionalColumnInCSV = 1
ExitWithoutError:
Close #FileNum1
Close #FileNum2
If Dir(TempFile, vbNormal) <> "" Then
If Dir(Target, vbNormal) <> "" Then FileCopy TempFile, Target
Kill TempFile
End If
Exit Function
ExitWithError:
CorrectAdditionalColumnInCSV = 0
Resume ExitWithoutError
End Function

Function ISFILEOPEN(FileName As String) As Boolean
Dim iFilenum As Long
Dim iErr As Long
On Error Resume Next
iFilenum = FreeFile()
Open FileName For Input Lock Read As #iFilenum
Close iFilenum
iErr = Err
Select Case iErr
Case 0: ISFILEOPEN = False
Case 70: ISFILEOPEN = True
Case Else: Error iErr
End Select
On Error GoTo 0
End Function

Private Function FilterReturn(ByVal Value As FilterType) As String
Select Case Value
Case FilterType.CSV: FilterReturn = FilterCSV
Case FilterType.XL: FilterReturn = FilterXL
End Select
End Function

Zack Barresse
01-12-2015, 01:33 PM
The code does the same thing as mine, although it needs a bit of an adjustment...


If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
NDX = LBound(LineItems) + MergeColStart - 1
LineItems(NDX) = LineItems(NDX) & LineItems(NDX + 1)
For i = 1 To 3
LineItems(NDX + i) = LineItems(NDX + 1 + i)
Next i
LineItems(NDX + ColumnCount - MergeColStart + 1) = ""
LineOutput = ""
For i = LBound(LineItems) To UBound(LineItems)
LineOutput = LineOutput & LineItems(i) & ","
Next i
LineOutput = Left(LineOutput, Len(LineOutput) - 2)
Else

Variables changed for hard-coded numbers, Len() function fixed, and string variable reset before re-writing.

But again, it's redundant and less efficient than the original code I posted, which works for me.

business1000
01-12-2015, 02:02 PM
I am still getting the same problem - the data is all staying in Col A.
Attached 2 sample data files to test against.

SamT
01-12-2015, 03:23 PM
@ Business 1K
Listen to Zack on this.

I missed something obvious in his code.

I thought that this
the data is all staying in Col A.pointed straight at the problem.

Zack Barresse
01-12-2015, 04:09 PM
In your sample file, I'm assuming the first row is something you added and isn't present in your actual files. In the sample file I used, I removed that row prior to running this.

Minor changes in the code below.

* Trims any trailing comma
* Changed from a "Write" command to a "Print" command, as this was causing the text line to be written into the text file with surrounding quotes, causing it to look as if it was one giant text string


Option Explicit

Public Enum FilterType
CSV = 0
XL = 1
End Enum

Private Const FilterCSV As String = "CSV Files (*.csv), *.csv"
Private Const FilterXL As String = "Excel Files (*.xl*), *.xl*"

Dim TargetCount As Long
Dim TargetCountA As Long


Sub CorrectColumnInCSV_Folder()

Dim SelectFolder As FileDialog
Dim FSO As Object
Dim TargetFolder As Object
Dim TargetFile As Object

Set SelectFolder = Application.FileDialog(msoFileDialogFolderPicker)

TargetCount = 0
TargetCountA = 0
SelectFolder.AllowMultiSelect = False
If SelectFolder.Show Then
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TargetFolder = FSO.GetFolder(SelectFolder.SelectedItems.Item(1))
For Each TargetFile In TargetFolder.Files
If UCase(Right(TargetFile.Name, 4)) = ".CSV" Then
TargetCountA = TargetCountA + 1
If ISFILEOPEN(TargetFile.Name) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile.Path, _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
End If
Next TargetFile
End If

MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"

End Sub

Sub CorrectColumnInCSV_File()

Dim TargetFile As Variant
Dim TargetName As String
Dim LoopStep As Long

TargetFile = Application.GetOpenFilename(FilterReturn(CSV), MultiSelect:=True)
If Not IsArray(TargetFile) Then
Exit Sub
End If

TargetCount = 0
TargetCountA = UBound(TargetFile) - LBound(TargetFile) + 1
For LoopStep = LBound(TargetFile) To UBound(TargetFile)
TargetName = Right(TargetFile(LoopStep), Len(TargetFile(LoopStep)) - InStrRev(TargetFile(LoopStep), "\"))
If ISFILEOPEN(TargetName) = False Then
TargetCount = CorrectAdditionalColumnInCSV(Target:=TargetFile(LoopStep), _
ColumnCount:=18, _
MergeColStart:=15 _
) + TargetCount
End If
Next LoopStep

MsgBox TargetCount & " of " & TargetCountA & " files have been processed.", vbInformation, "Complete!"

End Sub


Private Function CorrectAdditionalColumnInCSV( _
ByVal Target As Variant, _
ByVal ColumnCount As Long, _
ByVal MergeColStart As Long, _
Optional ByVal Delimiter As String = ",", _
Optional ByVal ReplaceDelimiter As String = " ") As Long

Dim TempFile As String
Dim TempName As String
Dim TempPath As String
Dim TempExt As String
Dim FileNum1 As Long
Dim FileNum2 As Long
Dim LineText As String
Dim LineOutput As String
Dim LineItems() As String
Dim NDX As Long
Dim i As Long

TempPath = Left(Target, InStrRev(Target, "\"))
TempName = Right(Target, Len(Target) - Len(TempPath))
TempExt = Right(TempName, Len(TempName) - InStrRev(TempName, "."))
TempName = Left(TempName, Len(TempName) - Len(TempExt) - 1) & "(temp write)." & TempExt
TempFile = TempPath & TempName

FileNum1 = FreeFile()
Open Target For Input Access Read As #FileNum1
FileNum2 = FreeFile()
Open TempFile For Output Access Write As #FileNum2

Do While Not EOF(FileNum1)

LineText = ""
LineOutput = ""

Line Input #FileNum1, LineText
If Right(LineText, 1) = Delimiter Then LineText = Left(LineText, Len(LineText) - 1)

LineItems = Split(LineText, Delimiter)

If UBound(LineItems) - LBound(LineItems) + 1 > ColumnCount Then
LineOutput = WorksheetFunction.Substitute(LineText, Delimiter, ReplaceDelimiter, MergeColStart)
If InStr(1, LineOutput, Chr(34), vbTextCompare) > 0 Then
LineOutput = Replace(LineOutput, Chr(34), vbNullString)
End If
Else
LineOutput = LineText
End If

Print #FileNum2, LineOutput

Loop

CorrectAdditionalColumnInCSV = 1

ExitWithoutError:
Close #FileNum1
Close #FileNum2

If Dir(TempFile, vbNormal) <> "" Then
If Dir(Target, vbNormal) <> "" Then FileCopy TempFile, Target
Kill TempFile
End If

Exit Function

ExitWithError:
CorrectAdditionalColumnInCSV = 0
Resume ExitWithoutError

End Function


Function ISFILEOPEN(FileName As String) As Boolean
Dim iFilenum As Long
Dim iErr As Long
On Error Resume Next
iFilenum = FreeFile()
Open FileName For Input Lock Read As #iFilenum
Close iFilenum
iErr = Err
Select Case iErr
Case 0: ISFILEOPEN = False
Case 70: ISFILEOPEN = True
Case Else: Error iErr
End Select
On Error GoTo 0
End Function


Private Function FilterReturn(ByVal Value As FilterType) As String
Select Case Value
Case FilterType.CSV: FilterReturn = FilterCSV
Case FilterType.XL: FilterReturn = FilterXL
End Select
End Function


Edit: This should run fairly fast on a large set of files, since we're basically editing everything in memory as a text stream. I can run a test batch of 100 files (copies of your sample file with row 1 removed) and it runs almost instantaneously. You shouldn't have any problems running this on all of your files, assuming the structure is similar (as previously described).

SamT
01-12-2015, 04:16 PM
@ All
Timeout.

Zack, the OP has another issue at Need to standardize 100s of excel files that have columns in the wrong order (http://www.vbaexpress.com/forum/showthread.php?51513-Need-to-standardize-100s-of-excel-files-that-have-columns-in-the-wrong-order)

He has strongly hinted that these are the same files

I would think I need to fix the comma issue and get the data in the correct columns before I run code for this part. I have PM'ed him about the question

In that thread, the issues are


The column headers are not named the same in different files.
The columns are not in the same order in different files.
There are extra columns of other data which is not needed in the final product - therefore we do not want to copy them to the final file product.
Extra comma issue


In the Need To Standardize thread, there are only 14 significant columns and the Logon ID column is not significant. at least not in the Results example he uploaded

IMO, the OP has not yet provided enough clear information for VBAX to continue. I will work with Business1K and will add a thread Icon when all necessary information is available.

Thank you for your patience.
SamT

Zack Barresse
01-12-2015, 04:19 PM
Ok. I was under the impression these were in fact two distinct different problems for different file sets. The above code should fix the comma issue (worked for me in about a hundred copied test files). I'll look at the other thread. I would think (obviously some assumption going on here) some VBA could be written to combine and re-order data into a single identified structure regardless of this specific threads discussion. Clarification would be nice though. :)

SamT
01-12-2015, 04:32 PM
Business, click my name above to PM me.

SamT
01-12-2015, 11:10 PM
This thread is open.
Out of 450 similar files, this tread only applies to two of them
The other thread applies to all 450.

business1000
01-13-2015, 09:32 AM
Zack, your latest post with the updated complete code set works great.
I ran the macro against 100 files - each having 50K rows and it plowed thru them all in a total of only 2 minutes.
Great work - thanks again.

business1000
01-13-2015, 09:34 AM
Zack, your latest post with the updated complete code set works great.
I ran the macro against 100 files - each having 50K rows and it plowed thru them all in a total of only 2 minutes.
Great work - thanks again.