PDA

View Full Version : Combine all text files from multiple folders into one worksheet



YasserKhalil
10-09-2016, 10:18 PM
Hello everyone
I am seeking for a way to combine all text files in a folder to one worksheet
The folder has multiple sub folders with some other text files , and some of the sub folders has text files too

I need to combine all these text files to one worksheet
Thanks advanced for help

mana
10-10-2016, 03:39 AM
Option Explicit

Dim n As Long

Sub test()

With Application.FileDialog(msoFileDialogFolderPicker)
If Not .Show Then Exit Sub
MergeFile .SelectedItems(1)
End With

End Sub


Sub MergeFile(pFld As String)
Dim fso As Object, cFld As Object, f As Object
Dim ts As String, v

Set fso = CreateObject("Scripting.FileSystemObject")

For Each f In fso.GetFolder(pFld).Files
If fso.GetExtensionName(f.Name) = "csv" Then
With fso.GetFile(pFld & "\" & f.Name).OpenAsTextStream
ts = .ReadAll
.Close
End With
v = Split(ts, vbLf)
Range("a1").Offset(n).Resize(UBound(v)) = Application.Transpose(v)
n = n + UBound(v)
End If
Next

For Each cFld In fso.GetFolder(pFld).SubFolders
MergeFile cFld.Path
Next

End Sub

YasserKhalil
10-10-2016, 04:32 AM
That's wonderful Mr. Mana.
You are great and awesome
Thank you very much for sharing this great code ..

Just face a problem with a text file with Arabic letters (Just one) that appears in results in weird characters
while other text files are working well although they have Arabic letters(That's weird)

snb
10-10-2016, 05:51 AM
Sub M_snb()
c00 = "G:\OF\"
sn = Filter(Split(CreateObject("wscript.shell").exec("cmd /c dir """ & c00 & "*.csv"" /b/s").stdout.readall, vbCrLf), ".")

With CreateObject("scripting.filesystemobject")
For Each it In sn
c01 = c01 & vbCrLf & .opentextfile(it).readall
Next
.createtextfile(c00 & "all.csv").write c01
End With

Workbooks.Open c00 & "all.csv"
End Sub

Paul_Hossler
10-10-2016, 06:48 AM
No need to list the csv files and then loop them one at a time



Sub M_phh()
p00 = "L:\Temporary Working Files\"
Call CreateObject("wscript.shell").Run(Environ("COMSPEC") & " /c copy """ & p00 & "*.csv""" & " """ & p00 & "all.csv""", 0, True)
Workbooks.Open p00 & "all.csv"
End Sub

YasserKhalil
10-10-2016, 10:23 AM
Thank you very much for great solutions provided
As for snb it is more flexible for me as I have multiple sub folders .. Yours is working well for all the text file in the parent folder only

Generally this is NOT the problem now ..
My problem is with the text file I attached earlier as it is imported in weird characters for Arabic letters. Any idea how to fix the problem?

snb
10-10-2016, 01:09 PM
As for snb it is more flexible for me as I have multiple sub folders .. Yours is working well for all the text file in the parent folder only

You are mistaken.

YasserKhalil
10-10-2016, 01:24 PM
But I tried both .. By first I got the text strings in all folders and subfolders .. while second got the text strings from just the parent file
I am talking about my test .. That's all

Leith Ross
10-10-2016, 03:18 PM
Hello Yasser,

Text files can have different formats. I know that sounds odd but it is true. Text files can be encoded as plan ASCII, Unicode, Unicode Big Endian, and UTF-8.

The zip file included with this post has a workbook and your sample text. There is a button to run the macro on the worksheet. When run, there will be 3 different outputs of the sample text file. The first will be the text decoded using the Byte Order Mark or BOM. The second and third outputs are ASCII and Unicode. You will see the output from the macro TextToWorksheet properly reads the file based on the BOM.

Remember to extract the files from the Zipped Folder before running the macro or the macro won't run.

Macro Code for Module Text_Files_To_Worksheet


' Written: October 05, 2016
' Author: Leith Ross
' Summary: Imports all text files from a selected folder and copies each into a cell.
' Each cell can hold 32 kb of text. The text must be free of non printable
' characters. The only exception is for line feed.
'
' Each cell is limited to 254 line feeds. Any text after the 254th line feed is lost.
' This liimit is built-in to Excel and cannot be changed.
'
' Text files exceeding 32 kb will be divided into additional cells.
'
' Windows recognizes 4 types of text file encoding: ASCII, Unicode,
' Unicode Big Endian, and UTF-8. The macro examines each file for
' a Byte Order Mark (BOM) to determine the encoding. If the encoding
' cannot be determined then ASCII will be used by default.




Sub TextFilesToWorksheet()


Dim BOM(3) As Byte
Dim Cell As Range
Dim File As Object
Dim Files As Object
Dim Folder As Variant
Dim k As Long
Dim n As Long
Dim NewText As String
Dim oStream As Object
Dim Text As Variant
Dim Wks As Worksheet
Dim x As Variant

Folder = ThisWorkbook.Path

Set Wks = ThisWorkbook.Worksheets("Sheet1")
Set Cell = Wks.Range("A2")

With CreateObject("Shell.Application")
Set Folder = .Namespace(Folder)
End With

Set Files = Folder.Items
Files.Filter 64, "*.txt"

For Each File In Files
Open File.Path For Binary Access Read As #1
Get #1, , BOM
Close #1

Text = ""

For n = 0 To 3
x = Hex(BOM(n))
If Len(x) = 1 Then x = "0" & x
Text = Text & x
Next n

Set oStream = CreateObject("ADODB.Stream")
With oStream
.Type = 2
.Charset = "ascii"
If Text Like "EFBBBF*" Then .Charset = "UTF-8" ' 239, 187, 191
If Text Like "FFFE*" And Text <> "FFFE0000" Then .Charset = "UnicodeFFFE"
If Text Like "FEFF*" Then .Charset = "Unicode"
.Open
.LoadFromFile File.Path
Text = .ReadText
While .State > 1: DoEvents: Wend
.Close
End With

n = Len(Text)

For k = 1 To n Step 32766
NewText = Mid(Text, k, 32766)
NewText = Application.Clean(NewText)
Cell.Value = NewText
Set Cell = Cell.Offset(1, 0)
Next k

x = n Mod 32766

If n > 32766 And x > 0 Then
NewText = Mid(Text, n - x + 1, x)
NewText = Application.Clean(NewText)
Cell.Value = NewText
Set Cell = Cell.Offset(1, 0)
End If

Set oStream = Nothing
Next File

End Sub



Macro Code for Module1


Sub Run()


Dim Data() As Byte
Dim File As Object
Dim FileName As String
Dim FilePath As Variant
Dim Folder As Object
Dim Item As Variant

' Read the text file using the Byte Order Mark (BOM) and output the results to A2.
TextFilesToWorksheet


' Open the text file normally.
FilePath = ThisWorkbook.Path
FileName = "Sample Text.txt"

With CreateObject("Shell.Application")
Set File = .Namespace(FilePath).ParseName(FileName)
End With

Open File.Path For Binary Access Read As #1
ReDim Data(LOF(1))
Get #1, , Data
Close #1

' Output the text file in ASCII.
Range("A3").Value = StrConv(Data, vbFromUnicode)

' Ouput the text in Unicode.
Range("A4").Value = StrConv(Data, vbUnicode)

End Sub

YasserKhalil
10-10-2016, 03:27 PM
Thank you very very much Mr. Leith Ross for this wonderful solution.
That's really awesome.
How can I apply this fantastic code to all the text files in multiple folders with sub folders and sub sub folders...?

Leith Ross
10-10-2016, 03:42 PM
Hello Yasser,

I will modify the code for you. Do you want to select the parent folder or hard code the folder path into the macro?

YasserKhalil
10-10-2016, 03:49 PM
Thanks a lot Mr. Leith
I would like to select the parent folder so as to be more flexible ..
Best Regards

dunkin
10-13-2016, 08:09 AM
Hi Mana how to combine 50 columns from one worksheet in to another worksheet total combine is 100 columns.

Leith Ross
10-15-2016, 11:27 AM
Hello Yasser,

I didn not forget about you and apologise for the long delay in answering. A friend of mine died from pancreatic cancer a few days ago.

Here is the revised code with comments. The attached workbook has the code added and a button to run the macro.



' Written: October 05, 2016
' Author: Leith Ross
' Summary: Imports all text files from a selected folder and copies each into a cell.
' Each cell can hold 32 kb of text. Text files exceeding 32 kb will have the
' text divided into additional cells.
'
' NOTES:
' Text files that are read by this macro and converted are rendered correctly.
' Because workhseet cells operate under the rules of the Excel application,
' the text in the cell may not appear the same as when the same text is ouput
' and opened opened with Notepad.
'
' Windows recognizes 4 types of text file encoding: ANSI (Windows-1252),
' Unicode, Unicode Big Endian, and UTF-8. The macro examines each file
' for a Byte Order Mark (BOM) to determine the encoding. If the encoding
' cannot be determined then ANSI(Windows-1252) will be used by default.
'
' Each cell is limited to 254 line feeds. Any text after the 254th line feed is lost.
' This liimit is built-in to Excel and cannot be changed.
'
' Carriage return characters, ASCII 13, are removed automatically when the text is copied
' into a cell on the worksheet. This behaviour is built-in and can not be changed.
'
' If the text contains a character code of zero then Excel considers this to be the string's
' terminating point and no other characters will be included in the cell.
'
' More text data can be included in a cell by removing all non-printable characters.
' However, this changes the original formatting and can make the text difficult to read.




Global oShell As Object


Sub TextFileToWorksheet(ByRef File As Object)


Dim BOM(3) As Byte
Dim Cell As Range
Dim Files As Object
Dim Folder As Object
Dim k As Long
Dim n As Long
Dim NewText As String
Dim oStream As Object
Dim Text As String
Dim Wks As Worksheet
Dim x As Variant


Set Wks = ThisWorkbook.ActiveSheet

Set Cell = Wks.Cells(Rows.Count, "A").End(xlUp)
If Cell.Row < 2 Then
Set Cell = Wks.Range("A2")
Else
Set Cell = Cell.Offset(1, 0)
End If

' The BOM will always be the first 4 characters of the text file.
Open File.Path For Binary Access Read As #1
Get #1, , BOM
Close #1

Text = ""

' Make all BOM hex values 2 digits.
For n = 0 To 3
x = Hex(BOM(n))
If Len(x) = 1 Then x = "0" & x
Text = Text & x
Next n

Set oStream = CreateObject("ADODB.Stream")
With oStream
.Type = 2
.Charset = "windows-1252"
If Text Like "EFBBBF*" Then .Charset = "UTF-8" ' Decimal 239, 187, 191
If Text Like "FFFE*" And Text <> "FFFE0000" Then .Charset = "UnicodeFFFE" ' Decimal 255, 254
If Text Like "FEFF*" Then .Charset = "Unicode" ' Decimal 254, 255
.Open
.LoadFromFile File.Path
n = .Size
Text = .ReadText
While .State > 1: DoEvents: Wend
.Close
End With

For k = 1 To n Step 32767
NewText = Mid(Text, k, 32767)
NewText = Application.Clean(NewText) ' Remove all non-printable characters.
Cell.Resize(1, 2).Value = Array(NewText, File.Path) ' File text in column "A" and file path in column "B".
Set Cell = Cell.Offset(1, 0)
Next k

x = n Mod 32767

If n > 32767 And x > 0 Then
NewText = Mid(Text, (n - x) + 1, x)
NewText = Application.Clean(NewText) ' Remove all non-printable characters.
Cell.Resize(1, 2).Value = Array(NewText, File.Path) ' File text in column "A" and file path in column "B".
Cell.Value = NewText
End If

Set oStream = Nothing

End Sub


Sub GetFolders(ByVal FolderPath As Variant, ByVal SubFolderLevel As Long)


Dim File As Object
Dim Files As Object
Dim Folder As Object
Dim Folders As Object
Dim SubFolder As Object
Dim SubFolders As Object

If oShell Is Nothing Then
Set oShell = CreateObject("Shell.Application")
End If

Set Folder = oShell.Namespace(FolderPath)

If Folder.Self.Type Like "*zipped*" Then
Exit Sub
End If

' Return a list of all text files in this folder.
Set Files = Folder.Items
Files.Filter 64, "*.txt"

' Copy each file's text to the worksheet.
For Each File In Files
Call TextFileToWorksheet(File)
Next File

' Return a list of all Subfolders in this folder.
Set SubFolders = Folder.Items
SubFolders.Filter 32, "*"

' Recursively search through this list of subfolders.
For Each SubFolder In SubFolders
If SubFolderLevel <> 0 Then
Call GetFolders(SubFolder.Path, SubFolderLevel - 1)
End If
Next SubFolder

End Sub


Sub Run()


Dim Cell As Range
Dim Data() As Byte
Dim File As Object
Dim FilePath As String
Dim FolderPath As Variant
Dim Folder As Object
Dim Item As Variant

' Clear from row 2 down to the last row with data in it.
With ThisWorkbook.ActiveSheet
Intersect(.UsedRange, .UsedRange.Offset(1, 0)).ClearContents
End With

' Select the parent folder and start the search.
With Application.FileDialog(msoFileDialogFolderPicker)
.Show
If .SelectedItems.Count = 0 Then Exit Sub
FolderPath = .SelectedItems(1)
Call GetFolders(FolderPath, -1)
End With

End Sub

YasserKhalil
10-15-2016, 11:49 AM
Mr. Leith Ross
Never mind about delay I am not in hurry and I am very sorry about the news of your friend

Thanks a lot for the solution .. As you remember the problem was with the Arabic letters and you have solved it in previous post
Now the problem appears again .. Would you fix it ?

Leith Ross
10-15-2016, 02:44 PM
Hello Yasser,

Sorry about that. Once you mentioned the problem, I remembered what I left out. All the changes were made to GetFolders. Here is the updated code and workbook.



' Written: October 05, 2016
' Author: Leith Ross
' Summary: Imports all text files from a selected folder and copies each into a cell.
' Each cell can hold 32 kb of text. Text files exceeding 32 kb will have the
' text divided into additional cells.
'
' NOTES:
' Text files that are read by this macro and converted are rendered correctly.
' Because workhseet cells operate under the rules of the Excel application,
' the text in the cell may not appear the same as when the same text is ouput
' and opened opened with Notepad.
'
' Windows recognizes 4 types of text file encoding: ANSI (Windows-1252),
' Unicode, Unicode Big Endian, and UTF-8. The macro examines each file
' for a Byte Order Mark (BOM) to determine the encoding. If the encoding
' cannot be determined then ANSI(Windows-1252) will be used by default.
'
' Each cell is limited to 254 line feeds. Any text after the 254th line feed is lost.
' This liimit is built-in to Excel and cannot be changed.
'
' Carriage return characters, ASCII 13, are removed automatically when the text is copied
' into a cell on the worksheet. This behaviour is built-in and can not be changed.
'
' If the text contains a character code of zero then Excel considers this to be the string's
' terminating point and no other characters will be included in the cell.
'
' More text data can be included in a cell by removing all non-printable characters.
' However, this changes the original formatting and can make the text difficult to read.




Global oShell As Object


Sub TextFileToWorksheet(ByRef File As Object)


Dim BOM(3) As Byte
Dim Cell As Range
Dim Files As Object
Dim Folder As Object
Dim k As Long
Dim n As Long
Dim NewText As String
Dim oStream As Object
Dim Text As String
Dim Wks As Worksheet
Dim x As Variant


Set Wks = ThisWorkbook.ActiveSheet

Set Cell = Wks.Cells(Rows.Count, "A").End(xlUp)
If Cell.Row < 2 Then
Set Cell = Wks.Range("A2")
Else
Set Cell = Cell.Offset(1, 0)
End If

' The BOM will always be the first 4 characters of the text file.
Open File.Path For Binary Access Read As #1
Get #1, , BOM
Close #1

Text = ""

' Make all BOM hex values 2 digits.
For n = 0 To 3
x = Hex(BOM(n))
If Len(x) = 1 Then x = "0" & x
Text = Text & x
Next n

Set oStream = CreateObject("ADODB.Stream")
With oStream
.Type = 2
.Charset = "windows-1252"
If Text Like "EFBBBF*" Then .Charset = "UTF-8" ' Decimal 239, 187, 191
If Text Like "FFFE*" And Text <> "FFFE0000" Then .Charset = "UnicodeFFFE" ' Decimal 255, 254
If Text Like "FEFF*" Then .Charset = "Unicode" ' Decimal 254, 255
.Open
.LoadFromFile File.Path
n = .Size
Text = .ReadText
While .State > 1: DoEvents: Wend
.Close
End With

For k = 1 To n Step 32767
NewText = Mid(Text, k, 32767)
NewText = Application.Clean(NewText) ' Remove all non-printable characters.
Cell.Resize(1, 2).Value = Array(NewText, File.Path) ' File text in column "A" and file path in column "B".
Set Cell = Cell.Offset(1, 0)
Next k

x = n Mod 32767

If n > 32767 And x > 0 Then
NewText = Mid(Text, (n - x) + 1, x)
NewText = Application.Clean(NewText) ' Remove all non-printable characters.
Cell.Resize(1, 2).Value = Array(NewText, File.Path) ' File text in column "A" and file path in column "B".
Cell.Value = NewText
End If

Set oStream = Nothing

End Sub


Sub GetFolders(ByVal FolderPath As Variant, ByVal SubFolderLevel As Long)


Dim File As Object
Dim FileName As String
Dim Folder As Object
Dim SubFolder As Object
Dim SubFolders As Object

If oShell Is Nothing Then
Set oShell = CreateObject("Shell.Application")
End If

Set Folder = oShell.Namespace(FolderPath)

If Folder.Self.Type Like "*zipped*" Then
Exit Sub
End If

' Return a list of all text files in this folder.
FileName = Dir(FolderPath & "\*.txt")

Do While FileName <> ""
Set File = oShell.Namespace(Folder).ParseName(FileName)
Call TextFileToWorksheet(File)
FileName = Dir()
Loop

' Return a list of all Subfolders in this folder.
Set SubFolders = Folder.Items
SubFolders.Filter 32, "*"

' Recursively search through this list of subfolders.
For Each SubFolder In SubFolders
If SubFolderLevel <> 0 Then
Call GetFolders(SubFolder.Path, SubFolderLevel - 1)
End If
Next SubFolder

End Sub


Sub Run()


Dim Cell As Range
Dim Data() As Byte
Dim File As Object
Dim FilePath As String
Dim FolderPath As Variant
Dim Folder As Object
Dim Item As Variant

' Clear from row 2 down to the last row with data in it.
With ThisWorkbook.ActiveSheet
Intersect(.UsedRange, .UsedRange.Offset(1, 0)).ClearContents
End With

' Select the parent folder and start the search.
With Application.FileDialog(msoFileDialogFolderPicker)
.Show
If .SelectedItems.Count = 0 Then Exit Sub
FolderPath = .SelectedItems(1)
Call GetFolders(FolderPath, -1)
End With

End Sub

YasserKhalil
10-15-2016, 09:47 PM
Thanks a lot Mr. Leith
I am so sorry for disturning you again but it is the same problem ...
17334

Leith Ross
10-15-2016, 11:27 PM
Hello Yasser,

I ran the macro on the original text file you posted and the output was Arabic, even written right to left in the cell.

Can you post a copy of this text file so I can compare the two?

YasserKhalil
10-16-2016, 05:47 AM
Thanks for following up the thread Mr. Leith
Here's the text

41465940 احمد محمود عزيز

YasserKhalil
10-16-2016, 01:20 PM
Mr. Leith
I like your code in Post #16 a lot
Just need to know how to keep the same format of the text file
I mean to keep the same lines of data .. If there is an empty line, the code keeps the empty line ..
If there is return carriage in the text file, it is kept in the output
Simply to keep the same output of the text file ...
Thanks a lot for great help

Kenneth Hobs
10-16-2016, 08:00 PM
Obviously, post #16 may suit your needs more since you apparently have different characters sets for different files.

What you said in post #6 was correct. Post #5 as you said worked just on the parent folder and post #4 worked with all subfolders. I do fear that with lots of strings to concatenate for many files with large data, there might be an issue.

I used an approach which is sort of a marriage between #4 and #5. It too may be a problem at some time since I concatenated the file1+file2+ sort of thing for the source copy.

For my 13 files in 4 subfolders, #4 and my code produced essentially the same result.
FWIW:

'http://ss64.com/nt/dir.html
Sub Main()
Dim dest As String, q As String, s As String
Dim a() As Variant, b(1 To 3) As String
Dim i As Integer

dest = Environ("temp") & "\AllFiles.csv"
q = """"
Shell "cmd /c del " & q & dest & q, vbHide
DoEvents

a() = aFFs("c:\myfiles\excel\csv\*.csv", "/A:-D")

For i = LBound(a) To UBound(a)
a(i) = q & a(i) & q
Next i


b(2) = Join(a, "+")


b(1) = "cmd /c copy"
b(3) = q & dest & q
s = Join(b, " ")
'debug.Print s
Shell s, vbHide
DoEvents
Workbooks.Open dest
End Sub




'Set extraSwitches, e.g. "/ad", to search folders only. /A:-D <Files only.
'MyDir should end in a "\" character unless searching by wildcards, e.g. "x:\test\t*
'Command line switches for the shell's Dir, http://ss64.com/nt/dir.html
Function aFFs(myDir As String, Optional extraSwitches = "", _
Optional tfSubFolders As Boolean = False) As Variant

Dim s As String, a() As String, v As Variant
Dim b() As Variant, i As Long

If tfSubFolders Then
s = CreateObject("Wscript.Shell").exec("cmd /c dir " & _
"""" & myDir & """" & " /b /s " & extraSwitches).stdout.readall
Else
s = CreateObject("Wscript.Shell").exec("cmd /c dir " & _
"""" & myDir & """" & " /b " & extraSwitches).stdout.readall
End If

a() = Split(s, vbCrLf)
If UBound(a) = -1 Then
MsgBox myDir & " not found.", vbCritical, "Macro Ending"
Exit Function
End If
ReDim Preserve a(0 To UBound(a) - 1) As String 'Trim trailing vblfcr

For i = 0 To UBound(a)
If Not tfSubFolders Then
s = Left$(myDir, InStrRev(myDir, "\"))
'add the folder name
a(i) = s & a(i)
End If
Next i
aFFs = sA1dtovA1d(a)
End Function


Function sA1dtovA1d(strArray() As String) As Variant
Dim varArray() As Variant, i As Long
ReDim varArray(LBound(strArray) To UBound(strArray))
For i = LBound(strArray) To UBound(strArray)
varArray(i) = CVar(strArray(i))
Next i
sA1dtovA1d = varArray()
End Function

YasserKhalil
10-16-2016, 11:37 PM
Thank you very much Mr. Kenneth for sharing the issue
I replaced csv with txt in these lines


dest = Environ("Temp") & "\AllFiles.txt"

a() = aFFs("C:\Users\Future\Desktop\Test Folder\*.txt", "/A:-D")

As the files I need to combine are text files ..I got an error at this line and got no results

Workbooks.Open dest

Kenneth Hobs
10-17-2016, 05:26 AM
That can be a timing issue due to dest being deleted, or not enough time for shell to complete, or the parent folder does not exist or there are no .txt files in that folder. You need to set the 3rd option of aFFs() to True to iterate the subfolders.

You can easily debug to see if a() was created properly using MsgBox() or Debug.Print with Join().

To solve the timing issue, the best method is to shell and wait.

Sub Main() Dim dest As String, q As String, s As String
Dim a() As Variant, b(1 To 3) As String
Dim i As Integer, source As String

'dest = Environ("temp") & "\AllFiles.csv"
dest = Environ("Temp") & "\AllFiles.txt"
source = CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\Test Folder\*.txt"

q = """"

'www.cpearson.com/Excel/ShellAndWait.aspx
ShellAndWait "cmd /c del " & q & dest & q, 1000, vbHide, AbandonWait

a() = aFFs(source, "/A:-D", True)
Debug.Print Join(a, vbLf)

For i = LBound(a) To UBound(a)
a(i) = q & a(i) & q
Next i


b(2) = Join(a, "+")


b(1) = "cmd /c copy"
b(3) = q & dest & q
s = Join(b, " ")
Debug.Print s

ShellAndWait s, 1000, vbHide, AbandonWait


Workbooks.Open dest
End Sub

YasserKhalil
10-17-2016, 06:20 AM
Thank you very much Mr. Kenneth for this great solution
It works in wonderful way except for one text file (Post #19) ... It may be the file itself has a problem
But as for all the other text files, the code is working great
Thanks a lot for great help

Leith Ross
10-17-2016, 10:55 AM
Hello Yasser,

Here is another possible solution you may find useful. This code will copy each text file on the active sheet. Once the active sheet is filled, a new sheet is added and the process continues. This will display each line of each file in it's own row. There are no blank rows inserted between the files. The attached file contains the code shown here.



' Written: October 17, 2016
' Author: Leith Ross
' Summary: Imports all text files from a selected folder and copies them
' to the worksheet row by row. Once the sheet is full, a new sheet
' is added and the text continues on the sheet.


Option Explicit


Global LineCnt As Long
Global oShell As Object
Global Wks As Worksheet
Global Const ANSI As String = "windows-1252"


Sub TextFileToWorksheet(ByRef File As Object)


Dim BOM(3) As Byte
Dim Cell As Range
Dim Folder As Object
Dim n As Long
Dim Text As String
Dim x As Variant


Set Cell = Wks.Cells(Rows.Count, "A").End(xlUp)
If Cell.Row < 2 Then
Set Cell = Wks.Range("A2")
Else
Set Cell = Cell.Offset(1, 0)
End If

' The BOM will always be the first 4 characters of the text file.
Open File.Path For Binary Access Read As #1
Get #1, , BOM
Close #1

' Make all BOM hex values 2 digits.
For n = 0 To 3
x = Hex(BOM(n))
If Len(x) = 1 Then x = "0" & x
Text = Text & x
Next n

With CreateObject("ADODB.Stream")
.Type = 2
.Charset = ANSI
If Text Like "EFBBBF*" Then .Charset = "UTF-8" ' Decimal 239, 187, 191
If Text Like "FFFE*" And Text <> "FFFE0000" Then .Charset = "UnicodeFFFE" ' Decimal 255, 254
If Text Like "FEFF*" Then .Charset = "Unicode" ' Decimal 254, 255

.Open
.LoadFromFile File.Path

Do While Not .EOS
DoEvents
LineCnt = LineCnt + 1
Text = .ReadText(-2)
If Left(Text, 1) = "=" Then Text = "'" & Text
Cell.Value = Text
Set Cell = Cell.Offset(1, 0)
If LineCnt = Wks.Rows.Count Then
' Add a new worksheet and continue.
Set Wks = ThisWorkbook.Worksheets.Add(After:=ThisWorkbook.Worksheets.Count)
LineCnt = 0
End If
Loop

.Close
End With

End Sub


Sub GetFolders(ByVal FolderPath As Variant, ByVal SubFolderLevel As Long)


Dim File As Object
Dim FileName As String
Dim Folder As Object
Dim SubFolder As Object
Dim SubFolders As Object

If oShell Is Nothing Then
Set oShell = CreateObject("Shell.Application")
End If

Set Folder = oShell.Namespace(FolderPath)

If Folder.Self.Type Like "*zipped*" Then
Exit Sub
End If

' Return a list of all text files in this folder.
FileName = Dir(FolderPath & "\*.txt")

Do While FileName <> ""
Set File = oShell.Namespace(Folder).ParseName(FileName)
Call TextFileToWorksheet(File)
FileName = Dir()
Loop

' Return a list of all Subfolders in this folder.
Set SubFolders = Folder.Items
SubFolders.Filter 32, "*"

' Recursively search through this list of subfolders.
For Each SubFolder In SubFolders
If SubFolderLevel <> 0 Then
Call GetFolders(SubFolder.Path, SubFolderLevel - 1)
End If
Next SubFolder

End Sub


Sub Clear()


Set Wks = ThisWorkbook.ActiveSheet

' Clear from row 2 down to the last row with data in it.
With Wks
Intersect(.UsedRange, .UsedRange.Offset(1, 0)).ClearContents
End With


End Sub


Sub Run()


Dim FolderPath As Variant

Call Clear

' Select the parent folder and start the search.
With Application.FileDialog(msoFileDialogFolderPicker)
.Show
If .SelectedItems.Count = 0 Then Exit Sub
FolderPath = .SelectedItems(1)
Call GetFolders(FolderPath, -1)
End With

End Sub

YasserKhalil
10-17-2016, 06:36 PM
Thank you very much Mr. Leith
Now I have a problem with the language Arabic with all my text files. The output for Arabic letters appear in weird characters
As for Mr. Kenneth's solution just one text file with the problem but other text files are ok
Best Regards

Leith Ross
10-17-2016, 08:50 PM
Hello Yasser,

I would like you to post 3 of the Arabic files for me, so I can test them on my machine. Since I can not replicate the problem you are experiencing, it will help identify the cause.

YasserKhalil
10-17-2016, 09:04 PM
Ok Mr. Leith
I am very sorry to disturb you again and again
The problem occurs specially with AA4 text file

Leith Ross
10-17-2016, 10:46 PM
Hello Yasser,

Thank you for taking the time to create such an excellent sample and posting it. It is a great help.

The problem is your files are missing their byte order marks (BOM). The AA4 file is encoded as UTF-8 but without the BOM my macro will default to ANSI. That is why you see the strange characters.

I have a couple of ideas on how to fix the problem and will test them.

YasserKhalil
10-18-2016, 06:27 AM
Thanks a lot for your interest Mr. Leith
Waiting for your fascinating ideas. I am sure they will be wonderful

Leith Ross
10-18-2016, 05:31 PM
Hello Yasser,

This took a little longer than I thought it would. When a text file contains a byte order mark or BOM it makes it easy to read because the file encoding is known. However, a BOM is not a requirement. When it is missing it is a non trivial matter to determine the encoding method of the bytes. Windows uses 4 types of text encoding ANSI (which is actually Window-1252 code page), UTF-8, UTF-16 Little Endian, and UTF-16 Big Endian. The last one is usually encountered when importing a text file from a non Windows operating system.

The easiest approach is to test for valid UTF-8 encoding when the BOM is missing. If this test fails then the default encoding of ANSI is used. This not a perfect soltion but a good one. I ran the macro on my system. It read over 300 text files of different encodings and produced 115000+ lines of text with no errors. I am confident in it's accuracy.

The attached workbook contains all the latest updates.


New Module - Check_File_Encoding


' Written: October 18, 2016
' Author: Leith Ross
' Summary: Checks if a text file is encoded as UTF-8 when the file is missing a BOM.


Function IsUTF8(ByVal FilePath As String) As Boolean


Dim Data() As Byte
Dim n As Long


Open FilePath For Binary Access Read As #1
ReDim Data(LOF(1))
Get #1, , Data
Close #1

For n = 0 To UBound(Data)
If Data(n) > 127 Then
Select Case Data(n)
Case 194 To 223
If Data(n + 1) < 128 Then
Exit Function
Else
IsUTF8 = True
End If
Case 224 To 239
If Data(n + 1) < 128 Or Data(n + 2) < 128 Then
Exit Function
Else
n = n + 1
IsUTF8 = True
End If
Case 240 To 244
If Data(n = 1) < 128 Or Data(n + 2) < 128 Or Data(n + 3) < 128 Then
Exit Function
Else
n = n + 2
IsUTF8 = True
End If
End Select
End If
Next n

End Function



Module - Text_Files_To_Worksheet




' Written: October 17, 2016
' Updated: October 18, 2016
' Author: Leith Ross
' Summary: Imports all text files from a selected folder and copies them
' to the worksheet row by row. Once the sheet is full, a new sheet
' is added and the text continues on the sheet.


Option Explicit


Global LineCnt As Long
Global oShell As Object
Global Wks As Worksheet
Global Const ANSI As String = "windows-1252"


Sub TextFileToWorksheet(ByRef File As Object)


Dim BOM(3) As Byte
Dim Cell As Range
Dim Encoding As String
Dim Folder As Object
Dim Line As Variant
Dim Lines As Variant
Dim n As Long
Dim Text As String
Dim x As Variant


Set Cell = Wks.Cells(Rows.Count, "A").End(xlUp)
If Cell.Row < 2 Then
Set Cell = Wks.Range("A2")
Else
Set Cell = Cell.Offset(1, 0)
End If

Encoding = "No BOM"

' The BOM will always be the first 4 characters of the text file.
Open File.Path For Binary Access Read As #1
Get #1, , BOM
Close #1

' Make all BOM hex values 2 digits.
For n = 0 To 3
x = Hex(BOM(n))
If Len(x) = 1 Then x = "0" & x
Text = Text & x
Next n

With CreateObject("ADODB.Stream")
.Type = 2

If Text Like "EFBBBF*" Then Encoding = "UTF-8" ' Decimal 239, 187, 191
If Text Like "FFFE*" And Text <> "FFFE0000" Then Encoding = "UnicodeFFFE" ' Decimal 255, 254
If Text Like "FEFF*" Then Encoding = "Unicode" ' Decimal 254, 255

If Encoding <> "No BOM" Then
.Charset = Encoding
Else
If IsUTF8(File.Path) Then .Charset = "UTF-8" Else .Charset = ANSI
End If

.Open
.LoadFromFile File.Path
Text = .ReadText
.Close


Lines = Split(Text, vbCrLf)

For Each Line In Lines
DoEvents
LineCnt = LineCnt + 1

Cell.Value = Line
'Cell.Offset(0, 1) = File.Path

If LineCnt = Wks.Rows.Count Then
' Add a new worksheet and continue.
Set Wks = ThisWorkbook.Worksheets.Add(After:=ThisWorkbook.Worksheets.Count)
LineCnt = 0
End If
Set Cell = Cell.Offset(1, 0)
Next Line
End With

End Sub


Sub GetFolders(ByVal FolderPath As Variant, ByVal SubFolderLevel As Long)


Dim File As Object
Dim FileName As String
Dim Folder As Object
Dim SubFolder As Object
Dim SubFolders As Object

If oShell Is Nothing Then
Set oShell = CreateObject("Shell.Application")
End If

Set Folder = oShell.Namespace(FolderPath)

If Folder.Self.Type Like "*zipped*" Then
Exit Sub
End If

On Error Resume Next
' Get the first text file in the folder.
FileName = Dir(FolderPath & "\*.txt")
If Err <> 0 Then Exit Sub
On Error GoTo 0

' Return a list of all text files in this folder.
Do While FileName <> ""
Set File = oShell.Namespace(Folder).ParseName(FileName)
Call TextFileToWorksheet(File)
FileName = Dir()
Loop

' Return a list of all Subfolders in this folder.
Set SubFolders = Folder.Items
SubFolders.Filter 32, "*"

' Recursively search through this list of subfolders.
For Each SubFolder In SubFolders
If SubFolderLevel <> 0 Then
Call GetFolders(SubFolder.Path, SubFolderLevel - 1)
End If
Next SubFolder

End Sub


Sub Clear()


Set Wks = ThisWorkbook.ActiveSheet

' Clear from row 2 down to the last row with data in it.
With Wks
Intersect(.UsedRange, .UsedRange.Offset(1, 0)).ClearContents
End With


End Sub


Sub Run()


Dim FolderPath As Variant

Call Clear

' Select the parent folder and start the search.
With Application.FileDialog(msoFileDialogFolderPicker)
.Show
If .SelectedItems.Count = 0 Then Exit Sub
FolderPath = .SelectedItems(1)
Call GetFolders(FolderPath, -1)
End With

End Sub

YasserKhalil
10-19-2016, 05:48 AM
Thank you very very much Mr. Leith ..
It seems that it is complicated .. Now the file that has the problem has been solved in perfect way but the Arabic letters in the other files have the problem ...
This is an image (The green is OK and no problem while the Red is a problem now)
17371

Leith Ross
10-19-2016, 07:57 AM
Hello Yasser,

The results you posted are correct. If you examine the files in Notepad, you will see the same results. The files in the red boxes were saved using ANSI encoding. Arabic characters must be saved as UTF-8.

YasserKhalil
10-19-2016, 02:04 PM
Sorry for disturbing you Mr. Leith
I have attached one of the files (shown in Arabic ..)

I have a lot of text files .. the process of converting them from ANSI to UTF-8 will be impossible mission?
Can your both solutions merged together so as to get the perfect version..?
You have posted before a code that was successful for all text files except one .. and the last code was successful with just this one !!
Hope it is easy and available.. If not, never mind, I can depend on the first successful code and repair the wrong output for others


17375

Leith Ross
10-19-2016, 05:11 PM
Hello Yasser,


Once a file has been saved in ANSI, you can not save the file again in UTF-8 to convert it. The reason is ANSI does support Arabic characters.

The source of these files is a mystery to me. Are you importing these files from a network or did you create them on your machine? Text files can be encoded differently depending on the software or hardware used when the file was saved.

If the files are really text files and not Excel workbooks saved as text files, which is the wrong way to create a text file, you need to be mindful of the application's text encoding choices when saving the file.


ANSI is a vague term used to describe text encoding. It is really nothing more than the default code page used for your locale. A code page is an extended ASCII encoding. It makes use of the characters from 128 to 255 to

represent local characters and symbols when possible. A code page on your computer may allow you to display Arabic and that is why the files appears correct for you. However, my default code page is different from yours and does not

support Arabic. Encoding the file using UTF-8 (Unicode Transformation Format 8 byte) ensures the file can be read by both you and I the same way. This encoding must be done when the file is first saved.

An easy way to check your text file encoding is to use Notepad. Open the text file and then choose Save As. When the dialog appears, the file's encoding will be displayed at the bottom right side in a drop down selection box.


All that being said, I am not sure how I can help you sort out the file encoding problem using automation. Kenneth's code seems to meet your needs better than mine.

YasserKhalil
10-19-2016, 05:27 PM
Thank you very much for this plenty explanation Mr. Leith Ross
In fact I am very satisfied of the solutions presented .. They are all great and very useful..
As for the files I have created it through notepad .. normal routine not imported from other systems
Generally I think it is very great to have 95 % of the solution. I am very grateful to your efforts
And thanks a lot for Mr. Kenneth Hobson for his great solution too
Best and kind regards

Leith Ross
10-20-2016, 07:52 PM
Hello Yasser,

I figured out what the problems were. First, the single UTF-8 file was corrupted. After saving it again as UTF-8, the macro now recognizes it. The ANSI code page is now Windows-1256, which supports both English and Arabic characters.

If the file is not UTF-8 then code page 1256 is used. All the files you posted are included along with the updated workbook. Extract the files to a new folder before opening the new workbook and running the macro.

YasserKhalil
10-20-2016, 09:59 PM
Thank you very very much Mr. Leith for this wonderful and awesome solution
Thanks a lot for your patience in solving that issue
Best and kind regards

Leith Ross
10-21-2016, 07:49 AM
Hello Yasser,

You're welcome and thank you for the experience. It has taught me much more about multi-language support for files.

Guma slàn dhut!
(Be well!)