PDA

View Full Version : Reading .txt Files



Wishart
08-27-2008, 01:39 AM
Hi all,

I've been tasked with populating an Excel sheet with information drawn from around 50 different .txt files. Each file has to be read individually and searched through to count the occurences of a specific word. Two columns have to be made, one holding a hyperlink to the file, and the other holding the occurence count for the file in the cell adjacent. Having not used VB or VBA for several years, I'm having difficulty solving this problem. Would anyone be able to point me in the right direction? Any help here would be greatly appreciated.

N.B I'm using Microsoft Office 2003, if that helps at all

Dr.K
08-27-2008, 09:06 AM
When it comes to reading/writing text files you have two choices: use the built in VBA READ and WRITE methods, or use the Windows File System Object (FSO).

I ALWAYS use FSO, becuase its faster and more full featured.

Here is the tuturial I learned from (its poorly organized, but its got some good info):
http://www.techbookreport.com/tutorials/fso1.html

Also, MSDN has full documentation for the methods:
http://msdn.microsoft.com/en-us/library/6tkce7xa(VS.85).aspx


Here is a function that you can build your code around. I just tried it out, and its surprisingly fast.

Private Function CountSTRinTXT(TextFileFullName As String, StringToCount As String) As Long

Dim lngPos As Long
Dim strTemp As String
Dim objFS As Object 'Object for the FileSystem Object
Dim objTS As Object 'Object for a Text Stream

'Check function inputs, exit and return 0 if missing
CountSTRinTXT = 0
If Dir(TextFileFullName) = Empty Then Exit Function
If StringToCount = Empty Then Exit Function

'set up FileSystem objects
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objTS = objFS.OpenTextFile(TextFileFullName, 1)

'Loop through text file line by line
Do
strTemp = objTS.ReadLine
lngPos = 1

'loop through line with InStr to find all occurences
Do

lngPos = InStr(lngPos, strTemp, StringToCount, vbTextCompare)

If lngPos > 0 Then
'increment function variable
CountSTRinTXT = CountSTRinTXT + 1
'increase lngPos to find next occurance
lngPos = lngPos + Len(StringToCount)
End If

Loop Until lngPos = 0

Loop While objTS.AtEndOfStream = False

'close out objects
objTS.Close
Set objTS = Nothing
Set objFS = Nothing


End Function

mdmackillop
08-27-2008, 10:08 AM
Without looping is a little quicker (but not appreciably so), and needs Option Compare Text to handle Case

strTemp = objTS.Readall
t = Replace(strTemp, StringToCount, "")
CountSTRinTXT = (Len(strTemp) - Len(t)) / Len(StringToCount)

Wishart
08-28-2008, 02:57 AM
Many thanks to the both of you, you've just made my life so much easier!

Edit: The links are also incredibly useful, so thanks for providing those :)

Dr.K
08-28-2008, 07:11 AM
Glad you liked it! Please mark the thread as "Solved".


mdmackillop-

I avoided using ReadAll, becuase I don't know how big his files are. I know that VBA is pretty good about handling enormous strings, but for a general purpose function it felt safer to loop through using ReadLine instead.

However, eliminating the second, inner loop by subbing out the target text and comparing lengths is pure genius! Thanks for the trick.



EDIT: I don't understand what this means, please explain.

and needs Option Compare Text to handle Case

RonMcK
08-28-2008, 07:38 AM
Dr.K:

Looking at Excel's Help entry for Option Compare shows the following:
The Option Compare statement specifies the string comparison method (Binary, Text) for a module. If a module doesn't include an Option Compare statement, the default text comparison method is Binary.

Option Compare Binary results in string comparisons based on a sort order derived from the internal binary representations of the characters. In Microsoft Windows, sort order is determined by the code page. A typical binary sort order is shown in the following example:
A < B < E < Z < a < b < e < z < ? < ? < ? < ? < ? < ?

Option Compare Text results in string comparisons based on a case-insensitive text sort order determined by your system's locale. When the same characters are sorted using Option Compare Text, the following text sort order is produced:
(A=a) < ( ?=?) < (B=b) < (E=e) < (?=?) < (Z=z) < (?=?)
HTH,

Dr.K
08-28-2008, 07:44 AM
Right, right, I know that it is something you can set at the module-level, its just that I never do becuase I always expliclitly declare it for each use.

I was just confused by what he meant by "case".

mdmackillop
08-28-2008, 09:00 AM
Glad you liked it! Please mark the thread as "Solved".


mdmackillop-

I avoided using ReadAll, becuase I don't know how big his files are. I know that VBA is pretty good about handling enormous strings, but for a general purpose function it felt safer to loop through using ReadLine instead.

However, eliminating the second, inner loop by subbing out the target text and comparing lengths is pure genius! Thanks for the trick.



EDIT: I don't understand what this means, please explain.

I've tested on a 400 page document (downloaded from Gutenberg), it takes a couple of seconds to load into the variable, but the Replace function is "immediate"

Replace will differentiate between "Test" and "test". Option Compare Text placed after Option Explicit gets around this problem.