PDA

View Full Version : Check first 10 characters of every file in list



theta
09-17-2012, 11:59 AM
Hi...I am trying to analyse a list of files. The files are listed from A1:A100 in a range called 'rngFiles', I need to open them as text stream and capture the first 10 chars of the file, and paste it into the adjacent cell in column B.

The only problem is that some of the files are large (100mb+) so I do not want the entire file to be opened and analysed - I just want the first 10 characters of the file if opened as text.

Any ideas how I can achieve this?

NOTE : I will be comparing this to a predefine list to find inconsistencies in files with a given extension.

GarysStudent
09-17-2012, 02:17 PM
Are you familiar with FileSystemObject (FSO)?

theta
09-17-2012, 02:48 PM
Just found this....should help

http://www.mahipalreddy.com/vb/vbarticle1.htm

Any ideas on best approach?

Kenneth Hobs
09-17-2012, 04:24 PM
Give an example of the values. What types of files are they?

theta
09-18-2012, 01:55 AM
Example - if you open a ZIP file using a hex editor or notepad, the first two characters will be "PK" as the file is a .ZIP

I would like to scroll through a list catching the first 10 characters of every file (as they define the true type) then compare this to the files extension e.g.



FILE.................FIRST CHARS....EXTENSION....ACTUAL
=====................===========....=========....======
C:\temp\hello.doc....PK.............DOC..........ZIP...

snb
09-18-2012, 03:48 AM
Why wouldn't the extension suffice to check what file you are dealing with ?

It would be nice if you elaborated somewhat more on the purpose of this exercise, which problem has to be resolved, which results obtained and what you plan to do with the results.
There are probably many more approaches possible if you describe your problem in more detail.

theta
09-18-2012, 03:53 AM
Several files have been intentionally renamed to hide their true type. Network policy goes through and deletes certain files (.mp3, .mp4, .zip), but this is based on the extension of the file.

I would like to examine a list of files A1:A100 rngFiles to check the true file type and compare this to the extension assigned.

Should show 'hidden' zip files and media files.

snb
09-18-2012, 04:11 AM
You could try the property 'Type'.

c112 = CreateObject("scripting.filesystemobject").getfile("E:\OF\voorbeeld.xls").Type

or



c312 = CreateObject("shell.application").namespace("E:\OF\").Items.Item("voorbeeld.xls").Type

p45cal
09-18-2012, 04:17 AM
try this (full path including drive and filename should be in column A) which puts the 1st 10 characters in column B and the extension in column C next to the filenames:Sub blah()
For Each cll In Range("A1:A100").Cells
Filename = cll.Value
FileNo = FreeFile
Open Filename For Input Access Read As #FileNo Len = 10
Flen = FileLen(Filename)
If Flen > 0 Then
myVar = Input(Application.Min(Flen - 1, 10), #FileNo)
Else
myVar = "Zero length File"
End If
With cll.Offset(, 1)
.NumberFormat = "@"
.Value = myVar
End With
cll.Offset(, 2).Value = Mid(Filename, InStrRev(Filename, ".") + 1)
Close #FileNo
Next cll
End Sub

Aflatoon
09-18-2012, 04:21 AM
Perhaps:
Function ReadFileTenChars(sFileName) As String
Dim FileNum As Integer
Dim sFile As String * 10

FileNum = FreeFile()
'Open Text File For Input
Open sFileName For Input Access Read Shared As #FileNum Len = 10
Input #FileNum, sFile
Close #FileNum
ReadFileTenChars = sFile
End Function

theta
09-18-2012, 04:33 AM
Open Filename For Input Access Read As #FileNo Len = 10
Flen = FileLen(Filename)
If Flen > 0 Then
myVar = Input(Application.Min(Flen - 1, 10), #FileNo)
Else
myVar = "Zero length File"
End If


Wow thanks, will try that now. Could you please annotate this code section as it is the only part I am unfamiliar with.

You are opening a file and specifying it has a length of 10 (before accessing it)? I was not aware this method was available! You are then Checking the file contains at least 10 chars - or is reported as a zero length file?

p45cal
09-18-2012, 04:49 AM
Could you please annotate this code section as it is the only part I am unfamiliar with.Open Filename For Input Access Read As #FileNo Len = 10'Len=10 defines the size of the buffer used.
Flen = FileLen(Filename)'Flen will contain the length of the file, just in case it's shorter than 10 characters, because the Input statement balks if it tries to get data from beyond the end of the file
If Flen > 0 Then' if the file has more than zero length then..
myVar = Input(Application.Min(Flen - 1, 10), #FileNo) 'get the first ten characters (or the as many characters as there are in the file, whichever is the smaller) and put it into a variable called myVar
Else
myVar = "Zero length File"' signal to the user that the file is zero length rather than just leave the cell blank.
End If
You are opening a file and specifying it has a length of 10 (before accessing it)? I was not aware this method was available! No I'm not.
You are then Checking the file contains at least 10 chars - or is reported as a zero length file?No, I'm just fetching up to 10 characters from the file, and stating that there are none if it's a zero length file.

snb
09-18-2012, 05:24 AM
You can check the length of a file before opening it:


if filelen(filename)>0 then

p45cal
09-18-2012, 05:41 AM
You can check the length of a file before opening itYes, that would save unnecessary file opening and speed it up a tiny bit (more if many of the files are empty files). So:Sub blah2()
For Each cll In Range("A1:A100").Cells
Filename = cll.Value
Flen = FileLen(Filename)
If Flen > 0 Then
FileNo = FreeFile
Open Filename For Input Access Read As #FileNo Len = 10
myVar = Input(Application.Min(Flen - 1, 10), #FileNo)
Close #FileNo
Else
myVar = "Zero length File"
End If
With cll.Offset(, 1)
.NumberFormat = "@"
.Value = myVar
End With
cll.Offset(, 2).Value = Mid(Filename, InStrRev(Filename, ".") + 1)
Next cll
End Sub
Another way to speed it up is to reduce the number of times data is written to the sheet by putting all the results in an array and writing the whole array to the sheet at the end.

snb
09-18-2012, 05:55 AM
or

Sub snb()
sn=range("A1:A100")
for j=1 to ubound(sn)
if dir(sn(j,1))<>"" then
If FileLen(sn(j,1)) > 9 Then
Open sn(j,1) For Binary As #1
sn(j,1)=Input(10,1)
Close
End If
end if
next
range("B1:B100")=sn
End Sub

p45cal
09-18-2012, 06:16 AM
or

Sub snb()
sn=range("A1:A100")
for j=1 to ubound(sn)
if dir(sn(j,1))<>"" then
If FileLen(sn(j,1)) > 9 Then
Open sn(j,1) For Binary As #1
sn(j,1)=Input(10,1)
Close
End If
end if
next
range("B1:B100")=sn
End Sub
There may be one or two gotchas to look out for here; I was pulled up by one error where the first character in the file was an '=' (equals sign) and not followed by what Excel might recognise as a valid formula, and the code errored unless the cell was foramatted as Text first. (btw why don't you include any characters if there are less than 10?).
The other error I came across was using the filelength itself when trying to grab all the characters, but it complained that I was going beyond where it could go - I'm guessing that the EOF character is being counted as a character but it can't be Input? That's why I took 1 less than the file length.

snb
09-18-2012, 06:27 AM
The first 'error' is an Excel produced error.
Readily overcome using


sn(j,1)="'" & Input(10,1)


Your second one by using

sn(j,1)=Inputb(LOF(1),#1)

theta
09-20-2012, 03:13 AM
Thanks guys. I am putting these methods into a workbook now (with userform etc) to produce a tool.

Will upload when complete, and gladly take any comments.