Getting number of pages without opening document in Word



presence76
03-02-2006, 10:04 AM
I recently modified some code so that it opens each document from VBA, gets the number of pages for that document and, based on that number, copies the file to the appropriate folder (onepagedocs or twopagedocs). The problem is that this uses far too many resources, especially since the directory holds around 3,000 documents. Here is the code that does it:

Sub checkpages()

    Dim fso As FileSystemObject
    Dim fldr As Folder
    Dim f As File
    Dim myDoc As Document
    Dim totpages As Integer
    Dim onepagedir As String
    Dim twopagedir As String

    onepagedir = "P:\Clients\Vanguard\Finance\testing\onepagedocs\"
    twopagedir = "P:\Clients\Vanguard\Finance\testing\twopagedocs\"

    If Dir(onepagedir) <> "" Then
        Kill (onepagedir)
    End If

    If Dir(twopagedir) <> "" Then
        Kill (twopagedir)
    End If

    MkDir (onepagedir)
    MkDir (twopagedir)


    Const TARGET_FOLDER As String = "P:\Clients\Vanguard\Finance\testing\"

    Set fso = New FileSystemObject
    Set fldr = fso.GetFolder(TARGET_FOLDER)
    For Each f In fldr.Files
        If Left(f.Name, 1) = "~" Then
            onepagedir = onepagedir
        Else
            If Right(f.Name, 4) = ".doc" Then
                Set myDoc = Documents.Open(TARGET_FOLDER & f.Name)
                totpages = Selection.Information(wdNumberOfPagesInDocument)
                If totpages = 1 Then
                    myDoc.SaveAs FileName:=onepagedir & f.Name
                    myDoc.Close False
                Else
                    If totpages = 2 Then
                        myDoc.SaveAs FileName:=twopagedir & f.Name
                        myDoc.Close False
                    End If
                End If
            End If
        End If
    Next f

End Sub

Is there a way I can get the number of pages without using Documents.Open and wdNumberOfPagesInDocument?

When I run this, the documents open on the screen and it uses way too many resources, especially with the 3000 documents that are in this directory.

Thanks in advance.

mdmackillop
03-02-2006, 02:38 PM
Have a look at document properties. It may provide a way forward.

Sub Info()
    Dim prop As Object
    Dim i As Long
    ' Some properties have no value and raise an error when read; skip those.
    On Error Resume Next
    For Each prop In ActiveDocument.BuiltInDocumentProperties
        i = i + 1
        Debug.Print i & " - " & prop.Name & " - " & prop.Value
    Next prop
End Sub


Try totpages = myDoc.BuiltInDocumentProperties(14) (index 14 is the page count, wdPropertyPages).
Also, use Application.ScreenUpdating = False at the head of your code, resetting it to True at the end.
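
A minimal sketch of how the open-and-count step could be made lighter, assuming the code runs inside Word itself. The function name PageCountOf and its parameter are illustrative and do not come from the original macro. It opens the file read-only with no visible window and asks Word to paginate it:

```vba
' Hypothetical helper: returns the page count of one document.
' Assumes it runs inside Word VBA; PageCountOf is an illustrative name.
Function PageCountOf(ByVal fullPath As String) As Long
    Dim doc As Document

    ' Open without a visible window and without touching the recent-files list.
    Set doc = Documents.Open(FileName:=fullPath, ReadOnly:=True, _
                             AddToRecentFiles:=False, Visible:=False)

    ' ComputeStatistics paginates the document against the current printer.
    PageCountOf = doc.ComputeStatistics(wdStatisticPages)

    doc.Close SaveChanges:=wdDoNotSaveChanges
End Function
```

Each document is still loaded by Word, so this is only likely to reduce the screen and window overhead, not the cost of opening 3,000 files.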

TonyJollans
03-03-2006, 04:36 AM
Sorry, but it can't be done.

The number of pages in a document is, effectively, a function of the printer. Only when a document is open is information about the printer available to enable the calculation. Word (unlike Publisher, say) does not store printer details with documents and, consequently, does not store page counts.

The document property of number of pages (which I think is always 1 on a closed document) is a strange anomaly.

presence76
03-03-2006, 05:26 AM
How does Windows populate the Pages column if you display a directory with the "Pages" column turned on? It must get that from somewhere without opening each individual document.

Marcster
03-03-2006, 05:36 AM
You may be able to do what you want if you download Microsoft's Dsofile.
Dsofile is actually the file name (Dsofile.dll) for the Microsoft
Developer Support OLE File Property Reader 2.0 Sample.
You can use it to read file properties via script.
Read more:
http://www.microsoft.com/technet/community/columns/scripts/sg0305.mspx
HTH,
Marcster.
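
A rough sketch of what reading the property through Dsofile might look like from VBA, assuming Dsofile.dll has been downloaded and registered on the machine. The function name StoredPageCount is made up for illustration. The value returned is whatever page count was last saved in the file's summary information:

```vba
' Hypothetical helper: reads the stored page count without starting Word.
' Assumes Dsofile.dll is installed and registered on the machine.
Function StoredPageCount(ByVal fullPath As String) As Long
    Dim props As Object

    ' Late binding, so no project reference to the DSO library is required.
    Set props = CreateObject("DSOFile.OleDocumentProperties")

    props.Open fullPath, True      ' second argument: open read-only
    StoredPageCount = props.SummaryProperties.PageCount
    props.Close
End Function
```

From the Immediate window it could be tried with something like ? StoredPageCount("C:\some\folder\letter.doc"), where the path is a placeholder.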

TonyJollans
03-03-2006, 06:54 AM
Windows uses whatever Word provides - and dsofile will also give that same information. It is, however, not accurate.

fumei
03-03-2006, 04:55 PM
Tony, in the object model of Word 2003 there is a Page object. This object does not exist in previous versions of Word. I do not have 2003, nor access to it.

Question: is this now a real stored property? Can you, in fact, get a page count from Word 2003 documents? I know you cannot get accurate ones from prior versions - as you say, "page" is a printer (driver) derivative.

However, as I believe Word is STILL printer (driver) derivative, I would imagine that IF a page object count is stored, it must be based on the existing printer driver. So would THAT mean that the current printer driver calculations/information/blah blah is stored in a Word 2003 document?

TonyJollans
03-04-2006, 02:04 AM
Hi Gerry,

The Pages Collection in 2003 belongs to a Pane, which I don't think is stored with the document but, even if it is, it is only in a way which can be retrieved by Word - it is not a Property available to Windows (or dsofile) - I think.

fumei
03-04-2006, 03:33 PM
Thanks. I was kind of expecting that answer.

presence76
03-06-2006, 05:59 AM
DSO seems to give the proper page counts to me. If there is an issue, we will discover it fairly quickly as this code will be sifting through about 3,000 documents a day. If it's wrong, we will know. I will continue testing it for a few days and let you know. Thanks for all the replies.

presence76
03-11-2006, 01:36 PM
Here is code that works for directories with over 1,500 files.



Sub checkpages()

    Dim Fso As FileSystemObject
    Dim fldr As Folder
    Dim f As File
    Dim myDoc As Document
    Dim totpages As Integer
    Dim onepagedir As String
    Dim twopagedir As String
    Dim counter As Integer

    Dim tempname As FileSystemObject

    Application.Visible = False
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False


    onepagedir = "P:\Clients\Vanguard\Finance\testing\onepagedocs\"
    twopagedir = "P:\Clients\Vanguard\Finance\testing\twopagedocs\"

    If Dir(onepagedir) <> "" Then
        Kill "P:\Clients\Vanguard\Finance\testing\onepagedocs\"
    End If

    If Dir(twopagedir) <> "" Then
        Kill (twopagedir)
    End If

    MkDir (onepagedir)
    MkDir (twopagedir)


    Const TARGET_FOLDER As String = "P:\Clients\Vanguard\Finance\testing\"


    Set Fso = New FileSystemObject
    Set fldr = Fso.GetFolder(TARGET_FOLDER)
    Set objFile = CreateObject("DSOFile.OleDocumentProperties")
    For Each f In fldr.Files
        If Left(f.Name, 1) = "~" Then
            GoTo end_program
        Else
            objFile.Open (f)
            totpages = objFile.SummaryProperties.PageCount
            If totpages = 1 Then
                f.Copy (onepagedir)
            Else
                If totpages = 2 Then
                    f.Copy (twopagedir)
                End If
            End If
        End If

        objFile.Close
        counter = counter + 1

    Next f
end_program:
    Application.Visible = True
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True


End Sub




One problem: I cannot get the piece that checks for the existence of the directories to work right. After the macro finishes, I run it again. At that point the directories exist, so it should delete them and then create new ones. When it gets to the Kill line in the code

If Dir(onepagedir) <> "" Then
Kill "P:\Clients\Vanguard\Finance\testing\onepagedocs\"
End If

It gives me the error "file not found"

I have also tried to use RmDir(onepagedir) but that says "path/file access error"

Any ideas? Thanks in advance.

mdmackillop
03-11-2006, 03:34 PM
I don't believe that Dir on an empty folder will return a proper result, and it seems unnecessary to kill the folder anyway, only the files within it.
Your test for ~ will cause the sub to halt when a temporary file is found. Presumably you just want to ignore them.
Counter seems to serve no purpose.
I've added some code to advise the existence of documents > 2 pages.

Try the following (completely untested) code.

Regards
MD


Option Explicit
Sub checkpages()

    Dim Fso As FileSystemObject
    Dim ObjFile As Object
    Dim fldr As Folder
    Dim f As File
    Dim totpages As Long
    Dim onepagedir As String
    Dim twopagedir As String
    Dim Msg As String

    Const TARGET_FOLDER As String = "P:\Clients\Vanguard\Finance\testing\"

    Application.Visible = False
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False

    On Error GoTo end_program

    onepagedir = TARGET_FOLDER & "onepagedocs\"
    twopagedir = TARGET_FOLDER & "twopagedocs\"

    KillDir onepagedir
    KillDir twopagedir

    Set Fso = New FileSystemObject
    Set fldr = Fso.GetFolder(TARGET_FOLDER)
    Set ObjFile = CreateObject("DSOFile.OleDocumentProperties")
    For Each f In fldr.Files
        If Left(f.Name, 1) = "~" Then
            DoEvents 'Do nothing
        Else
            ObjFile.Open (f)
            totpages = ObjFile.SummaryProperties.PageCount
            If totpages = 1 Then
                f.Copy (onepagedir)
            Else
                If totpages = 2 Then
                    f.Copy (twopagedir)
                Else
                    Msg = Msg & f.Name & vbCr
                End If
            End If
        End If
        ObjFile.Close
    Next f
end_program:
    Application.Visible = True
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
    'If files have more than two pages
    If Msg <> "" Then MsgBox Msg & " not copied", vbExclamation, "Docs over 2 pages"


End Sub
Sub KillDir(PathName As String)
    'Based on http://www.vbaexpress.com/kb/getarticle.php?kb_id=559
    Dim iTemp As Long
    On Error Resume Next
    iTemp = GetAttr(PathName)
    Select Case Err.Number

        Case Is = 0
            Kill PathName & "*.*"
        Case Else
            MkDir PathName
    End Select
End Sub

presence76
03-12-2006, 01:42 PM
Actually, the counters are for a balancing process that occurs after this code, which parses the documents. Then they are printed, with the operator knowing they can print all the one-page and all the two-page documents as separate print jobs. They can also balance the number of documents against the file that came in.

The check for ~ and subsequent end is necessary because what happens when I run this is that a temporary word doc is created for each word file I reference, meaning the entire directory doubles. Because this directory contains an average of 1,500 word documents every time we run, that is quite costly in terms of performance.

I will try this code and reply to this thread with results.

Thanks a lot for the reply.

fumei
03-13-2006, 11:49 PM
Excuse me, but am I understanding this correctly?
"a temporary word doc is created for each word file I reference"

The purpose of this is to get a property without opening the file. Correct? Apparently, DSO will do this. OK, so far so good.

But if the purpose is met - the property value is returned, without opening the file:

WHAT temporary Word doc, for WHAT Word file you are referencing? If I understand correctly, Word is NOT opening the file, so WHY would there be a temporary Word file (a ~ file)? I am not getting this. If this is accurate, then...uh, methinks it would be better to actually open and close the things.

If these are persistent, then...yeah...1500 temp files. Hmmmm, that sounds efficient.

presence76
03-14-2006, 05:47 AM
First off, I will mark this SOLVED as the subject of this thread has had a viable solution provided.

Yes, you understood that correctly. The ~ code was in there for the OLD way of getting the number of pages, and since I just don't trust the Windows environment all that much, I left it in. You are correct that at this point it is probably unnecessary.

Getting back to the removing directory issue, here is the code that I have that works, sort of.


On Error Resume Next

PathName = ("P:\Clients\Vanguard\Finance\testing\")

iTemp = GetAttr("P:\Clients\Vanguard\Finance\testing\")
Select Case Err.Number

    Case Is = 0
        Kill PathName & "*.*"
        RmDir PathName
        MkDir PathName
    Case Else
        MkDir PathName
End Select


This does delete the 1,500 files in the testing directory with the Kill PathName & "*.*" line, but the RmDir does not work. Strangely, it does not error out on the RmDir line or on the MkDir line after it. I don't get that at all. Thanks in advance for any replies.

TonyJollans
03-14-2006, 06:18 AM
You have On Error Resume Next, so you don't see any errors - that doesn't mean they aren't occurring.
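
One way to see what is being hidden is to inspect Err straight after the statement in question. The following is only a sketch; the procedure name TryRemoveDir is hypothetical and the message goes to the Immediate window:

```vba
' Illustrative only: TryRemoveDir is a hypothetical name.
Sub TryRemoveDir(ByVal folderPath As String)
    On Error Resume Next
    Err.Clear

    RmDir folderPath

    If Err.Number <> 0 Then
        ' Report the failure that Resume Next would otherwise swallow.
        Debug.Print "RmDir failed: " & Err.Number & " - " & Err.Description
        Err.Clear
    End If

    On Error GoTo 0     ' restore normal error handling
End Sub
```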

presence76
03-14-2006, 06:50 AM
Ugh. Another case of tunnel vision - thanks. I believe I have conquered this. Here is the code I am using now:

On Error Resume Next
PathName = ("P:\Clients\Vanguard\Finance\testing\")
PathNamedir = ("P:\Clients\Vanguard\Finance\testing")

iTemp = GetAttr("P:\Clients\Vanguard\Finance\testing\")
Select Case Err.Number

    Case Is = 0
        Kill PathName & "*.*"
        RmDir PathNamedir
        MkDir PathNamedir
    Case Else
        MkDir PathName
End Select



The key is that in order for the Rmdir to work, there cannot be any files OR directories within the directory you are Rmdir'ing. The On Error Resume Next is to handle the case where there are not any files for the Kill command to delete. This is working fine now. Thanks for the help.
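
For comparison, a possible alternative that does not depend on the folder being empty is the FileSystemObject's DeleteFolder method, which removes a folder together with its files and subfolders. This is a sketch only; the procedure name ResetFolder is made up, and it assumes no file in the folder is open at the time:

```vba
' Hypothetical helper: empties a folder by deleting and recreating it.
' Uses late binding, so no reference to Microsoft Scripting Runtime is needed.
Sub ResetFolder(ByVal folderPath As String)
    Dim fso As Object

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Strip any trailing backslash before calling DeleteFolder.
    If Right$(folderPath, 1) = "\" Then
        folderPath = Left$(folderPath, Len(folderPath) - 1)
    End If

    If fso.FolderExists(folderPath) Then
        fso.DeleteFolder folderPath, True   ' True also removes read-only items
    End If

    ' The parent folder must already exist for CreateFolder to succeed.
    fso.CreateFolder folderPath
End Sub
```

It could be called as ResetFolder "P:\Clients\Vanguard\Finance\testing\" with or without the trailing backslash.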

mdmackillop
03-14-2006, 11:00 AM
Having removed all the files from the folder, I don't follow the logic of deleting it only to recreate it.

presence76
03-14-2006, 02:21 PM
Yes, you make a valid point. However, this folder gets filled with approximately 1,500 Word documents a day. The application runs in an Access DB, so I have multiple MS apps running at the same time. To be completely honest, I don't trust the MS environment and what it does with temporary files, and I feel better wiping out the entire directory and starting fresh every day.

fumei
03-15-2006, 07:29 PM
Don't trust the MS environment???????? I am shocked!

:jawdown: