PDA

View Full Version : VBA code to open folders and pdf file inside the folders



rockybalboa
07-25-2013, 02:52 AM
Hi,

I have a list of ids in excel sheet like 1234, 4567, 7890.
I have various folder created whose names ends with these ids ( ***_1234, ***_4567).
Inside these folders i have pdf file ending with dates (*** 23 Apr 2013, *** 20 Jun 2013).

I wanted a code which will match the ids in excel file and open the respective folders ending with those ids. After opening the folders it should open the latest PDF file using date.

Please let me know whether this is possible or not. If not also suggest some other way to do this.

Thanks..

SamT
07-25-2013, 05:21 AM
Hey Sly,

Welcome to VBAExpress.

Are the folder names identical except for the ID suffix?

In each folder are the file names identical except for the Date suffix?

rockybalboa
07-25-2013, 05:38 AM
Folder names are actually company names along with their id. Hence they are different.
Also the pdf file inside each folders are company name along with date

SamT
07-25-2013, 06:02 AM
You will need a cross reference from ID to Company name. I suggest a new sheet, named like vba_ID2Name or xl_ID2Name, (something to indicate it's special) with one column of IDs and one of Company names as they appear in the folder names.

If the Company names of the PDF files are not, in every case, identical to the folder company names, add a third column for PDF company name prefixes.

I recommend the PDF prefix column anyway, just in case things change in the future. It is much easier to copy one column now, than to rewrite the code later.

Final questions. (yeah, right :))
Are the folders on a network drive?
Is that drive mapped to/on your computer?
Do you have to log in to access that drive?
What is the full path to a folder on the drive.
Example:
ServerName & "E:\Daily Reports\" & YYYY & "\" & CompanyName & "_" & ID

Where YYYY = current year.

rockybalboa
07-25-2013, 06:19 AM
Yes the folders are on network drive. It is mapped to my pc. Yes i have to login to the drive.

N:\EMEA\1. Service\RMBS\2.Reviewe

Yes PDF company names are identical to folder company names, it just includes dates in the end.

SamT
07-25-2013, 06:46 AM
One last question :rotlaugh:

Is there a list anywhere on the network, but best if on your computer, that shows the dates of the PDF's generated by company? Somehow we will have to sort them by date. If there is no existing list, we'll have to retrieve all the file names in a folder, sort them, then open the latest. File access is pretty slow, and if there's lots of files, it means lots of slowness.

BTW, I have to leave until tomorrow, but someone else will be by soon to help you.

ps: they too will need the answers.

SamT
07-25-2013, 06:50 AM
BTW, I have to leave until tomorrow, but someone else will be by soon to help you.

rockybalboa
07-25-2013, 06:59 AM
No there is no such list. Currently each folder has only one pdf file but going forward there will be many.
To make it easier i can rename all the folders to just company name and skip the id part.
Thanks buddy

snb
07-26-2013, 02:33 AM
crossposted: http://www.excelfox.com/forum/f2/vba-code-to-open-folders-and-pdf-file-inside-the-folders-1276/

Kenneth Hobs
07-26-2013, 02:53 AM
So, you don't really want to open nor copy all the files. What you want is to make one PDF file with all the others inserted or merged as most call it. It is not that difficult but does involve several steps. Obviously, a third party program like pdfsam would be used to do the merging once you have the subfolder/filenames.

1. What 3rd party program are you using to merge the pdf files?
2. Post a short example workbook with the ids.
3. For the folders with the id suffix names, are these just subfolders of one parent folder? e.g. c:\pdf\x_1234, c:\pdf\y_4567, and not c:\pdf\x_1234, c:\pdf\level2\y_4567.

There are several components to your request which causes delays in getting an answer. Getting the suffix for each file in the subfolders will be one part. One then converts it to real date, not a date string. I would then poke the dates into an array and sort it. Most people ask for just the lastest file so parsing out the date string, is an extra step.

rockybalboa
07-26-2013, 03:18 AM
I skipped the ID part and now folders are named by just company names.
eg: I have to now search the below mentioned folder names in this path C:\Users\rockyb\Downloads\New folder


Names


Southern Pacific 05-3 PLC


Mortgage Funding 2008-1


Southern Pacific 04-A PLC


Preferred Residential 06-1



All the folders are under New Folder. Inside every folder we have PDFs with dates (Southern Pacific 04-A PLC 23 Apr 2013, Preferred Residential 06-1 20 Jun 2013)

Yes i will be using pdf split and merge.

Let me know if you need any thing else

Kenneth Hobs
07-26-2013, 06:45 AM
Since you are merging these into one pdf file will any folder order meet your needs? I guess one could sort by the suffix filename dates but with different folders, you may have some with the same date suffix.

Without knowing the program, all I can really give you is the string for the drive:\folders\filename.pdf for each folder in New Folder.
e.g.
Files are:
1. C:\Users\rockyb\Downloads\New folder\Folder1\Southern Pacific 04-A PLC 23 Apr 2013.pdf
2. C:\Users\rockyb\Downloads\New folder\Folder1\Southern Pacific 04-A PLC 20 Apr 2013.pdf
3. C:\Users\rockyb\Downloads\New folder\Folder2\Southern Pacific 04-A PLC 19 Apr 2013.pdf

The string that you would need with the encapsulated quotes would be:
"C:\Users\rockyb\Downloads\New folder\Folder1\Southern Pacific 04-A PLC 23 Apr 2013.pdf" "C:\Users\rockyb\Downloads\New folder\Folder2\Southern Pacific 04-A PLC 19 Apr 2013.pdf"

As you can see, the string would be very long if you have several subfolders with pdf files in them though you would be getting only one pdf filename per subfolder at most.

That string would go in my pdfsam example as I showed here. http://vbaexpress.com/forum/showthread.php?p=180767

rockybalboa
07-26-2013, 08:00 AM
i just want the code to select only those folders which i have listed in the excel sheet and then get its latest pdf file. Folder order is not necessary. Folder names inside new folder are also Company names i.e Southern Pacific 05-3 PLC, Mortgage Funding 2008-1.

Let me know if you need any thing else

Thanks

SamT
07-26-2013, 09:00 AM
Rocky,

It sounds like you have good control over the source program of the PDFs; ie, you can decide where it will save them and what it will name them.

It might make more sense to start our coding at that point.

Why don't you tell us the flow of steps that take place in the creation and saving of the PDFs and the final result you desire. This piecemeal explanations are a PITA, because each new piece of information requires rewriting all the code that was based on the previous "meals.".

Start from the beginning and don't leave anything out.
Example:
I receive data in the for of a X type file
I use Y program to convert it to a PDF
I tell Y to save it in Z folder with the name A. Name A comes from B. I can save it in any folder I want
Now that I have the PDFs in folder Y, I want to create a PDF that includes C and D and E.
I want to save/delete all the old/used PDFs in folder F

rockybalboa
07-26-2013, 09:51 AM
The pdf are actually created from a excel file report of the company and are saved in its respective folders with current date.
Now what i wanted to do is

For eg: under New Folder we have 50 company folders. Each folder has pdf files with dates in the end.
Now i have to create a pdf book of 10 companies. I will list down those company names in a excel.
Now the macro should identify those 10 company folders, get the latest pdf files based on date from each folder and finally merge it to one.
The macro should not delete the original pdf files from the

Thanks mate for your time on this

Kenneth Hobs
07-26-2013, 11:32 AM
Here is step 1.

Sub Test_FilenamesSuffixDate()
Dim d As Date
d = FilenamesSuffixDate("c:\ken\pdf\Southern Pacific 04-A PLC 23 Apr 2013.pdf")
MsgBox d & vbLf & Format(d, "dd mmm yyyy")
End Sub

Function FilenamesSuffixDate(fullPathFilename As String) As Date
Dim s As String, a() As String, ub As Integer
's = "Southern Pacific 04-A PLC 23 Apr 2013.pdf"
s = fullPathFilename
s = Left(s, Len(s) - 4) 'Trim file extenstion
a() = Split(s, " ")
ub = UBound(a)
If ub < 2 Then Exit Function
FilenamesSuffixDate = DateValue(Join(Array(a(ub - 2), a(ub - 1), a(ub)), " "))
End Function

Kenneth Hobs
07-26-2013, 02:25 PM
Here is the solution. You can tweak it from here. Put your parent folder path in A2. Put your subfolder names in A3 and down. Run the macro and then look at the filenames in the Immediate Window.


Sub Test_aPDFsInSubfolders()
Dim a() As Variant, v As Variant
a() = aPDFsInSubfolders(Range("A2"), Range("A3", Range("A3").End(xlDown)))
For Each v In a()
Debug.Print v
Next v
End Sub

Function aPDFsInSubfolders(pFolder As String, rSubfolders As Range) As Variant
Dim c As Range, a As Variant, v As Variant, fn As String, lfn As String
Dim s As String, sa() As String, i As Integer
Dim d As Date, lfnd As Date, fnd As Date
Dim aa() As Variant

' Iterate cell range values for subfolders
i = 0
For Each c In rSubfolders
'Debug.Print Dir(Join(Array(pFolder, c), "\"), vbDirectory) '="" if folder does not exist
If Dir(Join(Array(pFolder, c), "\"), vbDirectory) = "" Then GoTo NextC
a = GetFileList(Join(Array(pFolder, c, "*.pdf"), "\"))
If Not IsArray(a) Then GoTo NextC
' Iterate filenames and find newest suffix pdf file's name
lfn = ""
lfnd = 0
For Each v In a
fn = Join(Array(pFolder, c, v), "\")
fnd = FilenamesSuffixDate(fn)
If fnd > lfnd Then
lfnd = fnd
lfn = fn
End If
Next v
If lfn <> "" Then
i = i + 1
ReDim Preserve aa(1 To i)
aa(i) = lfn
End If
NextC:
Next c
aPDFsInSubfolders = aa()
End Function

Sub Test_FilenamesSuffixDate()
Dim d As Date
d = FilenamesSuffixDate("c:\ken\pdf\Southern Pacific 04-A PLC 23 Apr 2013.pdf")
MsgBox d & vbLf & Format(d, "dd mmm yyyy") & vbLf & CLng(d) 'cLng(d)=0 if error, not a date
End Sub

Function FilenamesSuffixDate(fullPathFilename As String) As Date
Dim s As String, a() As String, ub As Integer
's = "Southern Pacific 04-A PLC 23 Apr 2013.pdf"
s = fullPathFilename
If LCase(Right(s, 3)) = "pdf" Then s = Left(s, Len(s) - 4) 'Trim file extenstion
a() = Split(s, " ")
ub = UBound(a)
If ub < 2 Then Exit Function
On Error Resume Next 'If not a date, returns 0 value for date
FilenamesSuffixDate = DateValue(Join(Array(a(ub - 2), a(ub - 1), a(ub)), " "))
End Function

Function GetFileList(FileSpec As String) As Variant
' Returns an array of filenames that match FileSpec
' If no matching files are found, it returns False

Dim FileArray() As Variant
Dim FileCount As Integer
Dim FileName As String

On Error GoTo NoFilesFound

FileCount = 0
FileName = Dir(FileSpec)
If FileName = "" Then GoTo NoFilesFound

' Loop until no more matching files are found
Do While FileName <> ""
FileCount = FileCount + 1
ReDim Preserve FileArray(1 To FileCount)
FileArray(FileCount) = FileName
FileName = Dir()
Loop
GetFileList = FileArray
Exit Function

' Error handler
NoFilesFound:
GetFileList = False
End Function

rockybalboa
07-26-2013, 07:46 PM
Thanks for this but there are some issues i guess. i doesnt move to the next company name listed in the excel.
Can we tweak the code in such a way that it should copy all the latest pdf files of the companies which i have listed in the excel to some new folder. So that then we can merge it to one.
Parent Folder path: C:\Users\rockyb\Downloads\New folder
Inside New folder we have all the company folders
eg: C:\Users\rockyb\Downloads\New folder\Mortgage Funding 2008-1
C:\Users\rockyb\Downloads\New folder\Preferred Residential 06-1
and so on..
Inside company folders we have pdfs
eg: C:\Users\rockyb\Downloads\New folder\Mortgage Funding 2008-1\Mortgage Funding 2008-1 19 Jun 2013.pdf
C:\Users\rockyb\Downloads\New folder\Mortgage Funding 2008-1\Mortgage Funding 2008-1 23 Apr 2013.pdf

So now if i will list down "Mortgage Funding 2008-1" name in my excel sheet (suppose in "A2"), it should go to Mortgage Funding 2008-1 folder, get the latest pdf file "Mortgage Funding 2008-1 23 Apr 2013.pdf" and copy it to a new folder.

Thank you

snb
07-27-2013, 02:59 AM
Can we tweak the code in such a way

Yes, you can ....

rockybalboa
07-27-2013, 03:04 AM
Haha. Srry Can you...

Kenneth Hobs
07-27-2013, 08:13 AM
You keep changing what you want so it is hard to help you.

To copy the files to a new folder requires two lines of code. To copy to an existing folder requires one line of code. I don't see the purpose in that though. You would then still need to get the full path and filenames to build the string for input to the pdf merge program.

You will find that about 95% of my solutions are tested and work before I post them. You did not include an Excel file as I requested so I don't know what you did. I have attached my file so that you can see what I did. Obviously, the paths, drives and such in Column A must be changed to fit your setup.

rockybalboa
07-27-2013, 10:26 PM
Thank you, this is great.
Sorry for the trouble, yes i want to copy all the selected pdf files to a new folder.
Will you please edit the code in such a way that
In Range ("E1") i will give the path to parent folder
From Range ('A2") and down the company names will come
and in Range ("E3") i will give the path to where the pdf files needs to be copied.

Your help is appreciated in this. Thank you

Kenneth Hobs
07-28-2013, 11:43 AM
If the filenames are duplicated, then each one will be copied over the previous.


Sub Test_aPDFsInSubfolders()
Dim a() As Variant, v As Variant, i As Integer
Dim fCopyTo As String
Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")

fCopyTo = Range("E3").Value2
If Dir(fCopyTo, vbDirectory) = "" Then Exit Sub

a() = aPDFsInSubfolders(Range("E1").Value2, Range("A2", Range("A2").End(xlDown)))
For Each v In a()
FileCopy v, fCopyTo & "\" & fso.GetBaseName(v) & ".pdf"
Next v

Set fso = Nothing
End Sub

Function aPDFsInSubfolders(pFolder As String, rSubfolders As Range) As Variant
Dim c As Range, a As Variant, v As Variant, fn As String, lfn As String
Dim s As String, sa() As String, i As Integer
Dim d As Date, lfnd As Date, fnd As Date
Dim aa() As Variant

' Iterate cell range values for subfolders
i = 0
For Each c In rSubfolders
'Debug.Print Dir(Join(Array(pFolder, c), "\"), vbDirectory) '="" if folder does not exist
If Dir(Join(Array(pFolder, c), "\"), vbDirectory) = "" Then GoTo NextC
a = GetFileList(Join(Array(pFolder, c, "*.pdf"), "\"))
If Not IsArray(a) Then GoTo NextC
' Iterate filenames and find newest suffix pdf file's name
lfn = ""
lfnd = 0
For Each v In a
fn = Join(Array(pFolder, c, v), "\")
fnd = FilenamesSuffixDate(fn)
If fnd > lfnd Then
lfnd = fnd
lfn = fn
End If
Next v
If lfn <> "" Then
i = i + 1
ReDim Preserve aa(1 To i)
aa(i) = lfn
End If
NextC:
Next c
aPDFsInSubfolders = aa()
End Function

Sub Test_FilenamesSuffixDate()
Dim d As Date
d = FilenamesSuffixDate("c:\ken\pdf\Southern Pacific 04-A PLC 23 Apr 2013.pdf")
MsgBox d & vbLf & Format(d, "dd mmm yyyy") & vbLf & CLng(d) 'cLng(d)=0 if error, not a date
End Sub

Function FilenamesSuffixDate(fullPathFilename As String) As Date
Dim s As String, a() As String, ub As Integer
's = "Southern Pacific 04-A PLC 23 Apr 2013.pdf"
s = fullPathFilename
If LCase(Right(s, 3)) = "pdf" Then s = Left(s, Len(s) - 4) 'Trim file extenstion
a() = Split(s, " ")
ub = UBound(a)
If ub < 2 Then Exit Function
On Error Resume Next 'If not a date, returns 0 value for date
FilenamesSuffixDate = DateValue(Join(Array(a(ub - 2), a(ub - 1), a(ub)), " "))
End Function

Function GetFileList(FileSpec As String) As Variant
' Returns an array of filenames that match FileSpec
' If no matching files are found, it returns False

Dim FileArray() As Variant
Dim FileCount As Integer
Dim FileName As String

On Error GoTo NoFilesFound

FileCount = 0
FileName = Dir(FileSpec)
If FileName = "" Then GoTo NoFilesFound

' Loop until no more matching files are found
Do While FileName <> ""
FileCount = FileCount + 1
ReDim Preserve FileArray(1 To FileCount)
FileArray(FileCount) = FileName
FileName = Dir()
Loop
GetFileList = FileArray
Exit Function

' Error handler
NoFilesFound:
GetFileList = False
End Function

snb
07-28-2013, 12:56 PM
@KH

Did you consider ?


Sub M_snb()
shell "cmd /c xcopy G:\OF\*.pdf G:\ /s" , 0
End Sub

Kenneth Hobs
07-28-2013, 03:22 PM
No. Obviously, I know how to do those simply things since I learned DOS long ago.

The op wanted just the PDF files in subfolders. He wanted one file per folder. That file selection was to be based on a suffix file naming convention using a date format of "dd mmm yyyy" or "d mmm yyyy". Had the user set the file's creation or modification date criterion, more simply solutions use xcopy options would have sufficed. Even then copying the files is not what was really needed.

The goal was to put all the PDF files meeting the criterion into one PDF file. Once you have the correct folder with filenames, as I posted, one puts that as input into 3rd party programs to merge like adobe acrobat, pdffill, pdftk, and pdfsam to name a few.

Now that the files are in one folder, one has to get those paths and filenames for the merge. That is easier now but putting the files into a folder was never really needed.

rockybalboa
07-28-2013, 08:24 PM
This is simply awesome sir. Thank you very much. Appreciated.

snb
07-30-2013, 05:51 AM
I assume (maybe incorrectly)

1 there's a 'parent' folder containing only subfolders and not containing any pdf file
2 each subfolder contains only 1 pdf file; it's name matching the subfolder's name, complemented with a date

I found no indication that these assumptions are invalid.

If that is the case then all pdf-files in subdirectories can be copied to 1 folder (like the OP asked in #22) using the xcopy command.
As long as the OP thinks the PDF-merging program needs all files to be in 1 folder, this might be helpful.

Kenneth Hobs
07-30-2013, 06:39 AM
I found no indication that these assumptions are invalid.

See post #1.

rockybalboa
09-06-2013, 05:18 AM
Hi, Can you please help me to add one more condition in the macro.

Condition is, incase there are two PDF files of same date in a folder, then the macro should select the one which mentions "updated" in the filename just before date. (Adriati Updated 24 Aug 2013.pdf)

Please let me know incase of any queries.

Thank you.

rockybalboa
09-16-2014, 10:32 PM
@Kenneth Hobs (http://www.vbaexpress.com/forum/member.php?3661-Kenneth-Hobs) Can you please help me with the macro file you created "SuffixDateFromEndOfFile.xlsm (http://www.vbaexpress.com/forum/attachment.php?attachmentid=10338&d=1375036974)" attached above. Just a small change, instead of identifying and copying PDF files , i want it to copy .xlsm file, rest all is same.

I tried to do it by changing
FileCopy v, fCopyTo & "\" & fso.GetBaseName(v) & ".pdf" to
FileCopy v, fCopyTo & "\" & fso.GetBaseName(v) & ".xlsm" and it identified and copied the file as well. But now i can not open the copied file. It gives error saying "Cannot open because file format or extension is not valid".

Thanks :)

Kenneth Hobs
09-17-2014, 12:12 PM
My guess is that v is not an XLSM file. Before the FileCopy Line, do this:

Debug.Print v,fCopyTo & "\" & fso.GetBaseName(v) & ".xlsm"

After running the code, and in debug mode, check for the output in the Immediate window.

rockybalboa
09-17-2014, 11:00 PM
It copied all the .xlsm files correctly to a new folder but was unable to open it.

May be this code has something to do with where it is mentioned *.pdf


For Each c In rSubfolders 'Debug.Print Dir(Join(Array(pFolder, c), "\"), vbDirectory) '="" if folder does not exist
If Dir(Join(Array(pFolder, c), "\"), vbDirectory) = "" Then GoTo NextC
a = GetFileList(Join(Array(pFolder, c, "*.pdf"), "\"))
If Not IsArray(a) Then GoTo NextC

Kenneth Hobs
09-18-2014, 05:51 AM
That code populates the array a() with the filenames with file extensions of pdf.