Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 21

Thread: Download files from URL, unknown file names

  1. #1
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location

    Download files from URL, unknown file names

    [COLOR=rgba(0, 0, 0, 0.8)]Hello All,[/COLOR]

    [COLOR=rgba(0, 0, 0, 0.8)]I have a list of URLs that each one opens a webpage were an undetermined number of files (.pdf) are stored, I cannot anticipate the names of the files, so I need a macro to go through each of those pages and automatically download all the files present there, save them in a local folder.[/COLOR]

    [COLOR=rgba(0, 0, 0, 0.8)]I had found solutions when the name of the file is known, but this is not my case, I hope somebody can help.[/COLOR]

    [COLOR=rgba(0, 0, 0, 0.8)]Warmest regards![/COLOR]

    [COLOR=rgba(0, 0, 0, 0.8)]Pegaso[/COLOR]

  2. #2
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    Cross Posted here

    Hi Pegaso
    As this is your first post you may not realise that cross-posting is frowned upon.

    from the site FAQ
    Multiposting is the act of posting a single question to different forums around the same time.

    We discourage multiposting because it is like calling five cab companies and going with the one that comes first -- it shows disrespect for the volunteers that monitor this forum because they will unknowingly be helping someone who may have already received help elsewhere. Their time is no less important than that of the person asking the question.

    Many of the volunteers here visit multiple forums and can easily spot duplicate posts. Some of them may even reply to your posts with a link to your post on another forum. Don't be the person that gets caught.

    If you must post your question on a different forum, include a link to the question you have already posted on the previous forum(s). That way, those helping you can decide for themselves if you are already receiving the help you need somewhere else.

  3. #3
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Sorry about crossposting, below is a link to other site where I posted the same question:

    Each time I try to post the link to the other forum I get the message:

    Capture.JPG

    For the time being I do not have an answer, but just to clarify, I cannot provide the page I'm downloading the data from because it is password protected, I can just say that each URL directs to a plain page with a list of documents and that I need the macro to open each one of those and store them in a local folder in my computer.

    Any help will be highly appreciated!

    Pegaso

  4. #4
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    You are very naughty!
    You posted here too and someone there started helping you almost immediately.

  5. #5
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Yongle,

    As I said before, each time I was trying to post a link the system didn't allow me to do it, I get the "Post denied" message. I'm not familiar posting stuff in forums, so I'm probably doing something wrong.

    Again apologies for crossfiring this stuffacross different sites, it was not my intention to bother anyone, I was justtrying to get the answer I need in a hurry.

    None of the answers in the other post had beenuseful so far, I don't know how to automate IE or use xmlhttp

    If you have any useful suggestion(s) you want to share, please do it.

    Warmest regards.

  6. #6
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    Not my area of expertise, but to help anyone in the forum who may know how to do this:
    1 Post the full VBA that works for the single PDF
    2 Please put codetags around the code (paste your code, select it, then click on # above)
    3 Do not delete any lines, but, if confidentiality is an issue, just use a few XXX's to disguise the web address etc
    4 Put a comment on the other threads that you have started saying that the problem is being looked at elsewhere

    thanks and - I should have said this earlier - welcome to the forum

  7. #7
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Quote Originally Posted by Yongle View Post
    Not my area of expertise, but to help anyone in the forum who may know how to do this:
    1 Post the full VBA that works for the single PDF
    2 Please put codetags around the code (paste your code, select it, then click on # above)
    3 Do not delete any lines, but, if confidentiality is an issue, just use a few XXX's to disguise the web address etc
    4 Put a comment on the other threads that you have started saying that the problem is being looked at elsewhere

    thanks and - I should have said this earlier - welcome to the forum

    Here you go:

    This is posted in (I cannot load links, doesn't allow me), check post 138522 in chandoo.org

    I found the following thread in other famous excel-VBA user forum: Mr Excel 353006

    Three codes are proposed there, the second one would cover my needs:

    Sub Test2()
    Dim i As Long
    Dim FileNum As Long
    Dim FileData() As Byte
    Dim MyFile As String
    Dim WHTTP As Object

    On Error Resume Next
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5")
    If Err.Number <> 0 Then
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
    End If
    On Error GoTo 0

    If Dir("C:\MyDownloads", vbDirectory) = Empty Then MkDir "C:\MyDownloads"

    For i = 1 To 10
    MyFile = Cells(i, 1).Text
    TempFile = Right(MyFile, InStr(1, StrReverse(MyFile), "/") - 1)
    WHTTP.Open "GET", MyFile, False
    WHTTP.Send
    FileData = WHTTP.ResponseBody

    FileNum = FreeFile
    Open "C:\MyDownloads\" & TempFile For Binary Access Write As #FileNum
    Put #FileNum, 1, FileData
    Close #FileNum
    Next
    Set WHTTP = Nothing
    MsgBox "Open the folder [ C:\MyDownloads ] for the downloaded file..."
    End Sub

    The info I need to download is password-protected, so I cannot give you the login info for you to take a look, but below is a screen-capture. As you can see, each URL directs the user to a "vault" with few files in there (usually .pdf but also .doc or .gif). What I want is basically to replicate the content of each vault in my local HD. The code above doesn't work for my case, even if you input the file's location (you can know it after you click in the link, the little folder to the right), but since the files come in different formats, the dialog to save them is not always available (Just with the .pdf files seems to work fine), for the rest it would be required to select the file (ctrl+A), copy & paste it to a new document, and save it.

    It is probably more complicate that I thought at the beginning, also I need to say that at the same time I posted the same question in other forums, (with no success by the way), should I include links to those here?

    By the way, I just found another possible path:
    exceltrainingvideos & automate-scraping-of-hyperlinks-from-web-pages-using-excel-vba

    If I can make this work, I should be able to get the links to the documents in each URL, then the above mentioned solution should work

    I'll try it and post the results!

    Regards!

    Pegaso

    Capture3.JPG

  8. #8
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    I would be surprised if it does not allow you to use the codetags. It's next to the "quote" icon and works the same.
    If it does not work then try putting the word CODE inside square brackets [xxxx] before the first line of your code and /CODE also inside square brackets [/xxxx] after the End Sub
    (I cannot type it in - it will treat my words as code!)
    Please try it again - it makes it easier if it looks like code.

    And to make it totally clear for everyone, can you please confirm that you have made this specific code work for you already as posted, but only for a single pdf , when the name of that pdf was already known.

  9. #9
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location

    Download files from URL - Code & clarifications

    Here it goes the code in brackets, but before I want to clarify this:

    I DIDN'T WORK THIS CODE, IT WAS AVAILABLE IN ANOTHER FORUM, Mr. EXCEL 353006, THANKS TO THE AUTHORS FOR PROVIDING IT

    THE CODE WORKED ON A DIFFERENT URL THAN THE ONE THAT IT IS INTENDED TO WORK NOW, AS LONG AS THE COMPLETE URL OF THE FILE IS KNOWN (WHICH IS NOT MY CASE)

    I may had found a solution to get all the links displayed in a URL, but still testing, if I make that work, I should be able to use the code below.
    Sub Test2()
    Dim i As Long
    Dim FileNum As Long
    Dim FileData() As Byte
    Dim MyFile As String
    Dim WHTTP As Object
    
    On Error Resume Next
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5")
    If Err.Number <> 0 Then
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
    End If
    On Error GoTo 0
    
    If Dir("C:\MyDownloads", vbDirectory) = Empty Then MkDir "C:\MyDownloads"
    
    For i = 1 To 10
    MyFile = Cells(i, 1).Text
    TempFile = Right(MyFile, InStr(1, StrReverse(MyFile), "/") - 1)
    WHTTP.Open "GET", MyFile, False
    WHTTP.Send
    FileData = WHTTP.ResponseBody
    
    FileNum = FreeFile
    Open "C:\MyDownloads\" & TempFile For Binary Access Write As #FileNum
    Put #FileNum, 1, FileData
    Close #FileNum
    Next
    Set WHTTP = Nothing
    MsgBox "Open the folder [ C:\MyDownloads ] for the downloaded file..."
    End Sub

  10. #10
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    Pegaso
    I couldn't stand looking at all those color tags. So I cleaned it up. That was not totally down to you
    There should be a warning to avoid this happening - (I have made a note to find out where to send request for warning)

    In future either add your code directly out of VBA or else right click and "paste as plain text", which strips out all the garbage.
    Anyway, it now seems to me that we have gone to all this trouble to post code that has never worked for you.
    At least it looks the way it should, even if you are no further forward. I am confused as to where to go from here - perhaps someone will come to the rescue!

    Sub Test2()
    Dim i As Long
    Dim FileNum As Long
    Dim FileData() As Byte
    Dim MyFile As String
    Dim WHTTP As Object
    
    
    On Error Resume Next
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5")
    If Err.Number <> 0 Then
    Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
    End If
    On Error GoTo 0
    
    
    If Dir("C:\MyDownloads", vbDirectory) = Empty Then MkDir "C:\MyDownloads"
    
    
    For i = 1 To 10
    MyFile = Cells(i, 1).Text
    TempFile = Right(MyFile, InStr(1, StrReverse(MyFile), "/") - 1)
    WHTTP.Open "GET", MyFile, False
    WHTTP.Send
    FileData = WHTTP.ResponseBody
    
    
    FileNum = FreeFile
    Open "C:\MyDownloads\" & TempFile For Binary Access Write As #FileNum
    Put #FileNum, 1, FileData
    Close #FileNum
    Next
    Set WHTTP = Nothing
    MsgBox "Open the folder [ C:\MyDownloads ] for the downloaded file..."
    End Sub

  11. #11
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Hi Yongle,

    Just an update: making (some!) progress!

    I was able to get the URL for the documents I need to download, using the "view source" option of the "vault" page I was able to get the 6 digits code that defines (together with a long, predictable path) the URL of the file to download, then I extracted those codes using text functions of excel, then together with the rest of URL, I was able to create the URLs for all the files in each vault. Now I need to do that automatically using a macro, but that I think I can handle (you'll see my post tomorrow if not!).

    The next part should be quite easy: make excel to open each URL and save it as .pdf, either by printing through "PDF Creator" for .doc or .jpg files or simply by saving for the .pdf files. The path and file name will come from the same table in excel where the URLs for each file are stored.

    To my surprise, I couldn't find a way to print or save the files after open them. Those were easy to open, but when it comes to either print or save them, I couldn't find a proper way to do it.

    This is what I have so far:

    Sub VisitWebsite()
    
    
    Const OLECMDID_PRINT = 6
    Const OLECMDEXECOPT_DONTPROMPTUSER = 1
    Const PRINT_WAITFORCOMPLETION = 2
    
    
    Dim IE As Object
    Set IE = CreateObject("INTERNETEXPLORER.APPLICATION")
    IE.Navigate google (this should be the full link, just edited to be able to post!)
    IE.Visible = True
    
    
    While IE.Busy
    DoEvents
    Wend
    
    
    objIE.ExecWB OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER
    
    
    End Sub
    The last line generates an error: 'Run-time error 424: Object required"

    On top of the above I couldn't find the way to set file name and folder to save the .pdf

    LAter I found a simple solution (I'll post the proper credits later, I don't from which of the 1000 pages open I got it!)

    Option Explicit
     
    Private Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As Long _
    ) As Long
     
    Sub z()
         
        Dim strSource As String
        Dim strDest As String
        strSource = another link
        strDest = another local link
        URLDownloadToFile 0, strSource, strDest, 0, 0
         
    End Sub
    But here the problem is the incompatibility between 32 and 64 bits, this macro doesn't work in 64 bits!, is there a general solution to make a 32 bits code run in 64? Or each code has to be fixed individually to be able to work?

    Can you or someone else here help me with those?

  12. #12
    VBAX Mentor
    Joined
    Feb 2015
    Posts
    395
    Location
    But here the problem is the incompatibility between 32 and 64 bits, this macro doesn't work in 64 bits!, is there a general solution to make a 32 bits code run in 64? Or each code has to be fixed individually to be able to work?
    You should get the answer to this here
    The code you posted above is verbatim the answer to the a question of how to convert between the two, so the answer to your question must be the original code.

  13. #13
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Yongle,

    The 32 to 64 bits problem is solved, I grab a 32 bits machine.

    My code needs to:

    1. Open an specific webpage in IE (check!)
    2. Print the content to PDFCreator, save it with an specific name in an specific folder, without asking the user (Need help here!)
    3. Loop above for a while (piece of cake!)

    What's driving me nuts is how to control the PDF printer (set by default to make life a little less complicated).

    This is the code I have so far..

    Sub OpenPrintWebPage()
    Dim ie As Object
    Dim oApp As Object
    Dim FileName As String
    Dim FolderLoc As String
    Dim URL As String
    Dim i As Integer
    Dim Max As Integer
     
    Set ie = CreateObject("InternetExplorer.Application")
    
    
    Sheets("URLs").Activate 'The one that contains the URLs in column 1, Folder in 2 and Name to save the .pdf in 3
    
    
    Max = Cells(1, 4) 'Number of URLs to work 
    
    
    For i = 1 To Max
    
    
        URL = Cells(i + 1, 1)
        FolderLoc = Cells(i + 1, 2)
        FileName = Cells(i + 1, 3)
    
    
        ie.Visible = True
        ie.Navigate URL
        
        Do Until ie.ReadyState = 4
            DoEvents
        Loop
        
        'Printing this is the part that doesn't work
           
        ie.ExecWB 17, 2
        ie.ExecWB 12, 2
    
    
    Next i
    
    
    End Sub
    The code runs, but the printed file location is unknown, actually if I use a physical printer it doesn't work either.

    I had been searching this topic the whole morning, and trying different options mentioned in other posts but nothing seems to work.

    As always your help (and the help of anyone who knows how to take me out of my misery), is highly appreciated!

    Pegaso

  14. #14
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    Your post #9 goes in 2 totally different directions. The IE method only selects all on a web page and then copies it to the clipboard. Once can then paste that to Excel. I don't see alot of value in that myself.

    Try providing a bit more detail like attaching a file with a few cells completed. If data is sensitive, obfuscate it first.

  15. #15
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Quote Originally Posted by Kenneth Hobs View Post
    Your post #9 goes in 2 totally different directions. The IE method only selects all on a web page and then copies it to the clipboard. Once can then paste that to Excel. I don't see alot of value in that myself.

    Try providing a bit more detail like attaching a file with a few cells completed. If data is sensitive, obfuscate it first.
    Hi Kenneth, thanks for your quick reply

    This is a work in progress, so I had been trying different techniques.

    I really don't know what can I post to make it more clear...

    I have a code that opens IE as an object and navigates to a webpage, that shows a document in the screen (usually .pdf). Once open, I need to print that document in IE to .pdf (PDF Creator is what I have), or save it as .pdf, based on a path & name that I provided using variables, without prompting the user the usual print selection form.

    Finally I need to loop through >10000 files

    And that's all what I need.

    If it makes it more clear just follow my posts from today.

    Warmest regards and I hope you can help me.

    By the way, I'm asking also in OZGRID, post743495

    Warmest regards.

  16. #16
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    Telling us how you want the problem solved, can lead to no solution. Well defining a problem can help us not take a wrong or slow solution's approach.

    You must be new to forums. Click the Go Advanced button in the bottom right of a reply. You can then click the paperclip icon to browse and select a file to attach. Most forums work this way with a few exceptions.

    When you ran that other code to select and copy the selections to the clipboard, did the paste to an Excel sheet produce what you wanted?

    IF you don't want to post the short workbook as I requested, then post the URL so we can test it. What I can get from that is is it a file you want or scrub the URL site. Some URLs point just to files for download.

    Doing a 1,000 URLs the ie object method is bound to be extremely slow and prone to problems.

    The thread that you referenced does not exist and a search at ozgrid did not find it. After your 5th post I believe, you can simply paste the URL to create the hyperlink.

  17. #17
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Quote Originally Posted by Kenneth Hobs View Post
    Telling us how you want the problem solved, can lead to no solution. Well defining a problem can help us not take a wrong or slow solution's approach.

    You must be new to forums. Click the Go Advanced button in the bottom right of a reply. You can then click the paperclip icon to browse and select a file to attach. Most forums work this way with a few exceptions.

    When you ran that other code to select and copy the selections to the clipboard, did the paste to an Excel sheet produce what you wanted?

    IF you don't want to post the short workbook as I requested, then post the URL so we can test it. What I can get from that is is it a file you want or scrub the URL site. Some URLs point just to files for download.

    Doing a 1,000 URLs the ie object method is bound to be extremely slow and prone to problems.

    The thread that you referenced does not exist and a search at ozgrid did not find it. After your 5th post I believe, you can simply paste the URL to create the hyperlink.

  18. #18
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location
    Kenneth,

    I spent about an hour replying to your post and when I click the send button the system asked me to log in again and after that all the text was gone.

    I shoot a screen capture explaining what I need, I hope it helps, I'm too tired today to re-write everything, I placed a link to my dropbox folder in the attachments section (the file exceeded the allowed size)

    Thanks and good night.

    Pegaso

  19. #19
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    Can you just post a valid URL if it is not sensitive that shows what you need? I can then see if you wanted to download a file as your thread title says or just do a copy/paste of the selection. As I said, the latter can be done I guess if you like long breaks while it does it. I have attached an example file that does the ie method for this thread's URL. Run the mMain Test subs from the bottom up to see each step. It also has a pdf module with routines that can be used to save a sheet as a PDF file.

    I am kind of an odd ball I guess. I like to solve simple problems. To do that, I isolate the problem and solve it and then move on to the next one. That is done by making a short example file with just that problem. While I don't mind looking at somone's huge file, I sometimes can't see what their issue is because they have other issues that have to be resolved first.

    I have been bitten by that bug before. I tend to be detail oriented too much sometimes. When I know I am going to do something like that, I scribble my response into MSWord or such first. That way I can spell check and such. Even so, long responses and long threads tend to lose focus for the others that might want to help.
    Attached Files Attached Files

  20. #20
    VBAX Regular
    Joined
    Apr 2015
    Posts
    11
    Location

    Update

    Kenneth,

    After a few weeks of work, and with help of the gurus for this and other forums, I was able to download ~100k .pdf files from my database.

    Now I'm facing a similar challenge, but this time I need to deal with the non-pdf files in my database. The .pdf files are downloaded automatically (I had to change options in Adobe Acrobat to disable the .pdf support to IE), the rest of the files, I found no way to force IE to download them, (.txt, .doc, .jpg, .bmp, etc..). I found an alternative using HTML5 (see here), I was able to create .html files that when open, only display a link, when the link is clicked, the file starts to download. This is an improvement but since I'm not familiar with HTML5 I need help in making the code to do the following:

    1. Automatically click the link, so it will trigger the downloading of the file
    2. If possible, give a name to the file to download file, it is simply saved as ParseFile.htm, since I need to do this several thousand times, I would like the file to have a meaningful name.

    The following code creates the .html file with the required link and opens the file later on, but I still need to click the link and if possible give a meaningful name to the file, preferably before downloading it.

    #If VBA7 Then    Private Declare PtrSafe Function ShellExecute Lib "shell32.dll" Alias "ShellExecuteA" _
            (ByVal hwnd As Long, ByVal lpOperation As String, ByVal lpFile As String, ByVal lpParameters As String, ByVal lpDirectory As String, ByVal nShowCmd As Long) As Long
    #Else
        Private Declare Function ShellExecute Lib "shell32.dll" Alias "ShellExecuteA" _
            (ByVal hwnd As Long, ByVal lpOperation As String, ByVal lpFile As String, ByVal lpParameters As String, ByVal lpDirectory As String, ByVal nShowCmd As Long) As Long
    #End If
    
    
    Sub CreateHTML()
    
    
    Sheets("Summary-1").Activate
    
    
    VaultMin = Cells(2, 2)
    VaultMax = Cells(3, 2)
    FilePath = Cells(2, 5)
    
    
    TextA = """"
    Text1 = "<a href="
    Text2 = "http://xx.rrrrr.com/new_fgtfgtfgtASP/fileParse.php?attachID="
    Text4 = "&fileID="
    Text6 = " download="
    Text7 = "FileName"
    Text8 = ">File abc</a>"
    
    
    For t = VaultMin To VaultMax
        Text = ""
        If Cells(t, 5) <> ".pdf" Then
        
            Text3 = Cells(t, 6)
            Text5 = Cells(t, 7)
            Filename = "HTML-" & Text3 & "-" & Text5 & ".html"
            MyFile = FilePath & Filename
            Text = Text1 & TextA & Text2 & Text3 & Text4 & Text5 & TextA & Text6 & TextA & Text7 & TextA & Text8
            Cells(1, 1) = Text
            
            Set fs = CreateObject("Scripting.FileSystemObject")
            Set f = fs.CreateTextFile(MyFile, True)
        
            f.Write Chr(9) & Text 'strMyPage
            f.Close
            
            Call OpenAnyFile(MyFile)
            
        Else: End If
    
    
    Next t
    
    
    End Sub
    
    
    Function OpenAnyFile(FileToOpen As String)
         
        Call ShellExecute(0, "Open", FileToOpen & vbNullString, _
        vbNullString, vbNullString, 1)
         
    End Function


    By the way, this HTML5 script doesn't work in IE, just in Chrome, so I have the additional problem of controlling Chrome with VBA (with IE is well documented, but I couldn't find the same level of documentation for Chrome)


    Warmest regards and thanks in advance for all your help!

    Pegaso

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •