PDA

View Full Version : Importing tables from multiple HTML files, into one spreadsheet. Please Help



karldou
07-29-2008, 04:26 AM
Hi,
I'm new to VBA and Excel so please be gentle :(

I have been given the task to try and speed up one of our processes.
What i need to do is to create a macro, that will read multiple HTML files, copy the contents of one table from each file, and then paste the contents into one column on a spreadsheet. Then it will filter out the one cell with the data I need, and then paste this cell into another sheet.

The data would need to be read from the table in the first HTML file, then pasted into a column, say A. Then it would need to read the second HTML file, copy the data from the table and paste into column A again, but one row lower. It would need to loop this process until all the HTML files in the directory have been read.
Once this has been completed, it would then need to copy the contents of multiple seperate cells into a new worksheet... say copy cells B2, B6, B10, B14 etc (4 row gap).

I have played a bit with Macros and VBA, and i have read almost all relevant posts on this forum, which have been really informative, but i'm now starting to get rather confused!

Here are a few snippets of the code i have been playing with:

Importing the table from the first HTML file
Sub Import()
ActiveWorkbook.Worksheets.Add
With ActiveSheet.QueryTables.Add(Connection:= _
"FINDER;file:///C:/Documents%20and%20Settings/Administrator/Desktop/Forms/test-new/test/1.htm" _
, Destination:=Range("A1"))
.Name = "1"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "2"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub


Copying one of the imported cells and pasting into a seperate sheet
' Copies Leave Date from E-Form and pastes into "input_file.xls"
Range("B9").Select
Selection.Copy
Workbooks.Open Filename:= _
"C:\Documents and Settings\Administrator\Desktop\Forms\test-new\test\input_file.xls"
Range("A2").Select
ActiveSheet.Paste


I know I will need to include an Offset to allow, the data from different HTML files, to be pasted into a row.

I would really appreciate it if someone can give me a few pointers on how to get the import loop working, plus the copy and past offset.

I have included the test HTML file in the attached 1.zip file.

Thanks
Karl

TomSchreiner
07-29-2008, 09:25 AM
Each file only contains the one bit of data? A date? Are all of the HTML files the same?

Please post an actual HTML file if the test file you attached to your initial post is not a true representation. If you are only looking for one specific item, you can get that from each file using a more direct method. There is no need to import, filter, copy, and paste.

karldou
07-29-2008, 12:26 PM
Hi Tom, thanks for your reply.
I had to strip the HTML file before for confidentiality reasons. But i have re-edited it to keep most of the the formatting intact.

You're right there are more tables included in the file. the table i'm focusing on is webtable 3 in the file. which includes the Leave Date field.

Hope this clears it up a bit more. thanks
Karl

I have attached the new htm file for you

TomSchreiner
07-29-2008, 06:32 PM
Hi Karl. Assuming that you only need to grab the date, run example after changing the folderpath. This example loads all of your dates into an array and then assigns the array to range a1:a?.

Requires references to MS Scripting and MSHTML.

Sub Example()
Dim AllDates() As Date

AllDates = GetDates("C:\Documents and Settings\TJS\Desktop\New Folder")
Range("A1:A" & UBound(AllDates) + 1).Value = Application.Transpose(AllDates)

End Sub

Function GetDates(Folder As String) As Date()

Dim fso As New FileSystemObject
Dim f As File
Dim DocOpen As New HTMLDocument, Doc As HTMLDocument
Dim GetDatesTemp() As Date

ReDim GetDatesTemp(0)

For Each f In fso.GetFolder(Folder).Files
Set Doc = DocOpen.createDocumentFromUrl(f.Path, vbNullString)
Do Until Doc.readyState = "complete": DoEvents: Loop
With Doc.getElementsByTagName("TABLE")(2).Rows(1).Cells(1).getElementsByTagName("SPAN")
GetDatesTemp(UBound(GetDatesTemp)) = DateValue(.Item(2).innerText & "/" & .Item(0).innerText & "/" & .Item(4).innerText)
ReDim Preserve GetDatesTemp(UBound(GetDatesTemp) + 1)
End With
Next
ReDim Preserve GetDatesTemp(UBound(GetDatesTemp) - 1)
GetDates = GetDatesTemp
End Function

karldou
07-30-2008, 12:48 AM
Hi Tom, thanks for the script!
When i run this in excel i get the following error:
User-defined type not defined and its pointing to the line: fso As New FileSystemObject.

Also as another thought.... would this easily be adaptable for future expansion. I.e. to possibly import another cell range such as the First and Last Name of the person?

Many thanks for you time.
karl

TomSchreiner
07-30-2008, 04:19 AM
Hi Karl.

"User-defined type not defined and its pointing to the line: fso As New FileSystemObject."

Requires references to MS Scripting and MSHTML.

Also as another thought.... would this easily be adaptable for future expansion. I.e. to possibly import another cell range such as the First and Last Name of the person?

Yes

karldou
07-30-2008, 05:20 AM
Hi Tom, thanks for your reply

I have been researching around many excel forum's and i have come across a script that i have been able to adapt for my needs.

It basically reads a list of URL paths listed in Column A and imports the data from each individual htm file.
Sub GetData()

Dim wSU As Worksheet

Dim iForRow As Integer
Dim iLastRow As Integer
Dim sURL As String
Set wSU = ThisWorkbook.Sheets("One")
Range("A1").Select
iLastRow = Cells(Rows.Count, "a").End(xlUp).Row

For iForRow = 1 To iLastRow Step 1
sURL$ = wSU.Cells(iForRow, "a").Value

With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" & sURL, Destination:=Range("B1"))
.Name = "test"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = False
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertEntireRows
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "5,7"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With

Next iForRow
End Sub


The only issue is, once it has completed importing the data from the first html file, it then imports the data from the next html to the right of the first set. I am trying to set an offset within the Destination:=Range area, so each set of data is imported below the previous with a 1 row gap.

I have tried
Destination:=Range("B1").Offset(10,0)
however this just move every entry 10 rows down, instead of increments of 10 rows.

I hope this make sense, and that somebody can help me.

Many thanks for all your help so far!

TomSchreiner
07-30-2008, 05:49 AM
Karl. Before I bother with the same approach you began with in your initial post, did you bother setting the references I mentioned in both of my replies. The example I gave you returns an array of the data you are looking for and then assigns it directly to a range.

karldou
07-30-2008, 05:59 AM
Hi Tom, I don't know how to reference MS Scripting and MSHTML. I'm very new to VBA.

I've managed to solve the offset issue by incrementing a number range.

I.e.
Row = 1
Destination:=Range("b2").Offset(Row, 0))
End With
Row = Row + 14

Now all the entries are below eachother with the required gap inbetween. Wasn't as hard as I thought!