Thread: Solved: Automated editing of HTML files..

  1. #1

    Hi all!

    I have a lot of HTML files of a similar format that I wish to cut data from and save to a comma-delimited file.

    All files have the same format, with a title, image, price, stock figure, weight, part code and URL for the manufacturer. I need to cut the price, stock figure, part code and URL from the file (so they don't show) and add them into a csv file along with the name of the file being processed.

    As I am familiar with Excel VBA, I thought I could dive straight into FrontPage VBA and code this very easily... not so. Either I'm being really dumb, or this is not as easy as it sounds...

    I have attached 2 sample files in the .zip attachment.

    I would appreciate ANY help whatsoever on this...

    Rob.

  2. #2
    ALe | VBAX Mentor | Joined: Aug 2005 | Location: Milan | Posts: 383
    I'd do it this way...

    1. In the Excel file, create a web query on one of your HTML pages (recording it with the macro recorder).
    2. Adapt the recorded code so that it loops over all the HTML files.
    3. At the end of the sub, insert some code to write the data from each file to a CSV file.

    What do you think?
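
    A rough, untested sketch of those three steps, in case it helps. The folder, the output file and above all the cells B2 and B3 are assumptions: where the price, stock figure and so on end up depends on how the query lays out your pages, so check one imported page first. Note that this only reads the pages; it does not remove anything from the HTML files.

    [vba]
    Sub WebQueryLoopSketch()
        Const SRC_FOLDER As String = "C:\samples\" 'assumed folder
        Const CSV_FILE As String = "C:\rob.csv"    'assumed output file
        Dim ws As Worksheet, fName As String, vFF As Long

        Set ws = ThisWorkbook.Worksheets.Add 'scratch sheet, reused for every file
        vFF = FreeFile
        Open CSV_FILE For Output As #vFF

        fName = Dir(SRC_FOLDER & "*.html")
        Do Until Len(fName) = 0
            ws.Cells.Clear
            'steps 1 and 2: the recorded web query, pointed at each file in turn
            With ws.QueryTables.Add( _
                    Connection:="URL;file:///" & Replace(SRC_FOLDER & fName, "\", "/"), _
                    Destination:=ws.Range("A1"))
                .WebSelectionType = xlEntirePage
                .WebFormatting = xlWebFormattingNone
                .Refresh BackgroundQuery:=False
                .Delete 'drop the query itself, the imported values stay on the sheet
            End With
            'step 3: B2 and B3 are placeholders for wherever the values land
            Print #vFF, fName & "," & ws.Range("B2").Text & "," & ws.Range("B3").Text
            fName = Dir
        Loop
        Close #vFF
    End Sub
    [/vba]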

  3. #3
    Thanks for the reply, ALe. I will give your suggestion a try. I'll let you know how I get on!

  4. #4
    lucas | Moderator, VBAX Wizard | Joined: Jun 2004 | Location: Tulsa, Oklahoma | Posts: 7,323
    If the data is in tables you might take a look at this kb entry:
    http://vbaexpress.com/kb/getarticle.php?kb_id=576
    Steve
    "Nearly all men can stand adversity, but if you want to test a man's character, give him power."
    -Abraham Lincoln

  5. #5
    Knowledge Base Approver, The King of Overkill!, VBAX Master | Joined: Jul 2004 | Location: Rochester, NY | Posts: 1,727
    Hi Rob,

    Give the following a try, it should do what you need:
    [vba]
    Sub fatbaldbob()
        'ASSUMPTION: the currency symbol in front of the price was lost when this
        'post was transcribed (it showed as "?", which is not a valid pattern).
        '"£" is a guess; change it to match your pages (escape "$" as "\$").
        Const vCur As String = "£"
        Dim FileArray() As String, CSVData() As String
        Dim Cnt As Long, i As Long, vFF As Long
        Dim tempStr As String, CSVFile As String, vFile As String
        Dim RegEx As Object

        CSVFile = "C:\rob.csv"

        Cnt = 0
        ReDim CSVData(4, 0) '0=price,1=stock figure,2=part code,3=url,4=path\filename
        ReDim FileArray(1, 0) '0=path,1=filename
        vFileSearch "C:\samples\", FileArray
        ' vFileSearch "C:\samples2\", FileArray 'if you want to look in more than one directory
        If Len(FileArray(1, 0)) = 0 Then 'nothing found, so there is no file to open
            MsgBox "No files found, nothing to do"
            Exit Sub
        End If
        Set RegEx = CreateObject("vbscript.regexp")
        With RegEx
            .Global = True
            .IgnoreCase = True
            .MultiLine = True
        End With
        For i = 0 To UBound(FileArray, 2)
            'read the whole file into tempStr
            vFF = FreeFile
            vFile = FileArray(0, i) & FileArray(1, i)
            Open vFile For Binary As #vFF
            tempStr = Space$(LOF(vFF))
            Get #vFF, , tempStr
            Close #vFF
            'check that all four items appear in the expected order
            '([^\x00]*? matches any run of characters, line breaks included)
            RegEx.Pattern = vCur & "[\d\.]+[^\x00]*?\d+ in Stock[^\x00]*?Part Code[^\x00]" & _
                "*?<b>[^\x00]*?<\/b>[^\x00]*?<a href=""http[^\x00]*?""[^\x00]*?<\/a>"
            If Not RegEx.Test(tempStr) Then
                MsgBox vFile & vbCrLf & _
                    "File pattern not met, skipping file"
            Else
                ReDim Preserve CSVData(4, Cnt)

                'each item: capture the first match, then blank out every match with &nbsp;
                RegEx.Pattern = "(" & vCur & "[\d\.]+)"
                CSVData(0, Cnt) = RegEx.Execute(tempStr).Item(0).SubMatches(0)
                tempStr = RegEx.Replace(tempStr, "&nbsp;")

                RegEx.Pattern = "(\d+ in Stock)"
                CSVData(1, Cnt) = RegEx.Execute(tempStr).Item(0).SubMatches(0)
                tempStr = RegEx.Replace(tempStr, "&nbsp;")

                RegEx.Pattern = "(Part Code[^\x00]*?<b>)([^\x00]*?)(<\/b>)"
                CSVData(2, Cnt) = RegEx.Execute(tempStr).Item(0).SubMatches(1)
                tempStr = RegEx.Replace(tempStr, "&nbsp;")

                RegEx.Pattern = "<a href=""(http[^\x00]*?)""[^\x00]*?<\/a>"
                CSVData(3, Cnt) = RegEx.Execute(tempStr).Item(0).SubMatches(0)
                tempStr = RegEx.Replace(tempStr, "&nbsp;")

                CSVData(4, Cnt) = vFile
                Cnt = Cnt + 1

                'write the edited HTML back over the original file
                vFF = FreeFile
                Open vFile For Output As #vFF
                Print #vFF, tempStr;
                Close #vFF
            End If

        Next
        'write one CSV line per processed file
        vFF = FreeFile
        Open CSVFile For Output As #vFF
        For i = 0 To Cnt - 1
            Print #vFF, Join(Array(CSVData(0, i), CSVData(1, i), CSVData(2, i), _
                CSVData(3, i), CSVData(4, i)), ",")
        Next
        Close #vFF
        Set RegEx = Nothing
    End Sub
    Function vFileSearch(ByVal vPath As String, ByRef FileArray() As String, _
        Optional ByVal vExtension As String = "html") As Boolean
        Dim tempStr As String, vCnt As Long
        'append to FileArray if it already holds files, otherwise start at the first slot
        If Len(FileArray(0, LBound(FileArray, 2))) = 0 Then
            vCnt = LBound(FileArray, 2)
        Else
            vCnt = UBound(FileArray, 2) + 1
        End If
        If Right(vPath, 1) <> "\" Then vPath = vPath & "\"
        On Error Resume Next 'in case no 'read' rights to directory
        tempStr = Dir(vPath & "*." & vExtension)
        On Error GoTo 0
        Do Until Len(tempStr) = 0
            ReDim Preserve FileArray(1, vCnt)
            FileArray(0, vCnt) = vPath
            FileArray(1, vCnt) = tempStr
            vCnt = vCnt + 1
            tempStr = Dir
        Loop
    End Function
    [/vba]
    Please don't hesitate to ask any questions!
    Matt
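
    One caveat on the CSV output: joining the values with a plain "," does not quote them, so a URL or part code that itself contains a comma would shift the columns. A small helper along these lines could wrap each value before it is joined. CsvField is a made-up name, not part of the code above, and this is only a sketch of the usual CSV quoting convention.

    [vba]
    Function CsvField(ByVal vText As String) As String
        'quote the value only when it contains a comma, a double quote or a line break
        If InStr(vText, ",") > 0 Or InStr(vText, """") > 0 _
            Or InStr(vText, vbCr) > 0 Or InStr(vText, vbLf) > 0 Then
            'inner double quotes are doubled, then the whole value is wrapped in quotes
            CsvField = """" & Replace(vText, """", """""") & """"
        Else
            CsvField = vText
        End If
    End Function
    [/vba]

    It would be used as CsvField(CSVData(3, i)) and so on inside the Join(Array(...)) call.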

  6. #6
    Wow, thanks ALe, Lucas and mvidas - can't thank you enough!
    I am finally getting somewhere with this now!

    Will try and finally nail this problem this weekend now...


  7. #7
    Matt, your code works perfectly! Thanks for your time, you are a star!
    How do I extract the weight, and delete the dashes? (I've tried to understand your code, but it's a bit beyond me I'm afraid!)
    I'm sure it's real easy when you know how...


  8. #8
    Sorted! I couldn't get my head around the regular expression patterns (someone should create an online syntax checker!), but I have finally done what I needed to.


  9. #9
    Knowledge Base Approver, The King of Overkill!, VBAX Master | Joined: Jul 2004 | Location: Rochester, NY | Posts: 1,727
    Well, we're here if you do have any questions. There is a KB entry by brettdj that explains the syntax (though it sounds like you know it now), and there are some online checkers out there (I can't think of any at the moment; RegexBuddy, maybe?). Feel free to post your modified code here. If you don't want that information in the pages at all (the weights or the dotted lines), there might be an easier way of doing it.
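
    In the meantime, a throwaway tester in the VBA editor can stand in for an online checker. This is only a sketch: TryPattern and TryPatternDemo are made-up names, and the sample text and the two patterns are guesses, since the real weight and dashed-line markup in your files isn't shown in this thread. Results go to the Immediate window (Ctrl+G).

    [vba]
    Sub TryPattern(ByVal vPattern As String, ByVal vText As String)
        Dim RegEx As Object, vMatches As Object, m As Object, i As Long
        Set RegEx = CreateObject("vbscript.regexp")
        RegEx.Global = True
        RegEx.IgnoreCase = True
        RegEx.MultiLine = True
        RegEx.Pattern = vPattern
        Debug.Print "Pattern: " & vPattern
        'a malformed pattern only raises an error when it is first used
        On Error Resume Next
        Set vMatches = RegEx.Execute(vText)
        If Err.Number <> 0 Then
            Debug.Print "  bad pattern: " & Err.Description
            Exit Sub
        End If
        On Error GoTo 0
        Debug.Print "  " & vMatches.Count & " match(es)"
        For Each m In vMatches
            Debug.Print "  [" & m.Value & "]"
            For i = 0 To m.SubMatches.Count - 1
                Debug.Print "    group " & i & ": " & m.SubMatches(i)
            Next
        Next
    End Sub

    Sub TryPatternDemo()
        Dim vSample As String
        'made-up sample text, not taken from the real files
        vSample = "<p>Weight: 1.25kg</p>" & vbCrLf & "<p>----------</p>"
        TryPattern "Weight:\s*([\d\.]+\s*k?g)", vSample 'weight captured in group 0
        TryPattern "-{3,}", vSample                     'runs of three or more dashes
        TryPattern "(\d+", vSample                      'unbalanced bracket, reported as bad
    End Sub
    [/vba]

    Once a pattern matches what you expect, it can be dropped into the main sub the same way as the others: Execute to capture the value, then Replace to blank it out.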

    Also, after thinking a little more about it, we could modify the code to convert it to VBScript, so you can just right-click the .html files and go to Send To to edit them. Just ideas though now, as I'm not on my computer at the moment, but let me know if anything sounds good.

    Glad to help though!
    Matt
