Sleeper: Trying to paste HTML data into Word doc (automated)



dc4life78
01-13-2009, 08:05 AM
Greetings all,

I have written a program to copy HTML data from an Internet Explorer object (file specified by the user) and paste it into a created Word document so that it can be formatted for conversion to PDF. The problem is that the format changes slightly regardless of what method I use to paste the data. Here is the intro code:


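Dim oHtm As Object
Dim oWord As Word.Application
Dim oDoc As Word.Document
Set oHtm = CreateObject("InternetExplorer.Application")
oHtm.navigate fullFileName
' Wait until the page has finished loading (readyState 4 = complete)
Do While oHtm.Busy Or oHtm.readyState <> 4
    DoEvents
Loop
' Select the whole page and copy it to the clipboard
oHtm.Document.execCommand "SelectAll"
oHtm.Document.execCommand "Copy"
' Start Word and create the target document
Set oWord = New Word.Application
Set oDoc = oWord.Documents.Add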

Here are some of the methods I have tried to use to paste the copied HTML data:


oDoc.Content.Paste
oWord.Selection.Paste
oDoc.Content.PasteSpecial DataType:=wdPasteHTML

All of these methods paste the data correctly but the H1, H2, and H3 headers are incorrectly formatted - it places a page-length horizontal line after each header and removes the dark blue font I have specified for headers in the CSS (text is now black.)

I tried using PasteAndFormat below and it does format the headers correctly, but for many HTML files it also inserts outline numbering throughout the entire text:


oWord.Selection.PasteAndFormat (wdFormatOriginalFormatting)

There is nothing in the original HTML code that would explain this phenomenon. Interestingly enough, however, when I pause the code after the intro code listed above and manually paste the data into the document (Control-V), it pastes perfectly, but I cannot seem to automate this specific paste method. I have been unsuccessful in my attempts at using DoCmd (which I use all the time with Access, but it does not seem to be compatible with the Word application or its objects) and SendKeys "^V", True (I read somewhere that SendKeys does not work with Vista, which I am running, but I have gotten the command to work in other programs). Out of desperation I even tried tinkering with the Word paste options such as SmartCutPaste, all to no avail.

Any help would be appreciated; thanks in advance!

dc4life78
01-21-2009, 10:12 AM
I posted this question before and did not get a response, so I will simplify the issue I am having. I am trying to paste HTML text into a new Word document, but the traditional .Paste and .PasteAndFormat methods do not yield the same format as the original HTML. Pausing the code and manually typing Ctrl-V does give me what I need; is there a way to automate this other than SendKeys, which does not work for me?

thanks

lucas
01-21-2009, 10:28 AM
Threads merged. Multiple threads asking the same question will not result in more answers; they just make things confusing.

Please post follow-up questions, or ask again for help, in the same thread.

This works for me:

Selection.PasteAndFormat (wdPasteDefault)
Obtained from the macro recorder.....

dc4life78
01-21-2009, 10:39 AM
Thank you for your response.

Unfortunately, none of the PasteAndFormat options, including wdPasteDefault, gives me precisely what Control-V does - each one pastes the HTML data but removes all CSS formatting for the H1, H2, and H3 headers and puts thick horizontal lines after each header. However, it does look like I can use this method and modify the code to examine the header style of each line and reformat the text myself. It is a more tedious approach, but I will try it and see if it works.
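A minimal sketch of that reformatting pass, under several assumptions that are not confirmed anywhere in this thread: that Word maps the pasted H1 to H3 headers onto paragraphs with outline levels 1 to 3, that the unwanted horizontal line is a bottom paragraph border, and that wdColorDarkBlue is close enough to the dark blue in the CSS. The routine name FixPastedHeadings is made up for illustration.

```vba
' Hypothetical helper: run after the paste to restore the header colour
' and remove the rule under each header.
Sub FixPastedHeadings(ByVal oDoc As Word.Document)
    Dim oPara As Word.Paragraph

    For Each oPara In oDoc.Paragraphs
        Select Case oPara.OutlineLevel
            ' Assumption: pasted H1-H3 headers end up at outline levels 1-3
            Case wdOutlineLevel1, wdOutlineLevel2, wdOutlineLevel3
                ' Assumption: wdColorDarkBlue stands in for the CSS header colour
                oPara.Range.Font.Color = wdColorDarkBlue
                ' Assumption: the horizontal line is a bottom paragraph border
                oPara.Borders(wdBorderBottom).LineStyle = wdLineStyleNone
        End Select
    Next oPara
End Sub
```

It would be called as FixPastedHeadings oDoc right after the paste. If the line turns out to be a separate horizontal-rule object rather than a border, the border statement will have no effect and the line would need to be removed some other way.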

lucas
01-21-2009, 10:46 AM
CSS files are external to the HTML file. I don't think you can access formatting done with CSS by copy and paste.

lucas
01-21-2009, 10:49 AM
Sounds like you need FrontPage or some editor specifically designed for web pages.

Maybe Tony or Gerry has an idea but I think you are trying to open a can with a knife when you have a perfectly good can opener in the drawer.

Why not save the web pages as a screen capture and paste them into Word as a picture, or save them as PDF?

As you can see I don't really have an answer, I just don't really understand what you are trying to do.......

TonyJollans
01-21-2009, 03:42 PM
I can sympathise a little as it does sometimes seem as though no matter how hard you try, none of the various pasting methods available in VBA seems to quite duplicate Word's default - and I'm afraid I have no answer if you are finding this to be the case.

I can say, however, that pasting from a web page is a complex operation, and Word has to, in essence, fit a square peg into a round hole, and sometimes it manages better than others.

I will also say, for what it's worth, that DoCmd is not a Word method, that SendKeys does work in Vista (as well as in any other version of Windows, that is), and that whether formatting is applied using CSS or not should make no difference to any Paste.

Perhaps if you could tell us the web page that is giving you problems we could help a little more; other than that I see little alternative to trial and error.

lucas
01-21-2009, 03:54 PM
"...and that whether formatting is applied using CSS or not should make no difference to any Paste."

I wasn't sure about this Tony, grasping at straws...

TonyJollans
01-22-2009, 01:39 PM
Hi Steve,

I can understand you wanting to grasp at straws when working with HTML and Word <g>

Think of it the other way round. When you copy from Word into another rich text application, one that doesn't use Styles (Excel, say), does it matter whether formatting comes from Styles or not?

lucas
01-22-2009, 01:54 PM
Of course not and it would follow logically that once the styles are applied using stylesheets that the same would obviously be true when copying from a web page.

I'll go slap myself with a fish now...

fumei
01-22-2009, 02:23 PM
nah, just a minnow will do...

IMO, just thinking about HTML and Word makes me shudder.

dc4life78
02-03-2009, 07:48 AM
Thanks all for the responses. The CSS formatting is not the issue, because the paste works perfectly well with a simple Ctrl-V, but none of the automated methods can duplicate its results. Screen capture is not an option either, because my program makes several advanced formatting changes to the text before converting it to PDF. I was able to work around the paste problems by using the default paste and then going through the document line by line, adjusting the formatting of the H1, H2, and H3 headers (whose formatting was not copied from the original HTML), and now the program works fine. For my own knowledge, though, I am still trying to figure out how to automate the Ctrl-V function; it really shouldn't be this difficult.
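One avenue that might be worth a try, assuming Word 2007 or later, is to run Word's own built-in Paste command through CommandBars.ExecuteMso instead of calling a paste method on a Range or Selection. This is only a sketch: nobody in this thread has tested whether it reproduces the Ctrl-V result, and the routine name PasteViaBuiltInCommand is made up for illustration.

```vba
' Hypothetical helper: invokes Word's built-in Paste command.
' Assumes Word 2007 or later and that the HTML is already on the clipboard.
Sub PasteViaBuiltInCommand(ByVal oWord As Word.Application)
    ' Show Word and bring it to the front so the command acts on the new document
    oWord.Visible = True
    oWord.Activate
    ' "Paste" is the identifier of the standard Paste control
    oWord.CommandBars.ExecuteMso "Paste"
End Sub
```

It would be called as PasteViaBuiltInCommand oWord once the copy from Internet Explorer has completed. If the result still differs from a manual Ctrl-V, the reformatting workaround described above remains the fallback.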