Solved: Extract MS Word's HTML Using VBA



Mavyak
03-17-2010, 01:05 PM
Where I work, we have online technical support case submission. The control we type our case bodies into is an HTML editor. Some people type up their cases in MS Word and then copy and paste them into the HTML editor. When they do that, the HTML editor stores MS Word's HTML in the background and renders the formatted text. When they submit the case, the underlying HTML is written to a database for display on our Intranet.

MS Word's HTML is so verbose that our HTML editor has a special paste option built in called "Paste from MS Word (with cleanup)". I have a case on my desk for something very similar, and in order to explain it to the client I need to replicate the steps above up to the point of copying the text in Word. At that point I would like to press Alt+F11 and Debug.Print the underlying HTML of the ActiveDocument. How do I do it?

lucas
03-17-2010, 01:33 PM
Sounds like a mess.

Why not paste it from word as just text? Do you need the formatting that badly?

lucas
03-17-2010, 01:39 PM
I think you would be better off looking at it from the Microsoft Script Editor.

View > Toolbars > Visual Basic will get you the following toolbar.

The arrow points to the script editor. It may not have been installed but you can install it and I think it's what you are looking for.

Mavyak
03-17-2010, 02:25 PM
Unfortunately, Microsoft removed the Script Editor from Word 2007 to increase security. Here's an excerpt from http://technet.microsoft.com/en-us/library/cc179199.aspx


"Microsoft Script Editor (MSE): The removal of this low-use feature increases security. Documents that contain scripts and that are upgraded to the new file formats will lose the scripts without warning."

The issue isn't for the online case submission. It's for a similar situation with our software. We allow people to save email templates, but we impose a limit on the number of characters a stored template can contain. Users are creating the templates in MS Word and then copying and pasting them into our software's HTML editor. The users don't understand why their templates get truncated when the visible text is fewer than x characters. They don't realize that MS Word's verbose HTML can send a template's character count skyrocketing (because the user is never exposed to the underlying HTML).

Is there no way to spit out the HTML to a text file or the immediate window?

SamT
03-18-2010, 08:00 AM
"Is there no way to spit out the HTML to a text file or the immediate window?"

Why not just save the html doc and open it in Notepad?

I just saved an empty doc.html and it had 79 lines of HTML code in it.

lucas
03-18-2010, 08:23 AM
Save As HTML does seem like a good idea. It works, as Sam has pointed out.

fumei
03-18-2010, 08:29 AM
That is the way to go. In fact, that is probably the only way to go.
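
For anyone who wants the Save As approach driven from the VBA editor, here is a minimal sketch, not a definitive implementation. It copies the active document into a scratch document (so the original is not converted to HTML), saves the scratch copy as a web page, reads the file back, and prints it together with a character-count comparison. The file name word_html_dump.htm and the helper ReadTextFile are made up for illustration. The sketch also assumes the HTML file is written in the system's ANSI code page, and the markup a clipboard paste produces may differ somewhat from what Save As writes.

```vba
Option Explicit

' Sketch only: dump the HTML Word generates for the active document.
Sub PrintActiveDocumentHtml()
    Dim src As Document
    Dim tmp As Document
    Dim tmpPath As String
    Dim html As String

    Set src = ActiveDocument
    ' Hypothetical scratch file in the user's temp folder.
    tmpPath = Environ$("TEMP") & "\word_html_dump.htm"

    ' Copy the formatted content into a new document so that
    ' SaveAs does not rename or convert the original.
    Set tmp = Documents.Add
    tmp.Range.FormattedText = src.Range.FormattedText
    tmp.SaveAs FileName:=tmpPath, FileFormat:=wdFormatHTML
    tmp.Close SaveChanges:=wdDoNotSaveChanges

    html = ReadTextFile(tmpPath)

    ' The Immediate window keeps only a limited number of lines,
    ' so for long documents open the .htm file in Notepad instead.
    Debug.Print html
    Debug.Print "Visible characters: " & src.Characters.Count
    Debug.Print "HTML characters:    " & Len(html)
End Sub

' Hypothetical helper: read a whole file into a string.
' Assumes the file is ANSI-encoded.
Private Function ReadTextFile(ByVal filePath As String) As String
    Dim f As Integer
    Dim buf As String

    f = FreeFile
    Open filePath For Binary Access Read As #f
    buf = Space$(LOF(f))   ' size the buffer to the file length
    Get #f, , buf          ' fill the buffer in one read
    Close #f

    ReadTextFile = buf
End Function
```

Swapping wdFormatHTML for wdFormatFilteredHTML gives Word's leaner "Web Page, Filtered" output, which is a handy comparison when showing someone how much of the character count is Word-specific markup.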

Mavyak
03-19-2010, 12:50 PM
I appreciate all the input. It's a shame MS did away with the script editor but then again, I'm not the one taking calls/letters/emails from angry clients due to whatever security threats it posed.

I'll be sure to use the save as HTML in the future.

The resolution to my problem was actually easier than I thought. The underlying database field was of type "text", so there really was no character limit at the table level. The variable in the stored procedure that populates the table was the limiting factor. We're going to bump it up well beyond 8,000 characters, so that should alleviate our clients' problems.

Thanks again for all your assistance. It's nice to know I can count on VBA Express when I hit a jam!

lucas
03-19-2010, 12:53 PM
Be sure to mark your thread solved using the thread tools at the top of the page.

That keeps others offering help from reading the entire thread just to find it's been resolved.

Mavyak
03-19-2010, 01:51 PM
Thanks. I just did it. I actually looked for that link at the bottom of the thread where you reply but couldn't find it. When I didn't find it, I assumed an admin would do it. Thanks for the heads-up.