Use MS Word's webpage-ability for html-entities



Power Cosmic
04-01-2010, 01:52 PM
Hello,

Some special characters in the text of a webpage sometimes need to be changed to so-called html-entities, so that the browser knows how to display them correctly. When you save your .doc as a webpage, this change to html-entities happens automatically where needed. That's very cool of MS Word. Unfortunately, you can't really use the webpage MS Word produces, because of all the unnecessary xml tags.

However, is it possible to make use of this ability of MS Word (changing special characters to html-entities) for my "website-text" through a VBA-script? Desires:
-only the text in my .doc or .txt file must be converted.
-the output should be to a .txt file

Another question, due to my lack of knowledge about VBA:
-can you create standalone programs with VBA or can one only use it within an Office product? If the latter, this means I can't use a .txt file for input in my former question.

Thanks.

fumei
04-06-2010, 12:13 PM
"-can you create standalone programs with VBA or can one only use it within an Office product? If the latter, this means I can't use a .txt file for input in my former question."

No, VBA must be used within a VBA-compliant application (mostly Office apps, but other apps are VBA-compliant, Corel Draw for example). VBA programs are not stand-alone.

However, VBA can quite easily read and write .txt files, so using them is no problem.

"However, is it possible to make use of this ability of MS Word (changing special characters to html-entities) for my "website-text" through a VBA-script?"

This is out of my area of knowledge, especially as, generally speaking, using Word to do anything regarding HTML is a poor idea...at best. I do not know if it is possible to isolate this conversion, but I doubt it. However, that could be because I am not really sure what "html-entities" are.

"-only the text in my .doc or .txt file must be converted."

What else is in your .doc file? As for a .txt file, there can not be anything BUT text. So I do not see the issue.

Perhaps save the file as html (thus getting your html-entities??), then open the file in a text editor (or do a line by line extraction in Word using the text file and VBA)?

Power Cosmic
04-07-2010, 04:39 AM
Hello fumei,

Thank you for your reaction.

Htmlentities: these are character sequences that each represent one special character, so that it is displayed correctly in a web browser. For example, the htmlentity for "ä" is '&auml;'. If you don't use these entities, chances are the web browser displays the sign for an unknown character (a little block or question mark).

What I meant with "-only the text in my .doc or .txt file must be converted." was that MS Word generates other html code when you save the .doc file as a webpage (.html). I don't need that extra code.


"(or do a line by line extraction in Word using the text file and VBA)?"
I don't know anything about VBA. This would seem to be a nice project for me to learn it. But before I start, I first wanted to know whether what I want is even possible. I have to think about what you say here, such as how VBA can recognise the part of a line that I want (and redirect it to a .txt file). I think it must be possible...or maybe not.
I could open the .html file from Word in a text editor, but I have too many files to do this manually (about 40 files in 20 different languages, most of them unknown to me). Besides, it's a hell of a job; I already tried. That's why I was thinking about VBA.

EDIT:
sorry, my example for htmlentities may not display right here; this forum tends to translate it immediately into the special character. It is written as an ampersand, then auml, then a semicolon.
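
For a batch of that size, the conversion could also be scripted outside Word entirely. The following is only a minimal sketch in Python rather than VBA, under stated assumptions: the texts are already saved as plain .txt files, they are UTF-8 encoded, and numeric entities (such as &#228; for ä) are acceptable in place of named ones. The function name `convert_folder` and the folder names in the usage note are hypothetical.

```python
from pathlib import Path


def convert_folder(src_dir, dst_dir, encoding="utf-8"):
    """Copy every .txt file from src_dir to dst_dir with numeric HTML entities.

    Assumption: the input files are plain text in the given encoding.
    ASCII characters (including & and <) are left untouched.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.txt")):
        text = path.read_text(encoding=encoding)
        # The built-in "xmlcharrefreplace" error handler turns each
        # character that ASCII cannot represent into the form &#NNN;
        converted = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
        target = dst / path.name
        target.write_text(converted, encoding="ascii")
        written.append(target)
    return written
```

A call such as `convert_folder("texts", "texts_entities")` would then process all files in one go. Files in other encodings would need the `encoding` argument changed per file, which this sketch does not try to detect.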

TonyJollans
04-08-2010, 02:52 AM
I've never really considered what Word does when it saves as a Web page, but you are right that it does output HTML entities where they are required. As far as I know there is no way to get hold of these from Word through VBA (or any other way) - it is something Word does all by itself somewhere behind the scenes.

The best I think you could do would be to identify all characters that need replacing, and to do it yourself - that would, however, require hard coding of a long list of entities. I suppose, if your concern is simply correct display of the web pages, the nicety of the names might be academic, in which case you could use numeric entities of the form &#160; instead of named ones like &nbsp;, for example.
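
The long list does not necessarily have to be typed by hand. As a rough sketch in Python (not VBA, and not a description of what Word does internally), the standard library's `html.entities.codepoint2name` table already holds the HTML 4 named entities, and any character missing from it can fall back to the numeric form. The function name `to_entities` and its `named` parameter are made up for illustration, and the input is assumed to be plain text without HTML tags.

```python
from html.entities import codepoint2name


def to_entities(text, named=True):
    """Replace markup-significant and non-ASCII characters with HTML entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in '&<>"':
            # Ordinary ASCII is safe to keep as it is.
            out.append(ch)
        elif named and cp in codepoint2name:
            # Use the named entity when the table has one.
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Otherwise use the decimal numeric entity.
            out.append(f"&#{cp};")
    return "".join(out)


print(to_entities("Käse & Brot"))               # K&auml;se &amp; Brot
print(to_entities("Käse & Brot", named=False))  # K&#228;se &#38; Brot
```

With `named=False` every replaced character uses the numeric form, which is enough if correct display is the only concern.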