Find Formatted Text within a Paragraph (and then surround with tags)



lmshaft
03-03-2006, 09:35 AM
I have a manual process in which I create articles in Word 2003, save them as .rtf, and then convert them to HTML with a separate rtf-to-html conversion process. I'm attempting to streamline this by making it all happen in Word, converting either directly to HTML or possibly to XML and then on to HTML.

The articles contain default and custom Word styles. I am currently looping through all paragraphs, finding certain styles, then adding HTML code before and after the selected paragraph. What I'd also like to do is find formatted text within a paragraph (only for certain styles) and wrap that text in the corresponding HTML tags. Here's the code I have so far...


Public Sub Main()
    ' opening and closing tags for each style handled below
    Dim p As String, pClose As String
    Dim h2 As String, h2Close As String
    Dim i As String, iClose As String
    Dim para As Paragraph
    Dim rng As Range
    Dim MyRange As Range

    p = "<p class=""class1"">"
    pClose = "</p>"
    h2 = "<h2 class=""class2"">"
    h2Close = "</h2>"
    i = "<i>"        ' not used yet; intended for the italic step below
    iClose = "</i>"

    ' loop through all paragraphs and add the opening and closing tags
    For Each para In ActiveDocument.Paragraphs
        ' work on a Range instead of the Selection, and leave out the
        ' paragraph mark so the closing tag stays inside the paragraph
        Set rng = para.Range
        rng.MoveEnd Unit:=wdCharacter, Count:=-1

        If para.Style = "Normal" Then
            rng.InsertAfter pClose
            rng.InsertBefore p
        ElseIf para.Style = "Heading 2" Then
            rng.InsertAfter h2Close
            rng.InsertBefore h2
        End If
        ' within each paragraph, I need to find all Italic, Bold, etc. text and replace with tags
        ' how do I just search text that is currently selected?
    Next para

    ' find em dash and other special characters and replace with html characters
    ' (^+ matches an em dash; "&mdash;" is the assumed replacement text)
    Set MyRange = ActiveDocument.Content
    MyRange.Find.Execute FindText:="^+", _
        ReplaceWith:="&mdash;", Replace:=wdReplaceAll
End Sub


I'm new to VBA, so if there's a better way to do any of this, I'm all ears. Or, if this has already been answered, please point me to the correct link. Thanks!

TonyJollans
03-03-2006, 10:13 AM
I think you'll find this extremely difficult in the general case - and probably impossible without examining every character, which will be extremely slow.

Can you not work with what Word gives you if you Save As HTML?

If you're really only looking for certain specified styles then you should be able to use Find and Replace - try doing it through the UI and recording the code.
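
For what it's worth, here is a minimal sketch of how that Find and Replace approach might look for italic text, restricted to one paragraph at a time. The names `TagItalicRuns` and `TagItalicsInNormalParagraphs` are made up for illustration. The sketch assumes the italics are direct formatting, and that `^&` (the "found text" code in the replacement) behaves as it does in the Find and Replace dialog. Treat it as a starting point, not tested code for your documents.

```vba
' Hypothetical helper: wraps every italic run inside rng in <i>...</i>.
Sub TagItalicRuns(ByVal rng As Range)
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                        ' match on formatting only
        .Font.Italic = True
        .Replacement.Text = "<i>^&</i>"   ' ^& stands for the text that was found
        .Replacement.Font.Italic = False  ' stops the tagged text matching again
        .Format = True
        .Forward = True
        .Wrap = wdFindStop                ' do not continue past the end of rng
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Hypothetical driver: only looks inside paragraphs styled "Normal".
Sub TagItalicsInNormalParagraphs()
    Dim para As Paragraph
    Dim rng As Range
    For Each para In ActiveDocument.Paragraphs
        If para.Style = "Normal" Then
            Set rng = para.Range
            rng.MoveEnd Unit:=wdCharacter, Count:=-1   ' leave the paragraph mark out
            ' skip empty paragraphs: a collapsed range may search beyond itself
            If rng.Start < rng.End Then TagItalicRuns rng
        End If
    Next para
End Sub
```

Searching on a Range with `Wrap` set to `wdFindStop` is what keeps the replacement inside the paragraph, so nothing needs to be selected first. The same pattern should carry over to bold by swapping `.Font.Italic` for `.Font.Bold` and changing the tag.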

lmshaft
03-03-2006, 10:57 AM
I attempted to record a macro doing this earlier, but I must have set it up incorrectly, because when I ran the recorded code within my macro it didn't do anything. I'll experiment with this again and see if I can make it work.

I don't think the Word Save As HTML feature is an option for this situation because I need specific style information to be attached to certain paragraphs, tables, etc., and it seems Word applies its own style sheet info to the document. I would need to go in and convert Word's CSS to my CSS, which would be much like the current rtf-to-html conversion that I'm trying to remove from the process.

'Extremely slow' is relative, of course, because our current process (saving as .rtf, then opening the articles on another computer and converting from .rtf to HTML with 3 or 4 obsolete scripts) is itself extremely slow. :) If I do have to resort to examining every character (which, of course, I'd rather not do), even a minute per article for this macro would still save me time, since the current process has so many slow steps.

Thanks for your response!

TonyJollans
03-03-2006, 11:15 AM
Sometimes recording Find and Replace with formats doesn't work properly. Post the recorded code and someone can help you correct it.

If you don't know anything about the document then examining every character is, I think, the only option - but if you know, say, that formatting is only applied to whole words then you can maybe get away with only examining words.

lmshaft
03-03-2006, 01:20 PM
I'm not 100% positive, but I think formatting will only be applied to whole words. I'm going to have to investigate that further, though.

I recorded a macro for finding a Style called "Emphasis", which produced the code:


' finds the next text with a style of "Emphasis"
Sub FindStyle()
    With Selection.Find
        .ClearFormatting
        .Style = ActiveDocument.Styles("Emphasis")
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute
End Sub


However, I want to search for a Custom Style called "Italic" or "Style Italic". The macro errors when I attempt to change the style name to my custom style name. How do I go about Finding a Custom Style?

I thought this article was going to give me the answer (http://wordtips.vitalnews.com/Pages/T1199_Printing_a_List_of_Custom_Styles.html), but running this macro did not print out the custom styles I was expecting to see.

TonyJollans
03-04-2006, 03:14 AM
Are you quite sure the styles exist in your document, and that you have spelled them correctly? If so, what is the error message? Can you look for the style(s) through the UI and, if so, can you record it?
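
One way to check is sketched below. It assumes the style names are the ones mentioned above ("Italic" and "Style Italic"); `StyleExists`, `CheckMyStyles` and `ListCustomStylesInUse` are hypothetical names, not anything built into Word. Output goes to the Immediate window (Ctrl+G in the VBA editor).

```vba
' Hypothetical helper: True if the active document has a style with this name.
Function StyleExists(ByVal styleName As String) As Boolean
    Dim sty As Style
    On Error Resume Next                 ' a missing style raises a run-time error
    Set sty = ActiveDocument.Styles(styleName)
    On Error GoTo 0
    StyleExists = Not sty Is Nothing
End Function

' Checks the two style names from the post above.
Sub CheckMyStyles()
    Debug.Print "Italic: "; StyleExists("Italic")
    Debug.Print "Style Italic: "; StyleExists("Style Italic")
End Sub

' Prints the name of every non-built-in style that Word reports as in use.
Sub ListCustomStylesInUse()
    Dim sty As Style
    For Each sty In ActiveDocument.Styles
        If Not sty.BuiltIn And sty.InUse Then
            Debug.Print sty.NameLocal
        End If
    Next sty
End Sub
```

Note that `InUse` can be True for a style that has been created or modified in the document even when no text currently uses it, so the list may be longer than expected. If a name prints False in `CheckMyStyles`, that would explain the error when the recorded macro is pointed at it.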