PDA

View Full Version : Need help coding a word file -> html macro



dutchman191
08-16-2012, 05:29 PM
Hi guys,

I am attempting to make a very simple macro to take a word document and insert appropriate tags around sections of text.

If the text is bolded but does not have any fullstop it gets <h3></h3> tags, if it does have a fullstop at the end it gets <strong></strong> tags.

My code:


Dim textRange As Range

selection.HomeKey wdStory
selection.Find.Font.Bold = True

With selection.Find
Do While .Execute(FindText:="", Forward:=True, _
MatchWildcards:=False, Wrap:=wdFindStop, MatchCase:=False) = True

Set textRange = selection.Range

'If the last char of the range = "." then ...
selection.EndKey Unit:=wdLine
selection.Extend
selection.MoveLeft Unit:=wdCharacter, count:=1

If selection <> "." Then
textRange.InsertBefore "<h3>"
textRange.End = textRange.End - 1
textRange.InsertAfter "</h3>"
selection.Collapse wdCollapseEnd
End If

If selection = "." Then
textRange.InsertBefore "<strong>"
textRange.End = textRange.End - 1
textRange.InsertAfter "</strong>"
textRange.Font.Bold = wdToggle 'Reason I have this line is so when I go to put all the <p> tags in it will surround the strong tags.
selection.Collapse wdCollapseEnd
End If

If selection.Range.Bookmarks.Exists("\EndOfDoc") Then Exit Do

Loop
End With


My problem begins when I loop again, it does not clear the selection properly, selection becomes </h3> and then inf loop. I do not know why it does not search for the next section of bolded text and perform the if test again..

Can anyone see any errors?

Example text:

This is a heading

This is a paragraph with some bolded text.

Needs to be:
<h3>This is a heading</h3>

<p>This is a paragraph with some <strong>bolded text.</strong></p>

Cheers for any help.

EDIT: ADDED VBA CODE TAGS - Tommy

dutchman191
08-16-2012, 11:58 PM
UPDATE:

Managed to fix that problem.

Getting an interesting error now:
Dim drange As Range
Dim text As String
Dim lastChar As String

selection.HomeKey wdStory
selection.Find.Font.Bold = True

With selection.Find
Do While .Execute(FindText:="", Forward:=True, _
MatchWildcards:=False, Wrap:=wdFindStop, MatchCase:=False) = True

Set drange = selection.Range

'Store last char of range.

text = Application.selection.text
lastChar = Right(text, 2)


If lastChar <> "." Then
selection.HomeKey unit:=wdStory
With selection.Find
.ClearFormatting
.text = lastChar & "^p"
.Wrap = wdFindContinue
End With

If selection.Find.Execute Then
drange.InsertBefore "<h3>"
drange.End = drange.End - 1
drange.InsertAfter "</h3>"
selection.Collapse collapseEnd
Else
drange.Font.Bold = wdToggle
drange.InsertBefore "<strong>"
drange.End = drange.End - 1
drange.InsertAfter "</strong>"
selection.Collapse collapseEnd
End If

ElseIf lastChar = "." Then
drange.Font.Bold = wdToggle
drange.InsertBefore "<strong>"
drange.End = drange.End - 1
drange.InsertAfter "</strong>"
selection.Collapse collapseEnd
End If

selection.EscapeKey

If selection.Range.Bookmarks.Exists("\EndOfDoc") Then Exit Do

Loop
End With

I am using the last character of the bolded selection as my reference point to either put <strong> or <h3>. I realised that if I had a bolded word in the middle of a sentence my initial logic would make it a h3 regardless.

So in hopes of fixing I have set up another if statement:selection.HomeKey unit:=wdStory
With selection.Find
.ClearFormatting
.text = lastChar & "^p"
.Wrap = wdFindContinue
End With

If selection.Find.Execute Then

For some reason the script does not find the character and the paragraph symbol... Is there an easier way to compare a string to the document and then perform an action?

macropod
08-17-2012, 02:27 AM
Try:
Sub Demo()
Dim Rng As Range, i As Long
With ActiveDocument.Content
With .Find
.ClearFormatting
.Text = ""
.Font.Bold = True
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
.Execute
End With
While .Find.Found
With .Duplicate
If .Characters.Last.Text Like "." Then
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = 10
Else
.InsertBefore "<h3>"
.InsertAfter "</h3>"
i = 6
End If
End With
.End = .End + i
.Collapse wdCollapseEnd
.Find.Execute
Wend
End With
End Sub

gmaxey
08-17-2012, 05:07 AM
Paul,

Nice code. I'm not sure why you choose to employ While .Find.Found?

It may be required, but the result looks a little odd with the ending </h3> tab appearing on a new line. Also, while not addressed by the OP, what about:

This is not a heading but this word is bold and the rest aren't.
This is not a heading but this <strong>word</strong> and the rest aren't.
Sub Demo()
Dim oRng As Range, i As Long
Set oRng = ActiveDocument.Content
With oRng.Find
.ClearFormatting
.Text = ""
.Font.Bold = True
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
While .Execute
With oRng.Duplicate
Select Case .Characters.Last.Text
Case ".", "!", "?"
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = 10
Case vbCr
.End = .End - 1
.InsertBefore "<h3>"
.InsertAfter "</h3>"
i = 5
Case Else
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = 10
End Select
End With
With oRng
.End = .End + i
.Collapse wdCollapseEnd
.Select
End With
Wend
End With
End Sub

macropod
08-17-2012, 04:27 PM
Paul,

Nice code. I'm not sure why you choose to employ While .Find.Found?
Because I chose to. With the While .Execute method, you need to define and work with a Range object. With While .Find.Found you don't.

It may be required, but the result looks a little odd with the ending </h3> tab appearing on a new line.
The updated code below deals with that.

Also, while not addressed by the OP, what about:

This is not a heading but this word is bold and the rest aren't.
This is not a heading but this <strong>word</strong> and the rest aren't.
I'm not quite suure what you're driving at here but shouldn't the conversion be from:
This is not a heading but this word is bold and the rest aren't.
to:
This is not a heading but this <h3>word is bold</h3> and the rest aren't.
Sub Demo()
Dim i As Long
With ActiveDocument.Content
With .Find
.ClearFormatting
.Text = ""
.Font.Bold = True
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
.Execute
End With
While .Find.Found
With .Duplicate
i = 0
While .Characters.Last.Text Like "[ " & Chr(13) & Chr(11) & Chr(10) & "]"
.End = .End - 1
i = 1 + 1
Wend
If .Characters.Last.Text Like "[.!?:;]" Then
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = i + 9
Else
.InsertBefore "<h3>"
.InsertAfter "</h3>"
i = i + 5
End If
End With
.End = .End + i
.Collapse wdCollapseEnd
.Find.Execute
Wend
End With
End Sub

fumei
08-17-2012, 08:00 PM
I'm not quite suure what you're driving at here but shouldn't the conversion be from:
This is not a heading but this word is bold and the rest aren't.
to:
This is not a heading but this <h3>word is bold</h3> and the rest aren't.Would that not be weird? To make a heading in the middle of a sentence?

ALTHOUGH....that is what the OP asked for.

If the text is bolded but does not have any fullstop it gets <h3></h3> tags, if it does have a fullstop at the end it gets <strong></strong> tags.

ALTHOUGH #2 it is NOT specified that the example has a "full stop", or not. frankly, I would assume that it does, but it is not specified.

macropod
08-17-2012, 11:00 PM
Indeed, I'm just working from the OP's specs.

An analogy with Word is the use of a Style Separator to include text in a heading that's not to be included in theTOC. The additional content would not necessarily have the same font attributes as the rest of the heading.

gmaxey
08-18-2012, 07:45 AM
Paul,

Infering to your decision to use ".Found" was not intented to be snippy.

Unless I am doing something wrong, you the last code you posted doesn't work correctly.

Starting with:

This is a bold heading
This is a bold sentence.
This is sentence unbold sentence. This is a bold sentence.
This is not a heading, not a bold sentence, but it it contains a bold words.

returns this:

<h3>This is a bold heading</h3>
This is<strong> a bold sentence.</strong>
This is sentence unbold sentence. <strong>This is a bold sentence.</strong>
Thisis not a heading, not a <h3>bold</h3> sentence, but it contains<h3>bold</h3> words.

Note the location of the open "<stron>" tag in the bold sentence example. I think this is because when you redefine the range by moving the end then there is no longer a need to offset it by the length of the tag. Make sense?

Also, I don't see why the two bolded words in the non bolded heading/sentences should get a <h3> tag.

Here is what I've come up with using your code as a base.

Sub Demo()
Dim i As Long
Dim oRng As Word.Range
Dim bRangeRedefined As Boolean
Set oRng = ActiveDocument.Content
With oRng.Find
.ClearFormatting
.Text = ""
.Font.Bold = True
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
While .Execute
bRangeRedefined = False
With oRng.Duplicate
oRng.Select
i = 0
While .Characters.Last.Text Like "[ " & Chr(13) & Chr(11) & Chr(10) & "]"
.End = .End - 1
i = i + 1
bRangeRedefined = True
Wend
If .Characters.Last.Text Like "[.!?:;]" Then
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = i + 9
ElseIf oRng.Start = oRng.Paragraphs(1).Range.Start Then
.InsertBefore "<h3>"
.InsertAfter "</h3>"
i = i + 5
Else
.InsertBefore "<strong>"
.InsertAfter "</strong>"
i = i + 9
End If
End With
If Not bRangeRedefined Then
oRng.End = oRng.End + i
End If
oRng.Collapse wdCollapseEnd
oRng.Select
Wend
End With
End Sub

Which returns:


<h3>This is a bold heading</h3>
<strong>This is a bold sentence.</strong>
This is sentence unbold sentence. <strong>This is a bold sentence.</strong>
Thisis not a heading, not a <strong>bold</strong> sentence, butit contains <strong>bold</strong> words.

macropod
08-18-2012, 09:06 AM
Hi Greg,

I didn't take offence re .Find.Found.

I see what you mean re the tag positioning. At the same time, though, I'd rather not work with selections (rather inefficient). How about the following, which I've enhanced so that the tags don't get bolded:
Sub Demo()
Dim i As Long, j As Long
With ActiveDocument.Content
With .Find
.ClearFormatting
.Text = ""
.Font.Bold = True
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
.Execute
End With
While .Find.Found
With .Duplicate
i = 0
While .Characters.Last.Text Like "[ " & Chr(13) & Chr(11) & Chr(10) & "]"
.End = .End - 1
i = i + 1
Wend
j = Len(.Text)
If .Characters.Last.Text Like "[.!?:;]" Then
.Collapse wdCollapseStart
.InsertBefore "<strong>"
.Font.Bold = False
.End = .End + j
.Collapse wdCollapseEnd
.InsertAfter "</strong>"
.End = .End + i
.Font.Bold = False
Else
.Collapse wdCollapseStart
.InsertBefore "<h3>"
.Font.Bold = False
.End = .End + j
.Collapse wdCollapseEnd
.InsertAfter "</h3>"
.End = .End + i
.Font.Bold = False
End If
End With
.Collapse wdCollapseEnd
.Find.Execute
Wend
End With
End Sub

gmaxey
08-18-2012, 09:13 AM
Paul,

Drat. The .select was a hold over from testing. Let me look at your revision and I'll get back.

gmaxey
08-18-2012, 09:35 AM
Paul,

In your last code, the individually bolded words in the sentences that are not a heading and not a complete sentence are still getting flagged with <h3>. I realize this may be what the OP wants so that is ok. However, what seems more logical is:

I am not a heading or a bolded sentence but this is a <strong>bolded</bolded> word.

Sub DemoII()
Dim i As Long, j As Long
Dim oRng As Word.Range
Set oRng = ActiveDocument.Content

With oRng.Find
.ClearFormatting
.Text = ""
.Font.Bold = True
.Forward = True
.Wrap = wdFindStop
.Format = True
.MatchWildcards = True
While .Execute
With oRng.Duplicate
i = 0
While .Characters.Last.Text Like "[ " & Chr(13) & Chr(11) & Chr(10) & "]"
.End = .End - 1
i = i + 1
Wend
j = Len(.Text)

If .Characters.Last.Text Like "[.!?:;]" Then
.Collapse wdCollapseStart
.InsertBefore "<strong>"
.Font.Bold = False
.End = .End + j
.Collapse wdCollapseEnd
.InsertAfter "</strong>"
.End = .End + i
.Font.Bold = False
ElseIf oRng.Start = oRng.Paragraphs(1).Range.Start And oRng.End = oRng.Paragraphs(1).Range.End Then
.Collapse wdCollapseStart
.InsertBefore "<h3>"
.Font.Bold = False
.End = .End + j
.Collapse wdCollapseEnd
.InsertAfter "</h3>"
.End = .End + i
.Font.Bold = False
Else
.Collapse wdCollapseStart
.InsertBefore "<strong>"
.Font.Bold = False
.End = .End + j
.Collapse wdCollapseEnd
.InsertAfter "</strong>"
.End = .End + i
.Font.Bold = False
End If
End With
oRng.Collapse wdCollapseEnd
Wend
End With
End Sub

macropod
08-18-2012, 10:56 AM
Hi Greg,

I do realise that I hadn't differentiated content that isn't a sentence (or whatever) for the <h3> tags. I gave reasons for that in post #7. However, the result you're after could also be achieved with:
Dim bPara As Boolean
'...
'...
With .Duplicate
i = 0
bPara = False
If InStr(.Text, vbCr) > 0 Then bPara = True
If .Characters.Last.Next = vbCr Then bPara = True
While .Characters.Last.Text Like "[ " & Chr(13) & Chr(11) & Chr(10) & "]"
.End = .End - 1
i = i + 1
Wend
j = Len(.Text)
If .Characters.Last.Text Like "[.!?:;]" Or bPara = False Then

fumei
08-18-2012, 04:17 PM
Or... you could import it into a proper HTML editor.

macropod
08-18-2012, 08:51 PM
And now, for a fresh approach:
Sub Demo()
With ActiveDocument.Content.Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = True
.Font.Bold = True
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Text = ""
.Replacement.Text = "<strong>^&</strong>"
.Execute Replace:=wdReplaceAll
.Text = "([!\<])^13([!\>])"
.Replacement.Text = "\1</strong>^p<strong>\2"
.Execute Replace:=wdReplaceAll
.Text = "\<strong\>\</strong\>"
.Replacement.Text = ""
.Execute Replace:=wdReplaceAll
.Format = False
.Text = "(^13)\<strong\>([!.\!\?:;\<]@)\</strong\>(^13)"
.Replacement.Text = "\1<h3>\2</h3>\3"
.Execute Replace:=wdReplaceAll
.Format = True
.Text = "[\<\/]{1,2}[!\>]{2,6}[\>]"
.Replacement.Text = "^&"
.Replacement.Font.Bold = False
.Execute Replace:=wdReplaceAll
End With
End Sub

gmaxey
08-19-2012, 05:58 AM
Paul,

Again, nice code. One small correction. If the first line of the document is a complete paragraph without a peried (i.e., a heading) then using your posted code, you won't get the h3 flags.

Modified like this and you will. I didn't test this any further so it could have holes as well.

Sub Demo()
With ActiveDocument.Content.Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = True
.Font.Bold = True
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Text = ""
.Replacement.Text = "<strong>^&</strong>"
.Execute Replace:=wdReplaceAll
.Text = "([!\<])^13([!\>])"
.Replacement.Text = "\1</strong>^p<strong>\2"
.Execute Replace:=wdReplaceAll
.Text = "\<strong\>\</strong\>"
.Replacement.Text = ""
.Execute Replace:=wdReplaceAll
.Format = False
'.Text = "(^13)\<strong\>([!.\!\?:;\<]@)\</strong\>(^13)"
.Text = "\<strong\>([!.\!\?:;\<]@)\</strong\>(^13)"
.Replacement.Text = "<h3>\1</h3>\2"
'.Replacement.Text = "\1<h3>\2</h3>\3"
.Execute Replace:=wdReplaceAll
.Format = True
.Text = "[\<\/]{1,2}[!\>]{2,6}[\>]"
.Replacement.Text = "^&"
.Replacement.Font.Bold = False
.Execute Replace:=wdReplaceAll
End With
End Sub

macropod
08-19-2012, 03:17 PM
One small correction. If the first line of the document is a complete paragraph without a peried (i.e., a heading) then using your posted code, you won't get the h3 flags.
That thought had occurred to me. If it was an issue, I'd thought of simply inserting a paragraph before the text, then removing it afterwards. Sometimes, though, one can't see the wood for the trees. I like your revision better.

PS: For anyone else's benefit, the whole point of the revised code is to do away with the loops used in the previous solutions. Using Word's Find/Replace tools this way, even though it means there's a need for additional Find/Replace expressions, plus replacing some of the things we've replaced in earlier stages, is actually far more efficient than the loops once you get past half a dozen or so bold strings to process.

dutchman191
08-20-2012, 04:15 PM
Thank's for all of the responses. Yes I did not want <h3> tags in the middle of my sentence.. was being a bit slow.

Last response seems to work perfectly.

Thanks again.