PDA

View Full Version : [SOLVED:] Convert Excel Cell Text in Bold or Italic to HTML Tags/Convert Cell Line Breaks



gmaxey
10-07-2020, 09:03 AM
I have a very large Excel file where several of the columns contain multi-line descriptive data containing mixed formatted text (in addition to the normal font, some formatted with bold and some with italic). I need to replace the formatted "Bold" or "Italic" text in the cells with "<B>text</B> and "<I>text</I>" flags and add paragraph tags <P></P> to the cells replacing the Chr(10) linebreaks.

A real nub with Excel coding and all I could put together (from pecking through examples on the web) follows:

Problem: This is really slow as each character in each cell is evaluated. Is there anything equivalent in Excel to Words Find.Execute command such that I could simply set a range to each Cell then find "BOLD" text and insert the flags before and after each instance of a found instance?

Select the column to process:


Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell
Dim lngCount As Long
Dim lngScope As Long
Set oRng = Selection.Columns.Item(1)
lngScope = GetLastRow(oRng)
For Each oCell In oRng.Cells
oCell.Value = AddTags(oCell)
lngCount = lngCount + 1
DoEvents
If lngCount = lngScope Then Exit Sub
Next
End Sub
Function AddTags(ByVal oCell As Range) As String
Dim lngIndex As Long
Dim strResult As String
Dim bIsBold As Boolean, bIsItalic As Boolean
bIsBold = False
bIsItalic = False
For lngIndex = 1 To Len(oCell.Value)
If oCell.Characters(lngIndex, 1).Font.FontStyle = "Bold" Then
If bIsBold = False Then
strResult = strResult + "<B>"
bIsBold = True
End If
Else
If bIsBold = True Then
strResult = strResult + "</B>"
bIsBold = False
End If
End If
If oCell.Characters(lngIndex, 1).Font.FontStyle = "Italic" Then
If bIsItalic = False Then
strResult = strResult + "<I>"
bIsItalic = True
End If
Else
If bIsItalic = True Then
strResult = strResult + "</I>"
bIsItalic = False
End If
End If
strResult = strResult + oCell.Characters(lngIndex, 1).Text
If oCell.Characters(lngIndex, 1).Text = Chr(10) Then
strResult = strResult & "</P><P>"
End If
Next lngIndex
If bIsBold = True Then strResult = strResult + "</B>"
If bIsItalic = True Then strResult = strResult + "</I>"
If Len(strResult) > 0 Then
AddTags = "<P>" & strResult & "</P"
Else
AddTags = vbNullString
End If
AddTags = Replace(AddTags, "<P><P>", "<P>")
AddTags = Replace(AddTags, "</P></P>", "</P>")
AddTags = Replace(AddTags, Chr(10), "")
lbl_Exit:
Exit Function
End Function
Function GetLastRow(oRng As Range) As Long
GetLastRow = Cells(Rows.Count, oRng.Column).End(xlUp).Row
lbl_Exit:
Exit Function
End Function

SamT
10-07-2020, 10:37 AM
The only thing I can think of is to have a Class Object "FontToTag"
The first step would be to trigger F2T.Reset to set its state to null
I see it as checking each character to see if that fontstyle is the same as a previous character and returning the proper tag to Excel for emplacement before that character in the string.

F2T.Reset
If Not {All] fontstyle = "Regular" Then
For each character in String
strResult = F2T(character) & strResult


Your code would have to handle IF lngIndex = 1 then add <P>

Class FontToTag would need to check Fontstyles regular, bold, italic, and bolditalic, with the understanding that changing from B&I to one of the other three only applies a Closing Tab for the one(s) that changed

The only other way I can see to speed this up is to first check the string for any FontStyle NOT regular and only then send the cell for full processing.

gmaxey
10-07-2020, 10:56 AM
Sam,

Thanks for your reply. I'm afraid the class approach is a little over my head. Taking baby steps with Excel:

As for your: "The only other way I can see to speed this up is to first check the string for any FontStyle NOT regular and only then send the cell for full processing."

Sounds promising, but how? Also there is the matter of the line breaks. I think If Instr(oCell.Value, Chr(10)) > 0 would work send to processing any cell with line breaks but how would I determine if a cell was or was not all normal text?

SamT
10-07-2020, 06:16 PM
As a start

Function AddTags(ByVal oCell As Range) As String
Dim lngIndex As Long
Dim strResult As String
Dim bIsBold As Boolean, bIsItalic As Boolean

'Skip any Cells that don't need Tags
If oCell.Font.Regular Then
AddTags = oCell.Value
Exit Function
End if
.
.
.




Just for ideas, I'm pretty sure about Split and Join with the arrays. But the Character Properties has me flummoxed.

Function SamT(oCell As Range) As String
Dim arrLines '() 'As Excel.Characters
Dim arrWords '() 'As Excel.Characters
Dim cWord 'As Excel.Characters

'Check for singletons
If Not InStr(oCell.Text, Chr(10)) Then arrLines = oCell.Text
If Not InStr(oCell.Text, " ") Then cWord = oCell.Text

'Make some decisions

'is not a single line, is not a single word
arrLines = Split(oCell.Text, Chr(10))

'Loop thru all the lines
For i = LBound(arrLines) To UBound(arrLines)
arrWords = Split(arrLines(i), " ")
For j = LBound(arrWords) To UBound(arrWords)
cWord = arrWords(j)
CheckWord:
If cWord.Font.FontStyle = Regular Then GoTo SkipWord
'Else use your tag processing code
'cWord = AddTags(cWord)
SkipWord:
arrWords(j) = cWord
Next j

arrLines(i) = Join(arrWords, " ")
Next i

SamT = Join(arrLines, "</p>chr(10)<p>")
End Function

Paul_Hossler
10-07-2020, 08:32 PM
Edit - I realized I could simplify the code a bit more from my ver3
Edit^2 - cleanup some debug code :(


Try something like this to see if it's faster for you

1. I used .SpecialCells to just get text cells
2. Arrays for speed so I only need to use .Characters as little as possible
3. short circuit if entire cell is bold and/or italic
4. I put the result in the next cell



Option Explicit


Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell As Range

ActiveSheet.Columns(1).Select ' testing


Set oRng = Nothing
On Error Resume Next
Set oRng = Selection.Columns(1).SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

If oRng Is Nothing Then Exit Sub

For Each oCell In oRng.Cells
oCell.Offset(0, 1).Value = AddTags(oCell)
Next
End Sub




Function AddTags(oCell As Range) As String
Dim sResult As String, sChar As String
Dim i As Long, L As Long
Dim bBold As Boolean, bItal As Boolean
Dim A() As String



With oCell
'check the entire cell
If Not .Font.Bold And Not .Font.Italic Then
sResult = .Value

ElseIf .Font.Bold And .Font.Italic Then
sResult = "<b><i>" & .Value & "</i></b>"

ElseIf .Font.Bold And Not .Font.Italic Then
sResult = "<b>" & .Value & "</b>"

ElseIf Not .Font.Bold And .Font.Italic Then
sResult = "<i>" & .Value & "</i>"

Else
L = Len(.Value)

'number of chars
ReDim A(1 To L)

For i = 1 To L
sChar = .Characters(i, 1).Text
bBold = .Characters(i, 1).Font.Bold
bItal = .Characters(i, 1).Font.Italic

If bBold And bItal Then
A(i) = "<b><i>" & sChar & "</i></b>"
ElseIf bBold Then
A(i) = "<b>" & sChar & "</b>"
ElseIf bItal Then
A(i) = "<i>" & sChar & "</i>"
Else
A(i) = sChar
End If
Next i

sResult = Join(A, "")

'InStr faster than Replace if nothing to Replace
If InStr(sResult, "</b><b>") > 0 Then sResult = Replace(sResult, "</b><b>", vbNullString)
If InStr(sResult, "</i><i>") > 0 Then sResult = Replace(sResult, "</i><i>", vbNullString)
If InStr(sResult, "</i></b><b><i>") > 0 Then sResult = Replace(sResult, "</i></b><b><i>", vbNullString)


End If
End With

sResult = "<p>" & sResult & "</p>"

AddTags = Replace(sResult, Chr(10), "</p><p>")
End Function

snb
10-08-2020, 05:14 AM
Not too fast either:


Sub M_snb()
ActiveSheet.UsedRange.Copy
With CreateObject("Word.document")
.Windows(1).Visible = True
.Characters.first.PasteExcelTable 0, 0, 1
.tables(1).converttotext
.Application.Selection.homekey 6
.Application.Selection.collapse
With .Application.Selection.Find
.ClearFormatting
.Font.Italic = 1
.Text = "*"
.Forward = 1
.Wrap = 1
.Format = 1
.MatchWildcards = 1
.Execute
End With

With .Application
Do Until .Selection.Font.Italic = False
.Selection.InsertBefore "<i>"
.Selection.EndOf 1
.Selection.Find.Font.Italic = Not .Selection.Find.Font.Italic
.Selection.Find.Execute
.Selection.InsertAfter "</i>"
.Selection.EndOf 1
.Selection.Find.Font.Italic = Not .Selection.Find.Font.Italic
.Selection.Find.Execute
Loop
End With
sn = Filter(Split(.Content, "i>"), "</")
.Close 0
End With

For j = 0 To UBound(sn)
ActiveSheet.Cells.Replace Replace(sn(j), "</", ""), "<i>" & sn(j) & "i>", 2
sn = Filter(sn, sn(j), 0)
If UBound(sn) = -1 Then Exit For
Next
End Sub

gmaxey
10-08-2020, 09:34 AM
Sam, Paul, snb

Thanks guys for all of your input and assistance. There is no perfect world but I think this will work for me:


Option Explicit
Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell
Dim lngCount As Long
Dim lngScope As Long
Set oRng = Selection.Columns.Item(1)
lngScope = GetLastRow(oRng)
For Each oCell In oRng.Cells
If lngCount > 0 Then oCell.Value = AddTags(oCell)
lngCount = lngCount + 1
DoEvents
Debug.Print lngCount
If lngCount = lngScope Then Exit Sub
Next
End Sub
Function AddTags(ByVal oCell As Range) As String
Dim lngIndex As Long
Dim strResult As String
Dim bIsBold As Boolean, bIsItalic As Boolean, bIsBoldItalic As Boolean
bIsBold = False
bIsItalic = False
bIsBoldItalic = False
Select Case oCell.Font.FontStyle
Case Is = "Regular": strResult = oCell.Value
Case Is = "Bold"
strResult = "<B>" & oCell.Value & "</B>"
oCell.Font.FontStyle = "Regular"
Case Is = "Italic"
strResult = "<i>" & oCell.Value & "</i>"
oCell.Font.FontStyle = "Regular"
Case Is = "Bold Italic"
strResult = "<B><i>" & oCell.Value & "</i></B>"
oCell.Font.FontStyle = "Regular"
Case Else
For lngIndex = 1 To Len(oCell.Value)
If oCell.Characters(lngIndex, 1).Font.FontStyle = "Bold Italic" Then
If bIsBoldItalic = False Then
strResult = strResult + "<i><B>"
bIsBoldItalic = True
End If
Else
If bIsBoldItalic = True Then
strResult = strResult + "</i></B>"
bIsBoldItalic = False
End If
End If
If oCell.Characters(lngIndex, 1).Font.FontStyle = "Bold" Then
If bIsBold = False Then
strResult = strResult + "<B>"
bIsBold = True
End If
Else
If bIsBold = True Then
strResult = strResult + "</B>"
bIsBold = False
End If
End If
If oCell.Characters(lngIndex, 1).Font.FontStyle = "Italic" Then
If bIsItalic = False Then
strResult = strResult + "<i>"
bIsItalic = True
End If
Else
If bIsItalic = True Then
strResult = strResult + "</i>"
bIsItalic = False
End If
End If
strResult = strResult + oCell.Characters(lngIndex, 1).Text

Next lngIndex
If bIsBold = True Then strResult = strResult + "</B>"
If bIsItalic = True Then strResult = strResult + "</i>"
If bIsBoldItalic = True Then strResult = strResult + "</i></B>"
End Select
If InStr(strResult, Chr(10)) > 0 Then
strResult = Replace(strResult, Chr(10), "<P></P>")
End If
If Len(strResult) > 0 Then
AddTags = "<P>" & strResult & "</P>"
Else
AddTags = vbNullString
End If
AddTags = Replace(AddTags, "<P><P>", "<P>")
AddTags = Replace(AddTags, "</P></P>", "</P>")
AddTags = Replace(AddTags, Chr(10), "")
lbl_Exit:
Exit Function
End Function
Function GetLastRow(oRng As Range) As Long
GetLastRow = Cells(Rows.Count, oRng.Column).End(xlUp).Row
lbl_Exit:
Exit Function
End Function

Paul_Hossler
10-08-2020, 12:26 PM
OK, but I have to say that it doesn't look very efficient

Not sure it'll work on strings like: nnnBBBBXXXXIIIInnnnBBBB when XXXX = Bold Italic, n = none, B = bold and I = italic

gmaxey
10-08-2020, 05:04 PM
Paul,

You are right. It wouldn't. I could be missing the tree for the forest, but I don't see how to avoid the For Each Character loops when the cell contains mixed formatting

Feel free to "Give me a fish" here if you wish. I don't want the content to replace the existing content. I'll attach a sample file.




Option Explicit
Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell
Dim lngCount As Long
Dim lngScope As Long
Set oRng = Selection.Columns.Item(1)
lngScope = GetLastRow(oRng)
For Each oCell In oRng.Cells
If lngCount > 0 Then oCell.Value = AddTags(oCell)
lngCount = lngCount + 1
DoEvents
Debug.Print lngCount
If lngCount = lngScope Then Exit Sub
Next
End Sub
Function AddTags(ByVal oCell As Range) As String
Dim lngIndex As Long
Dim strResult As String
Dim bIsBold As Boolean, bIsItalic As Boolean, bIsBoldItalic As Boolean
bIsBold = False
bIsItalic = False
bIsBoldItalic = False
Select Case oCell.Font.FontStyle
Case Is = "Regular": strResult = oCell.Value
Case Is = "Bold"
strResult = "<B>" & oCell.Value & "</B>"
oCell.Font.FontStyle = "Regular"
Case Is = "Italic"
strResult = "<i>" & oCell.Value & "</i>"
oCell.Font.FontStyle = "Regular"
Case Is = "Bold Italic"
strResult = "<B><i>" & oCell.Value & "</i></B>"
oCell.Font.FontStyle = "Regular"
Case Else
For lngIndex = 1 To Len(oCell.Value)
Select Case oCell.Characters(lngIndex, 1).Font.FontStyle
Case Is = "Bold Italic"
If bIsBold = True Then strResult = strResult + "</B>": bIsBold = False
If bIsItalic = True Then strResult = strResult + "</i>": bIsItalic = False
If bIsBoldItalic = False Then
strResult = strResult + "<B><i>"
bIsBoldItalic = True
End If
Case Is = "Bold"
If bIsBoldItalic = True Then strResult = strResult + "</i></B>": bIsBoldItalic = False
If bIsItalic = True Then strResult = strResult + "</i>": bIsItalic = False
If bIsBold = False Then
strResult = strResult + "<B>"
bIsBold = True
End If
Case Is = "Italic"
If bIsBold = True Then strResult = strResult + "</B>": bIsBold = False
If bIsBoldItalic = True Then strResult = strResult + "</i></B>": bIsBoldItalic = False
If bIsItalic = False Then
strResult = strResult + "<i>"
bIsItalic = True
End If
Case Else
If bIsBold = True Then strResult = strResult + "</B>": bIsBold = False
If bIsItalic = True Then strResult = strResult + "</i>": bIsItalic = False
If bIsBoldItalic = True Then strResult = strResult + "</i></B>": bIsBoldItalic = False
End Select
strResult = strResult + oCell.Characters(lngIndex, 1).Text
Next lngIndex
If bIsBold = True Then strResult = strResult + "</B>": bIsBold = False
If bIsItalic = True Then strResult = strResult + "</i>": bIsItalic = False
If bIsBoldItalic = True Then strResult = strResult + "</i></B>": bIsBoldItalic = False
End Select
If InStr(strResult, Chr(10)) > 0 Then
strResult = Replace(strResult, Chr(10), "<P></P>")
End If
If Len(strResult) > 0 Then
AddTags = "<P>" & strResult & "</P>"
Else
AddTags = vbNullString
End If
AddTags = Replace(AddTags, "<P><P>", "<P>")
AddTags = Replace(AddTags, "</P></P>", "</P>")
AddTags = Replace(AddTags, Chr(10), "")
lbl_Exit:
Exit Function
End Function
Function GetLastRow(oRng As Range) As Long
GetLastRow = Cells(Rows.Count, oRng.Column).End(xlUp).Row
lbl_Exit:
Exit Function
End Function
27285

gmaxey
10-09-2020, 06:24 AM
Paul,

Was in a rush yesterday to get something done and didn't have the time to full review (or appreciate) what you had suggested. I revised it a little to better suit my style and eliminated some of the boolean variables. It seems to work very well with your samples and my text as well. Thank you.


Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell As Range
ActiveSheet.Columns(1).Select
Set oRng = Nothing
On Error Resume Next
Set oRng = Selection.Columns(1).SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0
If oRng Is Nothing Then Exit Sub
For Each oCell In oRng.Cells
'oCell.Offset(0, 1).Value = AddTags(oCell)
oCell.Value = AddTags(oCell)
Next
lbl_Exit:
Exit Sub
End Sub

Function AddTags(oCell As Range) As String
Dim strResult As String, strChar As String
Dim lngIndex As Long, lngLen As Long
Dim arrChrs() As String
With oCell
'Check the entire cell
If Not .Font.FontStyle = "Bold" And Not .Font.FontStyle = "Italic" And Not .Font.FontStyle = "Bold Italic" Then
strResult = .Value
ElseIf .Font.FontStyle = "Bold Italic" Then
strResult = "<b><i>" & .Value & "</i></b>"
ElseIf .Font.FontStyle = "Bold" Then
strResult = "<b>" & .Value & "</b>"
ElseIf .Font.FontStyle = "Italic" Then
strResult = "<i>" & .Value & "</i>"
Else
lngLen = Len(.Value)
'Number of chars
ReDim arrChrs(1 To lngLen)
For lngIndex = 1 To lngLen
strChar = .Characters(lngIndex, 1).Text
Select Case .Characters(lngIndex, 1).Font.FontStyle
Case "Bold Italic"
arrChrs(lngIndex) = "<b><i>" & strChar & "</i></b>"
Case "Bold"
arrChrs(lngIndex) = "<b>" & strChar & "</b>"
Case "Italic"
arrChrs(lngIndex) = "<i>" & strChar & "</i>"
Case Else
arrChrs(lngIndex) = strChar
End Select
Next lngIndex
strResult = Join(arrChrs, "")
End If
'InStr faster than Replace if nothing to Replace
If InStr(strResult, "</b><b>") > 0 Then strResult = Replace(strResult, "</b><b>", vbNullString)
If InStr(strResult, "</i><i>") > 0 Then strResult = Replace(strResult, "</i><i>", vbNullString)
If InStr(strResult, "</i></b><b><i>") > 0 Then strResult = Replace(strResult, "</i></b><b><i>", vbNullString)
End With
strResult = "<p>" & strResult & "</p>"
AddTags = Replace(strResult, Chr(10), "</p><p>")
lbl_Exit:
Exit Function
End Function

snb
10-09-2020, 08:27 AM
@greg

Reduce


ActiveSheet.Columns(1).Select
Set oRng = Nothing
On Error Resume Next
Set oRng = Selection.Columns(1).SpecialCells(xlCellTypeConstants, xlTextValues)
to

On Error Resume Next
Set oRng = activesheet.Columns(1).SpecialCells(2,2)

Paul_Hossler
10-09-2020, 08:41 AM
1. It takes time to read a property from an object so I used the booleans to hold the result if I needed it again (you did emphasize the need for speed), or just for readability (my style)

2. Something like .Font.FontStyle = "Bold" is MS Word oriented (I think) and involves two object levels and a string compare to return a T/F. Something like .Font.Bold returns T/F directly, so I thought it'd be a tad faster


3. There's been many discussions here in Excel-land about using arrays to improve speed since directly reading from the worksheet N times to manipulate data one cell at a time is a lot slower than bringing the N cells into an array all at once, crunch the array in memory, and then put all N cells back to the worksheet all at once. Of course, all that the array has are Values, no formatting. Join-ing the pieces into a single String for Replace is probably the equivalent


4.

Followup to snb's comment

a. You do not need to .Select something to use or act on it (MS Word is very 'Selection' object oriented)
b. I captured the returned Set of the .SpecialCells so that I could exit neatly if by chance there were no text cells
c. My style is to use enumerations instead of the numerical values for readability. The computer doesn't care, but I do when reading the code
d. If you were making this a general purpose sub, then you could just use



If Not TypeOf Selection Is Range Then Exit Sub
Set oRng = Nothing
On Error Resume Next
Set oRng = Selection.SpecialCells(xlCellTypeConstants, xlTextValues)
If oRng Is Nothing Then Exit Sub



5. You could make AddTags into a UserDefinedFunction and use it directly in cells on the worksheet, or call it inside a macro to do something with the tagged text

:rotlaugh::rotlaugh::rotlaugh:


(Still in Hossler-style :*) )



Option Explicit


Sub TagActiveCell()
MsgBox AddTags(ActiveCell)
End Sub




Function AddTags(oCell As Range) As String
Dim sResult As String, sChar As String
Dim i As Long, L As Long
Dim bBold As Boolean, bItal As Boolean
Dim A() As String


AddTags = vbNullString

With oCell
'check the entire cell
If Len(Trim(.Value)) = 0 Then Exit Function

If Not .Font.Bold And Not .Font.Italic Then
sResult = .Value

ElseIf .Font.Bold And .Font.Italic Then
sResult = "<b><i>" & .Value & "</i></b>"

ElseIf .Font.Bold And Not .Font.Italic Then
sResult = "<b>" & .Value & "</b>"

ElseIf Not .Font.Bold And .Font.Italic Then
sResult = "<i>" & .Value & "</i>"

Else
L = Len(.Value)

'number of chars
ReDim A(1 To L)

For i = 1 To L
sChar = .Characters(i, 1).Text
bBold = .Characters(i, 1).Font.Bold
bItal = .Characters(i, 1).Font.Italic

If bBold And bItal Then
A(i) = "<b><i>" & sChar & "</i></b>"
ElseIf bBold Then
A(i) = "<b>" & sChar & "</b>"
ElseIf bItal Then
A(i) = "<i>" & sChar & "</i>"
Else
A(i) = sChar
End If
Next i

sResult = Join(A, "")

'InStr faster than Replace if nothing to Replace
If InStr(sResult, "</b><b>") > 0 Then sResult = Replace(sResult, "</b><b>", vbNullString)
If InStr(sResult, "</i><i>") > 0 Then sResult = Replace(sResult, "</i><i>", vbNullString)
If InStr(sResult, "</i></b><b><i>") > 0 Then sResult = Replace(sResult, "</i></b><b><i>", vbNullString)
End If
End With

sResult = "<p>" & sResult & "</p>"

AddTags = Replace(sResult, Chr(10), "</p><p>")
End Function

gmaxey
10-09-2020, 09:11 AM
Paul,

Good points. I've revised as follows:


Sub ConvertFormatToHTML()
Dim oRng As Range
Dim oCell As Range
ActiveSheet.Columns(1).Select
Set oRng = Nothing
On Error Resume Next
Set oRng = Selection.Columns(1).SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0
If oRng Is Nothing Then Exit Sub
For Each oCell In oRng.Cells
oCell.Offset(0, 1).Value = AddTags(oCell)
'oCell.Value = AddTags(oCell)
Next
lbl_Exit:
Exit Sub
End Sub
Function AddTags(oCell As Range) As String
Dim strResult As String, strChar As String
Dim lngIndex As Long, lngLen As Long
Dim arrChrs() As String
Dim bBold As Boolean, bItalic As Boolean
With oCell
'Check the entire cell
If Not .Font.Bold = True And Not .Font.Italic = True Then
strResult = .Value
ElseIf .Font.Bold = True And .Font.Italic = True Then
strResult = "<b><i>" & .Value & "</i></b>"
ElseIf .Font.FontStyle = "Bold" Then
strResult = "<b>" & .Value & "</b>"
ElseIf .Font.FontStyle = "Italic" Then
strResult = "<i>" & .Value & "</i>"
Else
lngLen = Len(.Value)
'Number of chars
ReDim arrChrs(1 To lngLen)
For lngIndex = 1 To lngLen
strChar = .Characters(lngIndex, 1).Text
bBold = .Characters(lngIndex, 1).Font.Bold
bItalic = .Characters(lngIndex, 1).Font.Italic
Select Case True
Case bBold And bItalic: arrChrs(lngIndex) = "<b><i>" & strChar & "</i></b>"
Case bBold: arrChrs(lngIndex) = "<b>" & strChar & "</b>"
Case bItalic: arrChrs(lngIndex) = "<i>" & strChar & "</i>"
Case Else: arrChrs(lngIndex) = strChar
End Select
Next lngIndex
strResult = Join(arrChrs, "")
End If
'InStr faster than Replace if nothing to Replace
If InStr(strResult, "</b><b>") > 0 Then strResult = Replace(strResult, "</b><b>", vbNullString)
If InStr(strResult, "</i><i>") > 0 Then strResult = Replace(strResult, "</i><i>", vbNullString)
If InStr(strResult, "</i></b><b><i>") > 0 Then strResult = Replace(strResult, "</i></b><b><i>", vbNullString)
End With
strResult = "<p>" & strResult & "</p>"
AddTags = Replace(strResult, Chr(10), "</p><p>")
lbl_Exit:
Exit Function
End Function

Paul_Hossler
10-09-2020, 09:41 AM
NOT a big deal (style comment), but .Font. ... is a Boolean so this




If Not .Font.Bold = True And Not .Font.Italic = True Then



could be shortened to

1.



If .Font.Bold = False And .Font.Italic = False Then



or

2.



If Not .Font.Bold And Not .Font.Italic Then



or

3.



If Not (.Font.Bold Or .Font.Italic) Then


Personally, I find #2 more readable (but as I said, it's my style and I'm comfortable / used to it)

SamT
10-09-2020, 09:46 AM
Example: "Cat Dog Fox"

Excel: Cat.FontStlye = Bold; a.Fontstyle = Bold; c.Fontstyle = Bold; " ".Fontstyle = Regular; D.Fontstyle = BoldItalic; o.Fontstyle = BoldItalic; etc for each character

By default, each new character's Font Object inherits the fontstyle of the preceding character. A Fontstyle 'ends' when a new FontStyle is initiated

Cell.FontStyles provides the default character Fontstyle

Html: <b>Cat <i>Cat</i></b> Dog
Font tags Encapsulate Strings, Encapsulated Tags are additive. A Font 'ends' when the closing Tag is presented.

Encapsulating each character is allowed, <b>C</b><b>a</b><b>t</b> <b><i>D</i></b><b><i>o</i></b><b><i>g</i></b>, etc, and HTMLTidy should clean it up. I would start and end the HTM string(entire cell) with the Tag for the Cell's FontStyle

Personally, I would save all output to a *.htm file, (a basic text file that starts with "<html><body>" & NewLine + results,) and feed that to HTMLTidy.

mana
10-10-2020, 12:00 AM
Option Explicit

Sub test()
Dim re1 As Object
Dim re2 As Object
Dim s As String
Dim r As Range
Dim m, k

Set r = Selection
s = r.Value(xlRangeValueXMLSpreadsheet)

Set re1 = CreateObject("VBScript.RegExp")
re1.Global = True
Set re2 = CreateObject("VBScript.RegExp")
re2.Global = True
'
re1.Pattern = "<B>(<Font[\s\S]*?>)([\s\S]*?)(</Font>)*</B>"
s = re1.Replace(s, "$1&lt;B&gt;$2&lt;/B&gt;$3")

re1.Pattern = "<I>(<Font[\s\S]*?>)([\s\S]*?)(</Font>)*</I>"
s = re1.Replace(s, "$1&lt;I&gt;$2&lt;/I&gt;$3")

re2.Pattern = "<Style ss:(ID=""s\d*"")>[\n\s]*?<Font[^>]*?ss:Bold=""1"""
If re2.test(s) Then
For Each m In re2.Execute(s)
re1.Pattern = "(<Cell ss:Style" & m.submatches(0) & "><Data ss:Type=""String"">)([\s\S]*?)(</Data>)"
s = re1.Replace(s, "$1&lt;B&gt;$2&lt;/B&gt;$3")
Next
End If

re2.Pattern = "<Style ss:(ID=""s\d*"")>[\n\s]*?<Font[^>]*?ss:Italic=""1"""
If re2.test(s) Then
For Each m In re2.Execute(s)
re1.Pattern = "(<Cell ss:Style" & m.submatches(0) & "><Data ss:Type=""String"">)([\s\S]*?)(</Data>)"
s = re1.Replace(s, "$1&lt;I&gt;$2&lt;/I&gt;$3")
Next
End If

re1.Pattern = "((<ss:|<)Data[\s\S]*?>)([\s\S]*?)(<(/ss:|/)Data>)"
s = re1.Replace(s, "$1&lt;P&gt;$3&lt;/P&gt;$4")

re1.Pattern = "
"
s = re1.Replace(s, "&lt;P&gt;&lt;/P&gt;")

r.Offset(, 2).Value(xlRangeValueXMLSpreadsheet) = s
r.Offset(, 2).ClearFormats
r.Offset(, 2).Font.Bold = False
r.Offset(, 2).Font.Italic = False
'
Set re1 = Nothing
Set re2 = Nothing

End Sub



マナ

mana
10-10-2020, 12:12 AM
re1.Pattern = "
"
s = re1.Replace(s, "&lt;P&gt;&lt;/P&gt;")


re1.Pattern = "& # 10 ;" <---Remove space
s = re1.Replace(s, "&lt;P&gt;&lt;/P&gt;")

gmaxey
10-10-2020, 05:01 AM
Mana,

Very interesting and fast. However, at this point, it is not dealing with FontSytle "Bold Italic"

This Bold&Italic <B><i>Bold&Italic</B></i>
and this Bold&ItalicBold should end up <B><i>Bold&Italic</i>Bold</B>

Paul_Hossler
10-10-2020, 07:59 AM
@mana

that xlRangeValueXMLSpreadsheet is interesting. Never saw that before

snb
10-10-2020, 10:01 AM
That makes 2.

mana
10-10-2020, 07:49 PM
Option Explicit

Sub test2()
Dim re1 As Object, re2 As Object
Dim s As String
Dim r As Range
Dim m As Object

Set r = Selection
s = r.Value(xlRangeValueXMLSpreadsheet)

Set re1 = CreateObject("VBScript.RegExp")
re1.Global = True
Set re2 = CreateObject("VBScript.RegExp")
re2.Global = True

re1.Pattern = "<I>(<Font[\s\S]*?>)([\s\S]*?)(</Font>)*</I>"
s = re1.Replace(s, "$1&lt;I&gt;$2&lt;/I&gt;$3")

re1.Pattern = "<B>(<Font[\s\S]*?(?!Style)>)([\s\S]*?)(</Font>)*</B>"
s = re1.Replace(s, "$1&lt;B&gt;$2&lt;/B&gt;$3")

re2.Pattern = "<Style ss:(ID=""s\d*"")>([\s\S]*?)</Style>"
If re2.test(s) Then
For Each m In re2.Execute(s)
If InStr(m.submatches(1), "ss:Bold=""1""") > 0 Then
re1.Pattern = "(<Cell ss:Style" & m.submatches(0) & "><Data ss:Type=""String"">)([\s\S]*?)(</Data>)"
s = re1.Replace(s, "$1&lt;B&gt;$2&lt;/B&gt;$3")
End If
If InStr(m.submatches(1), "ss:Italic=""1""") > 0 Then
re1.Pattern = "(<Cell ss:Style" & m.submatches(0) & "><Data ss:Type=""String"">)([\s\S]*?)(</Data>)"
s = re1.Replace(s, "$1&lt;I&gt;$2&lt;/I&gt;$3")
End If
Next
End If
'
re1.Pattern = "((<ss:|<)Data[\s\S]*?>)([\s\S]*?)(<(/ss:|/)Data>)"
s = re1.Replace(s, "$1&lt;P&gt;$3&lt;/P&gt;$4")

re1.Pattern = "& # 10;" '<--remove space
s = re1.Replace(s, "&lt;P&gt;&lt;/P&gt;")

r.Offset(, 2).Value(xlRangeValueXMLSpreadsheet) = s
r.Offset(, 2).ClearFormats
r.Offset(, 2).Font.Bold = False
r.Offset(, 2).Font.Italic = False
'
Set re1 = Nothing
Set re2 = Nothing

End Sub


マナ

snb
10-11-2020, 04:09 AM
Or:


Sub M_snb()
sp = Range("A1:B20")
sn = Filter(Split(Replace(Replace(Range("A1:D20").Value(11), "</Cell>", "<Cell>"), "</Font>", ""), "<Cell>"), "<Font")

For j = 1 To UBound(sn)
If InStr(sn(j), "<I>") + InStr(sn(j), "<B>") Then
sn(j) = Replace(Replace(Replace(Replace(Replace(sn(j), "<I>", Chr(0) & "I" & Chr(1)), "<B>", Chr(0) & "B" & Chr(1)), "</I>", Chr(0) & "/I" & Chr(1)), "</B>", Chr(0) & "/B" & Chr(1)), ">", "<")
sp((j - 1) \ UBound(sp, 2) + 1, (j - 1) Mod UBound(sp, 2) + 1) = Replace(Replace(Join(Filter(Split(sn(j), "<"), ":", 0), ""), Chr(0), "<"), Chr(1), ">")
End If
Next

Range("A1:B20").Offset(25) = sp
End Sub

gmaxey
10-11-2020, 07:20 AM
Very nice. Thanks mana

gmaxey
10-11-2020, 07:24 AM
snb, Your code throws a type mismatch error. Since it has nothing to do with converting the content of the selected range or column, why do you even bother?

Paul_Hossler
10-11-2020, 07:35 AM
Very nice. Thanks mana

Agreed - I learned a little something

However, I don't think I could get all the parameters, etc. correct. I'll have to opt for simplicity over efficiency :(

gmaxey
10-11-2020, 07:48 AM
Paul,

Likewise. This is a case where I was given the fish. A "Deep Sea" fish. I better stay closer to the shore :-). Thanks all of you for your help.

gmaxey
10-11-2020, 10:42 AM
Sorry guys, one more question. In mana's code he sets his range to the the selection.

For my purposes, I need to select a column of data and run the code. I don't want to process the first heading row. How do I adapt:

Set r = Selection

such that the range excludes the first row.

I tried this (among other things) but get and error:

Set r = Selection.Offset(1, 0)

Thanks.

Paul_Hossler
10-11-2020, 01:37 PM
In Excel, ActiveCell is used a lot

One if the data has no blanks, another if it does

Of course, you could use the second for everything, but I thought I'd show two different ways

This could stand some error checking, but we'll leave that as a homework assignment :devil2:




Option Explicit


'click somewhere in Col A
Sub Fragment()
Dim rngToProcess As Range, rStart As Range, rEnd As Range

Set rStart = ActiveCell.EntireColumn.Cells(2, 1)
Set rEnd = rStart.End(xlDown)

Set rngToProcess = Range(rStart, rEnd)

On Error Resume Next
Set rngToProcess = rngToProcess.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

MsgBox rngToProcess.Address


End Sub




'click somewhere in Col C
Sub Fragment2()
Dim rngToProcess As Range, rStart As Range, rEnd As Range

Set rStart = ActiveCell.EntireColumn.Cells(2, 1)
Set rEnd = ActiveSheet.Cells(ActiveSheet.Rows.Count, ActiveCell.Column).End(xlUp)

Set rngToProcess = Range(rStart, rEnd)

On Error Resume Next
Set rngToProcess = rngToProcess.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

MsgBox rngToProcess.Address


End Sub

gmaxey
10-11-2020, 02:13 PM
Thanks Pual

rambleon
02-06-2021, 07:18 AM
Hi Mana,
In addition to converting Bold and Italic text to web format - which is an excellent piece code, would it be possible to also include conversion of special characters for web, such as ampersand (&), quotes (") and apostrophe (').
Thanks

snb
02-06-2021, 09:50 AM
@gmax

To exclude columnheaders:


Sub M_snb()
sn=cells(1).currentregion.columns(3).offset(1).specialcells(2)
End Sub

rambleon
02-07-2021, 01:08 AM
Hi Mana,
This code was working but now it just shows [Running] then Not Responding.
I'm selecting the column to convert on a sheet with 2,200 rows.
Any ideas?
Thanks

rambleon
02-07-2021, 02:20 AM
I went back to an earlier version of the file, and your code to convert selected column to html worked fine, so problem was mine.
Sorry,
Thanks