PDA

View Full Version : Identify text not in Times New Roman Format



vpsekhar
01-05-2015, 11:30 PM
Hi,
I have one word document of more than 400 pages. By using Ctrl + A , changes font to 'Times New Roman'. But unfortunately in some of the areas font is in diffrent format like 'Arial'. Is there any way to find text which is not in 'Times New Roman'.

macropod
01-08-2015, 09:24 PM
You could highlight the whole document, then use Find/Replace to remove the highlighting from the 'Times New Roman' content. Whatever's left with highlighting after that is in a different font.

gmaxey
01-08-2015, 10:20 PM
Paul's suggestion could possibly solve your problem. The only other alternative and the code would take a while to run would be to evaluate ever character in the text. Something like this perhaps:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oRng As Word.Range
Application.ScreenUpdating = False
Set oRng = ActiveDocument.Range.Characters.First
Do
If oRng.Font.Name <> "Times New Roman" Then
oRng.Comments.Add oRng, oRng.Font.Name
End If
oRng.MoveStart wdCharacter, 1
oRng.MoveEnd wdCharacter, 1
Loop Until oRng.End = ActiveDocument.Characters.Last.End
Application.ScreenUpdating = True
End Sub

rruckus
01-19-2015, 11:06 AM
This is a more advanced technique, but you can also use the underlying Word XML (ActiveDocument.WordOpenXML) plus DOM and an XPath expression to find all your occurrences of fonts that are not equal to "Times New Roman".

gmaxey
01-20-2015, 07:43 AM
rruckus,

Would you elaborate and post the code that you would use?

rruckus
01-20-2015, 03:42 PM
Sure. First add a Reference to "Microsoft XML, v6.0" in your VBA Project using Tools > References. Then try something like this:



Public Sub FindDisallowedFonts()


Dim oXML As DOMDocument60
Set oXML = New DOMDocument60
oXML.LoadXML ActiveDocument.WordOpenXML
oXML.setProperty "SelectionLanguage", "XPath"
oXML.setProperty "SelectionNamespaces", "xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
Dim oDisallowedFontNodes As IXMLDOMNodeList
Set oDisallowedFontNodes = oXML.SelectNodes("//w:rFonts[@w:ascii!='Times New Roman']")
Dim oNode As IXMLDOMNode
For Each oNode In oDisallowedFontNodes
Debug.Print "TEXT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:t").Text
Debug.Print "FONT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:rFonts/@w:ascii").NodeValue
Next oNode


End Sub


This is very rough and would need appropriate error handling, but it does work. You will also sometimes get two font entries in the XML for one occurrence of text (one for the paragraph font and one for the character font). This technique can be combined with traditional "Range.Find" techniques to replace unwanted fonts because you now know the text and the font face style to look for. It can also be used for other information that you cannot find through traditional Word APIs, once you start to understand the underlying XML. XPath is very fast, even for large documents like the one described here (400+pages).

gmaxey
01-20-2015, 06:44 PM
rruckus,

Very interesting. Thanks for sharing. I did get an error when I ran your code on very simple string of text. The error occurred after al the non TNR characters were identified. I don't know what I'm doing, but I fixed it with:


Sub Test()
Dim oXML As DOMDocument60
Set oXML = New DOMDocument60
oXML.LoadXML ActiveDocument.WordOpenXML
oXML.setProperty "SelectionLanguage", "XPath"
oXML.setProperty "SelectionNamespaces", "xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
Dim oDisallowedFontNodes As IXMLDOMNodeList
Set oDisallowedFontNodes = oXML.SelectNodes("//w:rFonts[@w:ascii!='Times New Roman']")
Dim oNode As IXMLDOMNode
For Each oNode In oDisallowedFontNodes
'Added the if statement to avoid RTE
If Not oNode.ParentNode.ParentNode.SelectSingleNode(".//w:t") Is Nothing Then
Debug.Print "TEXT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:t").Text
Debug.Print "FONT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:rFonts/@w:ascii").NodeValue
End If
Next oNode
End Sub


Also I noticed that it fails to find "Theme" font. I don't know why and just mentioning it in case you weren't aware.

rruckus
01-21-2015, 09:35 AM
Yes, you are correct, that's why I cautioned it's an advanced technique; you really need to know the underlying Word Open XML to use it. My example was only looking for a font applied to character(s) in a range smaller than a paragraph. It would error out on non-TNR fonts applied to paragraphs and wouldn't find theme/document fonts because they are defined differently in the Word XML. But you could easily expand this example to cover everything you wanted to find. WordOpenXML is a very powerful feature that many developers don't know about or don't know how to use. You have access to lots of things that Word VBA APIs don't expose. Here's the updated code:


Public Sub FindDisallowedFonts()

Dim oXML As DOMDocument60
Set oXML = New DOMDocument60
oXML.LoadXML ActiveDocument.WordOpenXML
oXML.setProperty "SelectionLanguage", "XPath"
oXML.setProperty "SelectionNamespaces", "xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
Dim oDisallowedFontNodes As IXMLDOMNodeList
Set oDisallowedFontNodes = oXML.SelectNodes("//w:rFonts[@w:ascii!='Times New Roman']")
Dim oNode As IXMLDOMNode
For Each oNode In oDisallowedFontNodes
If oNode.ParentNode.ParentNode.nodeName = "w:pPr" Then 'We have a paragraph font
Debug.Print "PARA TEXT: " & oNode.ParentNode.ParentNode.ParentNode.SelectSingleNode(".//w:t").Text
Else 'We have a range font
Debug.Print "CHAR TEXT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:t").Text
End If
Debug.Print "FONT: " & oNode.ParentNode.ParentNode.SelectSingleNode(".//w:rFonts/@w:ascii").NodeValue
Next oNode

'Theme Font: e.g. <a:fontScheme name="Office"><a:majorFont><a:latin typeface="Calibri Light"
'Add the "theme" namespace
oXML.setProperty "SelectionNamespaces", "xmlns:a='http://schemas.openxmlformats.org/drawingml/2006/main'"
Debug.Print "Major Theme Font: " & oXML.SelectSingleNode("//a:fontScheme/a:majorFont/a:latin/@typeface").NodeValue
Debug.Print "Minor Theme Font: " & oXML.SelectSingleNode("//a:fontScheme/a:minorFont/a:latin/@typeface").NodeValue


End Sub

gmaxey
01-21-2015, 10:11 AM
rruckus, Thanks. I agree and will have to make a point of learning more about it.