View Full Version : [SOLVED:] Find unique characters
Programmer_n
02-08-2017, 06:24 PM
I got a document containing lot of mathematical symbols,Greek symbols, numbers and chemical symbols.
I want to extract all unique characters from the document and return it as a separate document.
Is it possible? Just a snippet explaining how to approach this problem would do.
Tommy
02-09-2017, 08:25 AM
Have you tried iterating through each word and sending each character through ASCII() to see if the number is greater than 256? You would need to have the character set handy also just in case the set changes.
gmaxey
02-09-2017, 08:37 AM
Something like this perhaps:
Sub GetSymbols()
Dim oRng As Word.Range
Dim oChr As Range
Dim oCol As New Collection
Dim lngIndex As Long
Dim oTarget As Document
Application.ScreenUpdating = False
For Each oChr In ActiveDocument.Range.Characters
If oChr Like "[!A-Za-z:;l., 0-9]" Then 'etc. don't process
Select Case Asc(oChr)
Case Is = 11, 9, 13, 160 'etc. then don't process these either
Case Else
'Process all others.
On Error Resume Next
oCol.Add oChr, GetSymbolValues(oChr)
On Error GoTo 0
End Select
End If
Next
Set oTarget = Documents.Add
For lngIndex = 1 To oCol.Count
Set oRng = oTarget.Range
oRng.InsertAfter vbCr
oRng.Collapse wdCollapseEnd
oRng.FormattedText = oCol.Item(lngIndex).FormattedText
Next
oTarget.Characters(1).Delete
Application.ScreenUpdating = True
lbl_Exit:
Exit Sub
End Sub
Function GetSymbolValues(oRng) As String
Dim strFont As String
Dim lngANSI As Long, varHex As Variant
oRng.Select
With Selection
With Dialogs(wdDialogInsertSymbol)
strFont = .Font
lngANSI = .CharNum
varHex = Hex(lngANSI)
End With
End With
GetSymbolValues = lngANSI & "|" & strFont
lbl_Exit:
Exit Function
End Function
Tommy
02-09-2017, 11:27 AM
I think that this needs to change to
If oChr Like "[!A-Za-z:;l., 0-9]" Then 'etc. don't process
Select Case Asc(oChr)
Case Is = 11, 9, 13, 160 'etc. then don't process these either
Case Else
'Process all others.
On Error Resume Next
oCol.Add oChr, GetSymbolValues(oChr)
On Error GoTo 0
End Select
End If
This
If oChr Like "[!A-Za-z:;l., 0-9]" Then 'etc. don't process
else
Select Case Asc(oChr)
Case Is = 11, 9, 13, 160 'etc. then don't process these either
Case Else
'Process all others.
On Error Resume Next
oCol.Add oChr, GetSymbolValues(oChr)
On Error GoTo 0
End Select
End If
But that is just me typing out loud.
gmaxey
02-09-2017, 02:16 PM
Tommy, what needs to be changed to what? What change are you suggesting? I don't see it.
Programmer_n
02-09-2017, 06:37 PM
Something like this perhaps:
Sub GetSymbols()
Dim oRng As Word.Range
Dim oChr As Range
Dim oCol As New Collection
Dim lngIndex As Long
Dim oTarget As Document
Application.ScreenUpdating = False
For Each oChr In ActiveDocument.Range.Characters
If oChr Like "[!A-Za-z:;l., 0-9]" Then 'etc. don't process
Select Case Asc(oChr)
Case Is = 11, 9, 13, 160 'etc. then don't process these either
Case Else
'Process all others.
On Error Resume Next
oCol.Add oChr, GetSymbolValues(oChr)
On Error GoTo 0
End Select
End If
Next
Set oTarget = Documents.Add
For lngIndex = 1 To oCol.Count
Set oRng = oTarget.Range
oRng.InsertAfter vbCr
oRng.Collapse wdCollapseEnd
oRng.FormattedText = oCol.Item(lngIndex).FormattedText
Next
oTarget.Characters(1).Delete
Application.ScreenUpdating = True
lbl_Exit:
Exit Sub
End Sub
Function GetSymbolValues(oRng) As String
Dim strFont As String
Dim lngANSI As Long, varHex As Variant
oRng.Select
With Selection
With Dialogs(wdDialogInsertSymbol)
strFont = .Font
lngANSI = .CharNum
varHex = Hex(lngANSI)
End With
End With
GetSymbolValues = lngANSI & "|" & strFont
lbl_Exit:
Exit Function
End Function
Fantastic works like charm. I use chemical symbols like O2 , H2 where 2 is subscript that are not sensed. I understand O2 is sensed as two separate characters, is it possible to sense superscripts/subscripts attached characters as single characters.
gmaxey
02-09-2017, 07:09 PM
Maybe this:
Sub GetSymbols()
Dim oRng As Word.Range
Dim oChr As Range
Dim oCol As New Collection
Dim lngIndex As Long
Dim oTarget As Document
Application.ScreenUpdating = False
For Each oChr In ActiveDocument.Range.Characters
If oChr Like "[!A-Za-z:;., 016-9]" Then 'etc. don't process
Select Case Asc(oChr)
Case Is = 11, 9, 13, 160 'etc. then don't process these either
Case 50 To 53 '2 through 5
If oChr.Font.Subscript = True Or oChr.Font.Superscript = True Then
Set oRng = oChr.Words(1)
If oRng.Characters.Last = " " Then
oRng.End = oRng.End - 1
End If
On Error Resume Next
oCol.Add oRng, Trim(oRng.Text)
On Error GoTo 0
End If
Case Else
'Process all others.
On Error Resume Next
oCol.Add oChr, GetSymbolValues(oChr)
On Error GoTo 0
End Select
End If
Next
Set oTarget = Documents.Add
For lngIndex = 1 To oCol.Count
Set oRng = oTarget.Range
oRng.InsertAfter vbCr
oRng.Collapse wdCollapseEnd
oRng.FormattedText = oCol.Item(lngIndex).FormattedText
Next
oTarget.Characters(1).Delete
Application.ScreenUpdating = True
lbl_Exit:
Exit Sub
End Sub
Function GetSymbolValues(oRng) As String
Dim strFont As String
Dim lngANSI As Long, varHex As Variant
oRng.Select
With Selection
With Dialogs(wdDialogInsertSymbol)
strFont = .Font
lngANSI = .CharNum
varHex = Hex(lngANSI)
End With
End With
GetSymbolValues = lngANSI & "|" & strFont
lbl_Exit:
Exit Function
End Function
Programmer_n
02-09-2017, 07:12 PM
Excellent.
Tommy
02-10-2017, 07:00 AM
Well I thought there needed to be an else in the first if statement, if it works I am incorrect.
gmaxey
02-10-2017, 07:12 AM
Well my notation could have been better. the "!" in the Like statement means process everything except these that follows.
Programmer_n
02-15-2017, 03:10 AM
Well my notation could have been better. the "!" in the Like statement means process everything except these that follows.
Achieving similar results with Regex (Regular Expression). Question is why we don't take the path of Regex while sensing pattern in word?
gmaxey
02-15-2017, 05:52 AM
Speaking only for myself, Regex isn't something that I use very often and understand even less. So seldom worth (for me) taking the time to try to work out. If you have a Regex solution, I would like to see it.
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.