PDA

View Full Version : Solved: Find text in different color



MacroShadow
06-21-2012, 02:39 AM
Hello all,

I have a document which has all quotes in colored text (different colors but not the default color). I'd like to copy all the quotes to a new document (one paragraph per quote), retaining the original formatting of each quote. Additionally I want to add the page number of origin after the quote.

Any ideas and/or help will be appreciated.

Frosty
06-21-2012, 06:14 AM
Are the quotes actually indicated by quotes? Are they separate paragraphs?
I'm not sure if there is a way to use "not" in a find operation... But I'd probably try to sort out how to build a collection of found ranges. And from there, I'd start to tackle the page num problem (whether trying to build a TOC or some other index field to do the page nums, or just manually brute force page info with selection)

Frosty
06-21-2012, 06:19 AM
There might be a few useful techniques in this thread...
http://www.vbaexpress.com/forum/showthread.php?t=42440

MacroShadow
06-21-2012, 06:48 AM
Frosty,

There is nothing indicating that the quotes are actually quotes other than the fact that they are a different color, they may even be only part of a sentence (so your sentence class won't help) and they are never in separate paragraphs.

Frosty
06-21-2012, 06:54 AM
I wasn't thinking of the sentences class, but rather building a collection of found ranges, in the order of the document, but found through various means.

You hve to find multiple colors. That poster had multiple keywords. You both need the output to be in the order They originally appear in the doc.

Does that help?

Frosty
06-21-2012, 06:57 AM
Unless there is a way to search for any text not of automatic or black color? Without a sample doc, it's tough to be more than general. How many different colors are there, etc?

MacroShadow
06-21-2012, 07:40 AM
I think this procedure proves that there is a way to use "not" in a find operation.

.
Sub FindNotBlack()
Dim rDcm As Range
Set rDcm = ActiveDocument.Range
With rDcm.Find
.Text = "<*>"
.MatchWildcards = True
While .Execute
If rDcm.Font.Color <> wdColorBlack Then
' rDcm.Select ' for testing only
rDcm.Font.Color = wdColorBlue
End If
Wend
End With
End Sub


And that being said it is irrelevant how many colors are in the document.
The main problem is the above procedure is very slow.

MacroShadow
06-21-2012, 07:44 AM
And performance wise this would be even worse:


Sub HighlightNotBlack()
Dim char As Range

For Each char In ActiveDocument.Characters
If char.Font.Color <> wdColorAutomatic And char.Font.Color <> wdColorBlack Then
char.HighlightColorIndex = wdYellow
End If
Next
End Sub

Frosty
06-21-2012, 07:47 AM
Well, right. Because you're not using "not" in a find... You're using it as a filter.

I'm not at a computer, but I don't think you can actually perform a find where it returns a "not black" operation (although I seem to recall a "not bold" possibility...

Which means you'll need to search for actual colors. The number of colors does matter... If you are going to use the find object. You really should read that thread. The main concept is the approach... Cycling through all words/sentences/paragraphs in a document and doing analysis... Or using the Find object to do the analysis for you. Find is always much faster

Frosty
06-21-2012, 09:39 AM
For example... this would give you a way to get a series of collections of found ranges, based on a specific color.

From there, you could build a giant collection of ranges and then order them based on the .Start property of the range.


Public Function fGetFoundRanges(rngSearch As Range, lColorIndex As WdColorIndex) As Collection
Dim colRet As Collection

Set colRet = New Collection

With rngSearch.Duplicate.Find
.Font.ColorIndex = lColorIndex
Do Until .Execute = False
colRet.Add rngSearch.Duplicate
Loop
End With

'return the collection
Set fGetFoundRanges = colRet
End Function

The problem with using the above function on something like wdAuto or wdBlack (the only way I could think to somehow build an "anything but black" collection of ranges), would be the inversion of ranges.

This will become relatively easy if you can limit the numbers of colors you're searching for.

Frosty
06-21-2012, 11:08 AM
Whoops, slight mistake in that code...

Public Function fGetFoundRanges(rngSearch As Range, lColorIndex As WdColorIndex) As Collection
Dim colRet As Collection

Set colRet = New Collection

With rngSearch.Find
.Font.ColorIndex = lColorIndex
Do Until .Execute = False
colRet.Add rngSearch.Duplicate
Loop
End With

'return the collection
Set fGetFoundRanges = colRet
End Function

Frosty
06-21-2012, 11:29 AM
And then I thought... what about an inverse collection of found ranges? The problem with this is that it wouldn't deal with all text not Black OR Auto.... you'd have to choose one or the other... but this could work...

Sub DemoSelectARange()
fGetInverseCollection(1).Select
End Sub
'return a collection of ranges *not* defined by the color?
Public Function fGetInverseCollection() As Collection
Dim colRet As Collection
Dim colOriginal As Collection
Dim rngNew As Range
Dim i As Integer

Set colOriginal = fGetFoundRanges(ActiveDocument.Content, wdAuto)
Set colRet = New Collection
For i = 1 To colOriginal.Count - 1
Set rngNew = colOriginal(i).Duplicate
rngNew.Collapse wdCollapseEnd
rngNew.End = colOriginal(i + 1).start
colRet.Add rngNew.Duplicate
Next
Set fGetInverseCollection = colRet
End Function
Public Function fGetFoundRanges(rngSearch As Range, lColorIndex As WdColorIndex) As Collection
Dim colRet As Collection

Set colRet = New Collection

With rngSearch.Find
.Font.ColorIndex = lColorIndex
Do Until .Execute = False
If colRet.Count > 0 Then
If colRet(colRet.Count).End = rngSearch.start Then
colRet(colRet.Count).End = rngSearch.End
Else
colRet.Add rngSearch.Duplicate
End If
Else
colRet.Add rngSearch.Duplicate
End If
Loop
End With

'return the collection
Set fGetFoundRanges = colRet
End Function

Frosty
06-21-2012, 12:19 PM
A few additional changes to the getting of the inverse collection, to deal with documents that start or end with colored text... other two procedures would remain the same.

'return a collection of ranges *not* defined by the color?
Public Function fGetInverseCollection() As Collection
Dim colRet As Collection
Dim colOriginal As Collection
Dim rngNew As Range
Dim i As Integer

Set colOriginal = fGetFoundRanges(ActiveDocument.Content, wdAuto)
Set colRet = New Collection

'first we have to test to see if the first range is at the beginning of the document
'if it isn't, we need to add the first range to our collection
If colOriginal(1).start > ActiveDocument.Content.start Then
Set rngNew = colOriginal(1).Duplicate
rngNew.Collapse wdCollapseStart
rngNew.start = ActiveDocument.Content.start
colRet.Add rngNew.Duplicate
End If
'cycle through the original collection of ranges, building the new collection
'based on that
For i = 1 To colOriginal.Count
'set the initial range
Set rngNew = colOriginal(i).Duplicate
rngNew.Collapse wdCollapseEnd
If i + 1 <= colOriginal.Count Then
rngNew.End = colOriginal(i + 1).start
Else
rngNew.End = ActiveDocument.Content.End
End If
'one last test, to make sure we're not just adding the last paragraph of the document
If Not (rngNew.End = ActiveDocument.Content.End And rngNew.start = rngNew.End - 1) Then
'add our new range to the collection
colRet.Add rngNew.Duplicate
End If
Next
Set fGetInverseCollection = colRet
End Function

Frosty
06-21-2012, 12:20 PM
You'll need to chime in at this point, since I still think it's probably simpler to find a series of collections based on specific colors, rather than try to combine two inverse collections (everything not black and everything not auto)...

MacroShadow
06-21-2012, 01:40 PM
Frosty,

Thank you for your time and effort. I will read the aforementioned thread, but it probably will take some time. I've never used collections before.

In most cases I would imagine black would be the color, but just to be safe I could search for all text with black as filter and if found change it to auto.

As far as the code you posted is concerned, I don't know exactly what to do with it.

Frosty
06-21-2012, 02:54 PM
Ahh, well then forget reading the other thread. Instead, read up on collections in the help file. They can be very useful. Now, your end result here may be tricky, since you may also want to get page numbers.

Without knowing what your original document should look like, and you final document... I can only give you generic code. You could try something like this...

Sub DemoOutputToNewDoc()
Dim oNewDoc As Document
Dim rngColoredText As Range
Dim colColoredText As Collection
Dim rngInsertWhere As Range

'get all the colored text ranges from the currently active document
Set colColoredText = fGetInverseCollection
'create a new document
Set oNewDoc = Documents.Add
'cycle through the collection of ranges (which are in the original document)
For Each rngColoredText In colColoredText
'copy them
rngColoredText.Copy
'set an insertion point range
Set rngInsertWhere = oNewDoc.Content
rngInsertWhere.Collapse wdCollapseEnd
'paste it in
rngInsertWhere.Paste
'insert an extra paragraph mark
rngInsertWhere.InsertAfter vbCr
Next

End Sub
But in terms of getting page numbers... you have to decide whether to
1) try and determine the page number from the given range (which could be tricky)

2) apply a style to the given range in the original document... and then generate a TOC in that original document based only on that style, and then copy the TOC that's generated to the new document as text rather than a TOC field

3) something else...

But you need to give a definitive before and after... or all I can keep giving you is proof-of-concept type stuff.

MacroShadow
06-21-2012, 03:47 PM
Thanks, It works just as expected.

Before or after each quote I would like something along the lines of:
"Page " & Selection.Information(wdActiveEndPageNumber) & "of " _
& Selection.Information(wdNumberOfPagesInDocument)

MacroShadow
06-24-2012, 10:55 PM
Would that method work for getting the page number?

macropod
06-25-2012, 12:20 AM
You could use something like:
Sub ExtractMarkedText()
Application.ScreenUpdating = False
With ActiveDocument.Content
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = False
.Wrap = wdFindStop
.Format = True
.Text = ""
.Font.ColorIndex = wdAuto
.Execute
End With
Do While .Find.Found
.Text = vbTab & "Page: " & .Duplicate.Characters.First.Information(wdActiveEndPageNumber) & vbCr
.Collapse wdCollapseStart
.Find.Execute
Loop
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Wrap = wdFindContinue
.Format = False
.MatchWildcards = True
.Text = "^13^tPage: [0-9]{1,2}"
.Replacement.Text = ""
.Execute Replace:=wdReplaceAll
End With
.Paragraphs.First.Range.Delete
End With
ActiveDocument.Range.Characters.Last.Previous.Delete
Application.ScreenUpdating = True
End Sub
The above code simply deletes whatever doesn't qualify (ie all automatic colour text) and suffixes the 'quotes' with the page references. You can then either save as a new document or copy & paste to another document.

MacroShadow
06-25-2012, 02:32 AM
Thanks Paul, it works well.
I tried to adjust it so the processing is done on a new copy of the active document by adding the following line in the beginning of the macro
Application.Documents.Add ActiveDocument.FullName
but it adds a seemingly random number of "Page: xxx".

Frosty, I'm still interested in you response regarding adding the page number using your suggestion of range collections.

macropod
06-25-2012, 02:48 AM
Is there a reason for not using Save As after processing? Much more efficient.

MacroShadow
06-25-2012, 02:58 AM
I wanted the original document to stay open, save as closes the original.

Anyway, I get the same unexpected outcome using save as.

MacroShadow
06-25-2012, 03:04 AM
Actually, I can't get it to work properly anymore, the first time it did, but now it doesn't.
It seems to add the page number for each paragraph regardless of whether the paragraph has colored text in it.

macropod
06-25-2012, 03:25 AM
It seems to add the page number for each paragraph regardless of whether the paragraph has colored text in it.That is, indeed, part of what it does (but it deletes those paragraphs' content at the same time) - and then it erases the left-over page# paragraphs. Either the errors you're getting result from changes you've made, or there's something in your document I haven't catered for. If you could post a copy, I can do some more testing.

Regarding the Open - vs Save As issue, you could, of course, re-open the original document afterwards. The Save As process is more efficient, though, as you avoid potentially having two large files open at the same time.

MacroShadow
06-25-2012, 04:46 AM
Thanks again for you offer to help.

I don't get it but now it works fine.

Frosty
06-25-2012, 09:45 AM
Thanks for weighing in, Paul... you've got a much simpler concept going on than what I was trying to accomplish.

MacroShadow: as far as the page numbers, I'm not sure if you're still curious about that. But you should look up the concept of pages on this forum. There are a *lot* of posts about it, and there is no really simple answer to this.

If what Paul has done is working for your documents, then I would cross the "pages concept" bridge another day. His code is certainly a simpler and more accessible approach than what I was attempting to do...

- Frosty aka Jason