PDA

View Full Version : Solved: Find and delete repeated words



johnske
05-29-2005, 04:09 AM
Hi,

In Excel this would be really simple for me, but it's not in Excel, it's in Word, and not being a Word person I wouldn't know where to start.

I have a friend with a 30 page Word doc with a list of words arranged in columns. Each word should only appear in the list once, so, if there is a second, third, ...etc. instance of any word these repetitions has to be deleted.

That's all I know of the problem....

TIA,
John


EDIT: Sorry, there is one other thing I do know, it's case insensitive: i.e. If you're using 'Find' or 'Replace' it doesn't matter if the second instance is in another case - it must be deleted, so it will be MatchCase = False...:hi:

MOS MASTER
05-29-2005, 05:13 AM
Hi John, :yes

Am I correct that I have to search the Word collection from the beginning of the document and that each Word is a Search criteria of its own?

That meaning:
* Find a Word do a find and replace for nothing if more then one of those word exists?

I don't see how that would be easy for you to do in excel but I'm happy to believe you.

It seams like this will be running mucho slow because you have to start the loop over and over until Word doesn't find any repetitions!

This would be so much easier if your friend had an index (in a table) that we could use as search and replace criteria...

O well I'll think this one over and report back later...:whistle:

johnske
05-29-2005, 07:14 AM
Hi Joost,

Yep, you got it right. I don't think speed is really an issue here (as long as it's faster than doing it manually :) ) and it's not going to be used on a regular basis.


I don't see how that would be easy for you to do in excel but I'm happy to believe you
This's basically the idea (V. slow) but it works for xl, I don't know what equivalents for "Cell" and "range" are in WordSub DeleteDuplicates()
Dim Cell As Range, Cel As Range
Dim CurrentCell As String, FirstFound As String
Application.ScreenUpdating = False
For Each Cell In Range("A1:C10")
FirstFound = Cell
CurrentCell = Cell.Address
For Each Cel In Range("A1:C10")
If Cel = FirstFound And Cel.Address <> CurrentCell Then Cel = ""
Next Cel
Next
End SubIt'd be much faster to use it dynamically of course. i.e. when a word is chosen, delete any repeats in what's left of the range... move to the next word and look at what's left of the range and delete any repeats...etc (with the range to be searched becoming smaller and smaller each time)

MOS MASTER
05-29-2005, 07:23 AM
Hi John, :yes

I've not tested your sollution but I could believe it to work for Excel.
Because Excel has a database structure you can easily loop through cell's and ranges as if you where in Access.

Word's a whole different ball game and deals with ranges and collections in an entirely different way.

That said I've just had this brilliant idea of dealing with this question. Before I tell it I've got to test it first. (I think it'll be fast)

Would your friend mind to have all the unique Words in a newly created document?

And if so in how many columns should the words be.

Just for the record h?...the document only contains Words in a column based fashion and no complete sentences? (with al sorts of reading characters?)

O well I'll start on trying my idea first.

Have to stop within a half our though..have to do something else this evening! :whistle:

TonyJollans
05-29-2005, 07:43 AM
I would try using a collection - attempting to add a duplicate raises an error - something like this ..

Dim objWords As New Collection
Dim objWord As Range
On Error GoTo Duplicate
For Each objWord In ActiveDocument.Words
objWords.Add objWord.Text, objWord.Text
GoTo Done
Duplicate:
objWord.Delete
Resume Done
Done:
Next
On Error GoTo 0
Set objWord = Nothing
Set objWords = Nothing

MOS MASTER
05-29-2005, 07:45 AM
I would try using a collection - attempting to add a duplicate raises an error - something like this ..

Dim objWords As New Collection
Dim objWord As Range
On Error GoTo Duplicate
For Each objWord In ActiveDocument.Words
objWords.Add objWord.Text, objWord.Text
GoTo Done
Duplicate:
objWord.Delete
Resume Done
Done:
Next
On Error GoTo 0
Set objWord = Nothing
Set objWords = Nothing
Aaaaaaaaaaaaahhhhh...that's what I was working on! :banghead: :cloud9:

This was how far I came:
Sub FindUnique()
Dim oWords As Word.Range
Dim oCol As Collection
Dim lCnt As Long
Dim sTmp As String
Dim sRep As String
Set oCol = New Collection

On Error Resume Next
For Each oWords In ActiveDocument.Words
sTmp = LCase(Trim(oWords.Text))
If Len(sTmp) > 1 Then
oCol.Add Item:=sTmp, Key:=sTmp
End If
Next

Documents.Add
For lCnt = 1 To oCol.Count
sRep = StrConv(oCol.Item(lCnt), vbProperCase)
ActiveDocument.Range.InsertAfter sRep & vbCr
Next

Set oCol = Nothing
Set oWords = Nothing
End Sub


But this is the fasted method I'm shure! :yes :p

johnske
05-29-2005, 07:50 AM
Ok guys thanx. I will try to get a sample doc tomorrow to try these out...

Regards,
John :)

MOS MASTER
05-29-2005, 07:52 AM
Ok guys thanx. I will try to get a sample doc tomorrow to try these out...

Regards,
John :)
Good idea..till tomorrow! :rofl: :rotlaugh:

TonyJollans
05-29-2005, 08:02 AM
Hi Joost :hi:

Yes, I'm pretty sure that's the right way to do it. You are right to check for Len > 1 (I guess I would've spotted that if I had tested :eek:) and the Trim is good :thumb - but I don't think the LCase is needed (although it does no harm)

MOS MASTER
05-29-2005, 08:12 AM
Hi Joost :hi:

Yes, I'm pretty sure that's the right way to do it. You are right to check for Len > 1 (I guess I would've spotted that if I had tested :eek:) and the Trim is good :thumb - but I don't think the LCase is needed (although it does no harm)

Hi Tony, :yes

I love the way we think! :bow:

I think the LCase construction is needed to cater for the MatchCase:=False in Find...But have not tested it as well so I just put it in there to make sure.

That's why at the end I all put them back to ProperCase because I believe most Words in a column would be written down in PropperCase. (if it's a list)

But will see when John has the sample tomorrow....

I have to work now but for sure will keep on laughing the whole evening because you beat me with having the same idea to cater for this problem.

Het was lachen tot ziens! :hi:

johnske
05-30-2005, 03:15 AM
Hi Joost, Tony,

I see what this is wanted for now! :rofl: :rotlaugh: (shouldn't larf, sum ppl take this really really seriously). Here's part of it...

Tony, nothing happened. Joost, I ended up with an 87 page doc with one word on each line. Also, the headings (1-letter words, 2-letter words, etc.) - being duplicates - were also deleted. I don't think this'd be a problem if the format's retained as they can all be re-headed.

Regards,
John

TonyJollans
05-30-2005, 07:08 AM
Hi John,

I'm a bit surprised you say my code did nothing but it did need a bit of tidying up along the lines Joost proposed (trimming and checking for length > 1).

Having seen the document, I have amended the code to remove commas along with words, and to ignore bold text. It takes a little time to run but here it is:

Sub RemoveDuplicateWords()

Dim objWords As New Collection
Dim objWord As Range
Dim strWord As String

On Error GoTo Duplicate

For Each objWord In ActiveDocument.Words
If Not objWord.Font.Bold Then
strWord = Trim(objWord.Text)
If Len(strWord) > 1 Then objWords.Add strWord, strWord
End If
GoTo Done
Duplicate:
If Trim(objWord.Next(wdWord)) = "," Then objWord.Next(wdWord).Delete
objWord.Delete
Resume Done
Done:
Next

On Error GoTo 0

Set objWord = Nothing
Set objWords = Nothing

End Sub

johnske
05-30-2005, 07:37 AM
Hi Tony,

Thanx for that. Yes, it seems to work fine now. :cloud9:

I added a message box to give a count of deleted (duplicate) words at the end (there were 2000+) then deliberately added duplicates to test it. Gave the correct number of duplicates and deleted them all ok.

(What Happened previously was I got a frozen Word and 'Word not responding')

Many Thanx,
John :thumb

TonyJollans
05-30-2005, 07:45 AM
(What Happened previously was I got a frozen Word and 'Word not responding')

A-ha! I think it was just taking a looooong time deleting all the 'duplicate' commas - one reason for the need to check the length of the 'word'.

Glad it's working for you now.:thumb

MOS MASTER
05-30-2005, 09:44 AM
Hi John, :yes

Glad to see its working! :thumb

johnske
05-30-2005, 01:27 PM
Hi Joost,

Yes, many thanx for your help here also :thumb

Regards,
John

MOS MASTER
05-30-2005, 01:44 PM
Glad I could help! :beerchug:

johnske
05-31-2005, 05:03 AM
Hi Guys,

Just thort I'd let you know: The person requesting this was over the moon at its success. :rofl: :rotlaugh: (Apparently there's some sort of contest going on with some fairly major prizes involved).

I didn't quite follow the rules as explained over the phone, but it seems the idea is that you copy others lists of scrabble words, delete all the duplicates, add them to yours, and then join as many together as you can with the normal scrabble rules for worth of letters applies.

The latest request was - "O.k PC genius if your like and your up for challenge can you get a code that does a count on how many times a letter is use in doc? or am I really pushing boundary?

for example I want to know if a Y is 10 points how many Y's are in whole doc" (the word count currently stands at 10,475)

I naturally said 'of course this can be done'. :devil:

Regards,
John :thumb

TonyJollans
05-31-2005, 05:33 AM
And, of course, it can. :rofl:

This is basic but does work. It only shows lowercase letters (and igores bolded text) as per my memory of the sample.

Dim cc(0 To 255) As Long
For Each c In ActiveDocument.Characters
If Not c.Font.Bold Then cc(Asc(c)) = cc(Asc(c)) + 1
Next
For i = 97 To 122
Msg = Msg & Chr(i) & ":" & vbTab & cc(i) & vbNewLine
Next
MsgBox Msg

johnske
05-31-2005, 05:42 AM
Hi Tony,

Would using Option Compare Text allow for upper & lower-case? Or is that only for Excel?

TonyJollans
05-31-2005, 05:51 AM
No, in this case it wouldn't because it uses ascii codes - this (untried) change should do it though

If Not c.Font.Bold Then cc(Asc(lcase(c))) = cc(Asc(lcase(c))) + 1
A proper solution should also check for other unicode characters. In a rush at present - can look later at making it more robust

johnske
05-31-2005, 06:13 AM
Hi Tony,

Yeah, that mod seems to work fine :thumb

Many Thanx,
John :bow:

TonyJollans
05-31-2005, 12:34 PM
Hi John,

This doesn't really do much the earlier version didn't but it is a bit tidier. It maintains counts of all 'normal' characters, ignoring bolded ones and also ignoring 'non-normal' unicode characters. It then adds the totals of lowercase and uppercase letters and displays them.

Sub TallyCharacters()

Dim arrCharacters(0 To 255) As Long
Dim objChar As Range
Dim iAsciiCode As Integer
Dim lCount As Long
Dim strMessage As String

For Each objChar In ActiveDocument.Characters
If Not objChar.Font.Bold Then
If Asc(objChar.Text) = AscW(objChar.Text) Then
arrCharacters(Asc(objChar.Text)) = arrCharacters(Asc(objChar.Text)) + 1
End If
End If
Next

strMessage = ""
For iAsciiCode = Asc("a") To Asc("z")
lCount = arrCharacters(iAsciiCode) + arrCharacters(iAsciiCode + Asc("A") - Asc("a"))
strMessage = strMessage & Chr(iAsciiCode) & ":" & vbTab & lCount & vbNewLine
Next
MsgBox strMessage

End Sub

MOS MASTER
05-31-2005, 12:42 PM
Hi John & Tony, :yes

Seams I've missed me a challenge.....

Nice coding Tony...:hi:

johnske
05-31-2005, 04:28 PM
Hi Tony,

Thanx, I have 4warded these on, waiting on reply...

Yup Joost, u missed a challenge :rofl: (some ppl just don't realize just wot can be dun with VBA - think it's all about recording macros & running them - they're blown away when you give them something like this to play with)

Regards,
John :hi:

MOS MASTER
06-01-2005, 09:53 AM
(some ppl just don't realize just wot can be dun with VBA - think it's all about recording macros & running them - they're blown away when you give them something like this to play with)

Well I know plenty who even don't know that bit! :rofl: And yeps if they see the power of programming there blown away indeed! :hi:

johnske
06-06-2005, 01:26 AM
Ok, the letter code worked fine Tony, she's V. happy with result.

The problem has changed somewhat now Guys, so will post a new thread here (http://www.vbaexpress.com/forum/showthread.php?t=3510).

Regards,
John :thumb