Merging/deleting duplicate pages?



CaptainCsaba
04-03-2018, 11:19 PM
Hey!

I know that Word does not really "understand" how pages work, but let's say that at the end of every page there is a page break. If so, is there a way for Word VBA to detect and delete duplicate pages? For example, if there are 10 pages and 3 of them are identical, in the end there would only be 8, because only one of the duplicate pages would remain (2 of the 3 would be deleted).

CaptainCsaba
04-06-2018, 12:46 AM
Nobody has an idea?

gmaxey
04-06-2018, 05:53 AM
Very crude, for the reasons you indicate, but if the page is simple text then something like this may get you close:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey, http://gregmaxey.com/word_tips.html, 4/6/2018
Dim oPage As Page, oPagePrevious As Page
Dim oRngP As Range, oRngPP As Range
Dim lngIndex As Long
  'Work from the last page back, comparing each page with the one before it.
  For lngIndex = ActiveWindow.ActivePane.Pages.Count To 2 Step -1
    Set oPage = ActiveWindow.ActivePane.Pages(lngIndex)
    Set oPagePrevious = ActiveWindow.ActivePane.Pages(lngIndex - 1)
    'Rectangle 1 is assumed to be the main text area (simple text documents only).
    Set oRngP = oPage.Rectangles(1).Range
    Set oRngPP = oPagePrevious.Rectangles(1).Range
    'Trim the last character and, if the character before it is a page break (Chr(12)), that too.
    If Asc(oRngPP.Characters.Last.Previous) = 12 Then
      oRngPP.End = oRngPP.End - 2
    Else
      oRngPP.End = oRngPP.End - 1
    End If
    If Asc(oRngP.Characters.Last.Previous) = 12 Then
      oRngP.End = oRngP.End - 2
    Else
      oRngP.End = oRngP.End - 1
    End If
    'Delete the later page's text if it matches the page before it.
    If oRngP.Text = oRngPP.Text Then
      oPage.Rectangles(1).Range.Delete
    End If
  Next
lbl_Exit:
  Exit Sub
End Sub

CaptainCsaba
04-06-2018, 06:04 AM
The code looks good, but unfortunately I get an error: "Run-time error 91: Object variable or With block variable not set"

The error occurs after the line:
If Asc(oRngPP.Characters.Last.Previous) = 12 Then

gmaxey
04-06-2018, 06:23 AM
I tested with a very simple text document: AAA typed at the top of each of three pages, separated by page breaks. I believe that error would only occur if one of the rectangles contained a single character (e.g., a blank page containing only the page break).

gmaxey
04-06-2018, 06:25 AM
Here I can replicate that error if the first page contains only a page break (no other text or non-printing characters)
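A defensive version of that trimming step might look like the sketch below (the helper name TrimPageRange is hypothetical, not from the thread). It keeps the first macro's assumption that a rectangle's range ends with a paragraph mark, optionally preceded by a manual page break (Chr(12)), but it checks the character count first: when a range holds a single character at the very start of its story, Range.Previous returns Nothing, and passing that to Asc raises run-time error 91.

```vba
' Hypothetical helper, not part of the original macro.
' Returns a copy of oRng with its final character (normally a paragraph mark)
' removed and, if the character before that is a manual page break (Chr(12)),
' that break removed as well. A range of fewer than two characters is returned
' collapsed instead of raising run-time error 91.
Function TrimPageRange(ByVal oRng As Range) As Range
    Dim oCopy As Range
    Set oCopy = oRng.Duplicate           ' work on a copy; the caller's range is untouched
    If oCopy.Characters.Count < 2 Then
        ' Nothing to compare, e.g. a header that only holds a paragraph mark.
        oCopy.Collapse wdCollapseStart
    ElseIf oCopy.Characters.Last.Previous.Text = Chr(12) Then
        oCopy.End = oCopy.End - 2        ' drop the page break and the paragraph mark
    Else
        oCopy.End = oCopy.End - 1        ' drop the paragraph mark only
    End If
    Set TrimPageRange = oCopy
End Function
```

In the first macro, Set oRngP = TrimPageRange(oPage.Rectangles(1).Range) (and likewise for oRngPP) would replace the two If/Else blocks while the text comparison stays the same. Note that two single-character rectangles both trim to an empty range and therefore compare as equal.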

CaptainCsaba
04-06-2018, 06:35 AM
I checked and there are no empty pages with page breaks. Can I send you the Word file I am trying it on?

macropod
04-06-2018, 06:45 AM
It's not that Word doesn't understand what a page is, but that the concept is fluid. That's because Word uses the active printer driver to optimise the page layout and that can result in the pagination varying from one computer to the next, depending on what the active printer driver is.

As for your problem, for the description given, that would require a macro that employs a loop to compare ranges bounded by the hard page breaks against each other. One issue you haven't addressed is which of the duplicates you want to retain.
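The loop described above could be sketched roughly as follows. This is a hypothetical macro (DeleteDuplicatePageBlocks and TrimBlockText are illustrative names, not from the thread), written under these assumptions: the "pages" are blocks of main-story text separated by manual page breaks; two blocks are duplicates when their plain text is identical after paragraph marks, manual line breaks and spaces are trimmed from both ends (formatting, headers/footers and floating shapes are ignored); the first occurrence is the one kept; and it uses a late-bound Scripting.Dictionary, which is available on Windows. Because it works on ranges rather than on the Pages collection, it does not depend on Print Layout view, and it also catches duplicates that are not next to each other.

```vba
' Hypothetical sketch, not from the thread. Assumes "pages" are blocks of
' main-story text separated by manual page breaks, and that blocks are
' duplicates when their trimmed plain text is identical.
Sub DeleteDuplicatePageBlocks()
    Dim oDoc As Document, oRng As Range
    Dim lngBreaks() As Long          ' start position of each manual page break
    Dim lngCount As Long             ' number of breaks found
    Dim bDelete() As Boolean         ' bDelete(k) = True if block k is a duplicate
    Dim dictSeen As Object
    Dim lngStart As Long, lngEnd As Long, k As Long
    Dim strKey As String

    Set oDoc = ActiveDocument
    ' 1. Record the position of every manual page break (^m) in the main story.
    Set oRng = oDoc.Content
    With oRng.Find
        .ClearFormatting
        .Text = "^m"
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ReDim Preserve lngBreaks(lngCount)
            lngBreaks(lngCount) = oRng.Start
            lngCount = lngCount + 1
            oRng.Collapse wdCollapseEnd
        Loop
    End With
    If lngCount = 0 Then Exit Sub    ' a single block: nothing to compare

    ' 2. Block k runs from just after break k-1 to just before break k (the last
    '    block runs to the end of the document). Keep the first block with a
    '    given text and flag every later match; blank blocks are left alone.
    ReDim bDelete(lngCount)
    Set dictSeen = CreateObject("Scripting.Dictionary")
    For k = 0 To lngCount
        If k = 0 Then lngStart = oDoc.Content.Start Else lngStart = lngBreaks(k - 1) + 1
        If k = lngCount Then lngEnd = oDoc.Content.End Else lngEnd = lngBreaks(k)
        strKey = TrimBlockText(oDoc.Range(lngStart, lngEnd).Text)
        If Len(strKey) = 0 Then
            ' Blank block: not treated as a duplicate.
        ElseIf dictSeen.Exists(strKey) Then
            bDelete(k) = True
        Else
            dictSeen.Add strKey, k
        End If
    Next k

    ' 3. Delete flagged blocks from the end backwards so the stored positions of
    '    earlier breaks stay valid. Each deletion takes the break in front of the
    '    block, so the break after it still closes the block that is kept.
    For k = lngCount To 1 Step -1
        If bDelete(k) Then
            If k = lngCount Then lngEnd = oDoc.Content.End Else lngEnd = lngBreaks(k)
            oDoc.Range(lngBreaks(k - 1), lngEnd).Delete
        End If
    Next k
End Sub

' Strips paragraph marks, manual line breaks (Chr(11)) and spaces from both ends.
Private Function TrimBlockText(ByVal strText As String) As String
    Dim strTrim As String
    strTrim = vbCr & Chr(11) & " "
    Do While Len(strText) > 0
        If InStr(strTrim, Left$(strText, 1)) = 0 Then Exit Do
        strText = Mid$(strText, 2)
    Loop
    Do While Len(strText) > 0
        If InStr(strTrim, Right$(strText, 1)) = 0 Then Exit Do
        strText = Left$(strText, Len(strText) - 1)
    Loop
    TrimBlockText = strText
End Function
```

Section breaks are not treated as separators in this sketch, so a section break inside a duplicate block is deleted along with it, and deleting the last block may leave an empty paragraph at the end of the document.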

CaptainCsaba
04-06-2018, 12:18 PM
Quoting macropod: As for your problem, for the description given, that would require a macro that employs a loop to compare ranges bounded by the hard page breaks against each other. One issue you haven't addressed is which of the duplicates you want to retain.

By "which one" do you mean which of the identical pages should remain? It could be any of them. If there are 20 identical pages, it doesn't really matter which one remains as long as the other 19 are gone.

By the way, is it possible to make a loop that compares the hard page breaks (by that, do you mean the ones Word shows in print preview?)? Isn't it easier just to use the regular page break?

macropod
04-06-2018, 02:05 PM
The hard page breaks are the ones you appear, in post #1, to be saying you've created:

Quoting CaptainCsaba: at the end of every page there is a page break

That's what the line
If Asc(oRngPP.Characters.Last.Previous) = 12 Then
refers to; it could also be expressed as:
If oRngPP.Characters.Last.Previous = Chr(12) Then
Greg's macro uses a loop to find such page breaks.

gmaxey
04-06-2018, 05:59 PM
Captain,

The sample document you sent privately was not a "basic text" document. It has multiple headers and footers in addition to the basic text rectangle. Accordingly, the text rectangle is index 4 (not 1). Additionally each page is delimited with a page break and a continuous section break (not just a page break).

This code reduced that document from 8 pages down to 4.


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey, http://gregmaxey.com/word_tips.html, 4/6/2018
Dim oPage As Page, oPagePrevious As Page
Dim oRngP As Range, oRngPP As Range
Dim lngIndex As Long
  'Work from the last page back, comparing each page with the one before it.
  For lngIndex = ActiveWindow.ActivePane.Pages.Count To 2 Step -1
    Set oPage = ActiveWindow.ActivePane.Pages(lngIndex)
    Set oPagePrevious = ActiveWindow.ActivePane.Pages(lngIndex - 1)
    'In the sample document, rectangle 4 is the main text area
    '(1-3 are the header, the footer and the shape anchored in the header).
    Set oRngP = oPage.Rectangles(4).Range
    Set oRngPP = oPagePrevious.Rectangles(4).Range
    oRngPP.Select
    'Trim the trailing break characters: three if the third-to-last
    'character is a page break (Chr(12)), otherwise two.
    If Asc(oRngPP.Characters.Last.Previous.Previous) = 12 Then
      oRngPP.End = oRngPP.End - 3
    Else
      oRngPP.End = oRngPP.End - 2
    End If
    If Asc(oRngP.Characters.Last.Previous.Previous) = 12 Then
      oRngP.End = oRngP.End - 3
    Else
      oRngP.End = oRngP.End - 2
    End If
    'Delete the later page's main text if it matches the page before it.
    If oRngP.Text = oRngPP.Text Then
      oPage.Rectangles(4).Range.Delete
    End If
  Next
lbl_Exit:
  Exit Sub
End Sub
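Rather than hard-coding the rectangle index (1 in the first macro, 4 for the sample document), a lookup along these lines could pick out the rectangle holding the page's main text. MainTextRectangle is a hypothetical helper, not from the thread; it uses the Word object model's Rectangle.RectangleType and Range.StoryType properties, reads Range only for text rectangles, and returns the first match, so a page whose main text is split across several rectangles is not fully covered. Like the macros here, it works on the Pages collection, so it presumably needs Print Layout view.

```vba
' Hypothetical helper, not from the thread: returns the first rectangle on
' oPage whose range lies in the main text story, or Nothing if none is found.
Function MainTextRectangle(oPage As Page) As Rectangle
    Dim lngIndex As Long
    Dim oRect As Rectangle
    For lngIndex = 1 To oPage.Rectangles.Count
        Set oRect = oPage.Rectangles(lngIndex)
        ' Only text rectangles are inspected further. Header and footer
        ' rectangles may also be text rectangles, so the story type is
        ' checked as well.
        If oRect.RectangleType = wdTextRectangle Then
            If oRect.Range.StoryType = wdMainTextStory Then
                Set MainTextRectangle = oRect
                Exit Function
            End If
        End If
    Next lngIndex
End Function
```

In the loop, Set oRngP = MainTextRectangle(oPage).Range would take the place of Set oRngP = oPage.Rectangles(4).Range, after checking that the function did not return Nothing.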

macropod
04-06-2018, 09:12 PM
Quoting gmaxey: The sample document you sent privately was not a "basic text" document. It has multiple headers and footers in addition to the basic text rectangle.
Don't you just love it when you try to help someone only to find the scenario described has little in common with the real-world one!!!

CaptainCsaba
04-07-2018, 05:06 AM
Quoting macropod: Don't you just love it when you try to help someone only to find the scenario described has little in common with the real-world one!!!

I am really sorry. I am not an expert and did not know what does and does not qualify as "simple text". I apologize if I have caused confusion regarding the solution.

gmaxey
04-07-2018, 05:22 AM
Captain,

No harm, no foul in my case. While Paul is right that it does help to know the complete scope of a problem when trying to provide a solution, my first proposed solution was only applicable to a document containing basic text (i.e., text in the main text story range of a document). I explained that (lightly) up front. When you reported back with the error, the only thing that can cause it is a single character in one of the page rectangles.

Each page in the test document that you sent has four rectangles: 1) the page header, 2) the page footer, 3) the shape range (anchored) in the header and 4) the main text. Rectangle 1 (the header) contains only one character (the paragraph mark anchoring the shape). That is why the first code resulted in an error.

Break,

Paul,

Love it? Oh I do, I really do ;-)

CaptainCsaba
04-11-2018, 02:00 AM
Hey!

At the "oPage.Rectangles(4).Range.Delete" part I get an error message saying that I need to be in print layout mode for this to work.

At the "If Asc(oRngPP.Characters.Last.Previous.Previous) = 12 Then" line, Word switches to Draft view, and when the macro reaches the line mentioned above, it raises the error. I tried to make it switch to Print Layout view like this, but no matter where I put it, the macro unfortunately still did not work:

If ActiveWindow.View.SplitSpecial = wdPaneNone Then
  ActiveWindow.ActivePane.View.Type = wdPrintView
Else
  ActiveWindow.View.Type = wdPrintView
End If

What am I missing?

gmaxey
04-11-2018, 10:35 AM
Since the code I provided worked with the 8-page document you sent via private message, I can only assume that the document you are trying to process now is even more complex and that rectangle 4 is not the main text area of your document.

CaptainCsaba
04-11-2018, 11:12 PM
I tried it on the same document and on others, and it gives me this error everywhere, unfortunately. Could it be something in my settings?

gmaxey
04-12-2018, 11:21 AM
You could try an error handler:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey, http://gregmaxey.com/word_tips.html, 4/6/2018
Dim oPage As Page, oPagePrevious As Page
Dim oRngP As Range, oRngPP As Range
Dim lngIndex As Long
  'On any error (e.g. a Page or Rectangle call made outside Print Layout view), jump to Err_View.
  On Error GoTo Err_View
  For lngIndex = ActiveWindow.ActivePane.Pages.Count To 2 Step -1
    Set oPage = ActiveWindow.ActivePane.Pages(lngIndex)
    Set oPagePrevious = ActiveWindow.ActivePane.Pages(lngIndex - 1)
    Set oRngP = oPage.Rectangles(4).Range
    Set oRngPP = oPagePrevious.Rectangles(4).Range
    oRngPP.Select
    If Asc(oRngPP.Characters.Last.Previous.Previous) = 12 Then
      oRngPP.End = oRngPP.End - 3
    Else
      oRngPP.End = oRngPP.End - 2
    End If
    If Asc(oRngP.Characters.Last.Previous.Previous) = 12 Then
      oRngP.End = oRngP.End - 3
    Else
      oRngP.End = oRngP.End - 2
    End If
    If oRngP.Text = oRngPP.Text Then
      oPage.Rectangles(4).Range.Delete
    End If
  Next
lbl_Exit:
  Exit Sub
Err_View:
  'Switch to Print Layout view, then Resume retries the statement that failed.
  'If the error has some other cause, it will recur and be retried again.
  ActiveWindow.ActivePane.View.Type = wdPrintView
  Resume
End Sub

CaptainCsaba
04-12-2018, 11:10 PM
It's weird. At the new line, when it tries to switch to Print Layout view, it gives me this error: "Runtime Error 5918: The property or method is not available on this system."

I searched for this, and the same problem seems to have occurred elsewhere with the same line of code:


https://social.msdn.microsoft.com/Forums/vstudio/en-US/3b5d4722-84f3-4e0d-b60b-d2d553b9ef46/code-works-in-testing-but-throws-error-in-production?forum=vsto

I tried changing the line a bit, but I still get the same error:

ActiveWindow.ActivePane.View.Type = WdViewType.wdPrintView



I also tried it on another computer at work and it still gives me this error. I am really confused about what this could be.

CaptainCsaba
04-16-2018, 12:46 AM
Hey,

Just so you know, I also asked this question on another forum and linked it to this thread. I hope I stayed within the cross-posting rules and did everything right. You can find the post here:

https://www.mrexcel.com/forum/general-excel-discussion-other-questions/1051874-merging-duplicate-pages-word.html#post5050549

CaptainCsaba
04-18-2018, 01:55 AM
After a few tries, I managed to work around the error by deleting all the headers and footers and adding them back later with a macro. The macro now runs, which is awesome, and it correctly finds the pages it should delete, but it only deletes the top paragraph on each page. Do you have any idea why this could be?