Copy / Paste Without Extra Paragraphs from PDF



bstephens
05-20-2013, 10:55 AM
Part of my workflow frequently involves copying and pasting text from .pdf files (and occasionally webpages). One thing that has always been an annoyance is that the pasted text always contains extra paragraph symbols.

Does anyone have a macro that will copy and paste from a PDF but WITHOUT the extra paragraph symbols?

I am fine with copying and pasting one paragraph at a time from the PDF, as I don't think it's possible to handle multiple paragraphs (there aren't two paragraph symbols after a real paragraph break, so I'm not sure how you would even start to program it).

macropod
05-20-2013, 03:06 PM
Whether you can copy/paste without a break at the end of every line depends on the PDF, not Word. For a macro to clean up text pasted from emails, web pages, PDFs, etc., see: http://www.msofficeforums.com/word/9775-remove-bulk-reset-line-breaks.html

bstephens
05-20-2013, 03:16 PM
Hi Macropod,

Thanks for the macro 'CleanUpPastedText'. It handles a bunch of cases I didn't even think of, e.g., hyphenated text.

Is there a way that I can get the routine from CleanUpPastedText to run on text copied into the clipboard BEFORE I paste it into the document?

I was trying to combine everything into one step.

PSEUDOCODE EXAMPLE FOR PROPOSED MACRO "PasteCleanUpText":

1. The user is in Adobe PDF and hits Ctrl-C to copy content that has the extra paragraph symbols and other 'unwanted' formatting.
2. The user switches back to Word.
3. The user runs PasteCleanUpText. The macro loads the text copied from the PDF into memory, runs the routine from CleanUpPastedText on the text in memory, and puts the "cleaned up" text in the document.

It all happens in one step.

I have tried to develop PasteCleanUpText but I think I am having trouble with finding the appropriate range object.

Best,
BTS

bstephens
05-20-2013, 04:18 PM
The macro below sort of covers it.

I used the Paste method as opposed to the PasteSpecial method with wdPasteText, and then removed the formatting afterwards.

A few questions:

1. Would using PasteSpecial be a better way to do it?
2. In the section that deletes hyphens in hyphenated text formerly split across lines, I am trying to replace the hyphen (in a word that was split across lines) with a space, but I can't get it to work. It replaces the hyphen with nothing, so the two parts of the word run together. Does anyone know what's wrong with the expression, or how to get a different result?

Best,
BTS


Sub PasteCleanUpText()

    Application.ScreenUpdating = False

    Dim oRng As Range
    Set oRng = Selection.Range
    'Range.Paste expands oRng to cover the pasted content
    oRng.Paste
    'oRng = Replace(oRng, Chr(13), " ")

    'Strip the formatting that came along with the paste
    oRng.ParagraphFormat.Reset
    oRng.Font.Reset

    With oRng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True

        'Replace single paragraph breaks with a space
        .Text = "([!^13])([^13])([!^13])"
        .Replacement.Text = "\1 \3"
        .Execute Replace:=wdReplaceAll

        'Replace all double spaces with single spaces
        .Text = "[ ]{2,}"
        .Replacement.Text = " "
        .Execute Replace:=wdReplaceAll

        'Delete hyphens in hyphenated text formerly split across lines
        '.Text = "([a-z])-[ ]{1,}([a-z])"
        '.Replacement.Text = "\1\2" 'formerly \1\2
        '.Execute Replace:=wdReplaceAll

        'Limit paragraph breaks to one per 'real' paragraph.
        .Text = "[^13]{1,}"
        .Replacement.Text = "^p"
        .Execute Replace:=wdReplaceAll

    End With

    Application.ScreenUpdating = True

End Sub

macropod
05-20-2013, 06:59 PM
The whole idea of the macro I posted is to allow the pasted text to retain its original formatting (e.g. headings, italics, bold, colours, etc.). If you try to manipulate the clipboard contents, all the formatting will be lost. You can, of course, include the same logic in a 'paste' macro, as you have done, so that the processing happens automatically whenever the macro does the paste.
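
For the PasteSpecial question, here is a minimal sketch of pasting as unformatted text and then building a range over what was pasted. The macro name PasteAsPlainTextSketch is hypothetical, and the sketch assumes the selection ends up positioned immediately after the pasted text. Whether this counts as 'better' depends on whether the formatting matters, since it is discarded at paste time.

```vba
Sub PasteAsPlainTextSketch()
    ' Hypothetical macro name, for illustration only.
    Dim oRng As Range
    Dim lStart As Long

    ' Remember where the pasted text will begin.
    lStart = Selection.Start

    ' Paste the clipboard contents as unformatted text.
    Selection.PasteSpecial DataType:=wdPasteText

    ' Assumption: the selection now sits just after the pasted text,
    ' so extending a range back to lStart covers what was pasted.
    Set oRng = Selection.Range
    oRng.Start = lStart

    ' oRng can now be run through the same wildcard Find/Replace steps.
End Sub
```

With this approach the ParagraphFormat.Reset and Font.Reset calls should no longer be needed, because no source formatting is pasted in the first place.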

To change the hyphens to spaces, change:
.Replacement.Text = "\1\2"
to:
.Replacement.Text = "\1 \2"
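
As a rough sketch, the hyphen step with that change applied could look like this as a standalone macro run on the current selection. The name ReplaceLineBreakHyphens is hypothetical, and the pattern is the one from the macro earlier in the thread. It assumes the single paragraph breaks have already been replaced with spaces, so each split word appears as a hyphen followed by one or more spaces.

```vba
Sub ReplaceLineBreakHyphens()
    ' Hypothetical macro name; works on the current selection only.
    Dim oRng As Range
    Set oRng = Selection.Range

    With oRng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True

        ' Find: lowercase letter, hyphen, one or more spaces, lowercase letter
        .Text = "([a-z])-[ ]{1,}([a-z])"
        ' Replace: keep both letters, with a single space between them
        .Replacement.Text = "\1 \2"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

The quotation marks are required because Replacement.Text takes a string. On systems where the list separator is a semicolon, the {1,} quantifier may need to be written as {1;}.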