PDA

View Full Version : Dividing a document in two



rastapopoulo
04-10-2012, 11:36 AM
Hello,

I’m trying to write a VBA script in Word that splits a bilingual document in two documents, and I’m hoping someone can help me out.

I have a two-column bilingual document that is formatted roughly like this :

English - French (Section Break)
(Section Break)
English - French (Section Break)
(Section Break)
English - French (Section Break)
(Section Break)


Please see attached doc for sample.

I’m wondering if it’s possible to extract all the English, dump it in a file, and extract all the French and dump it in another file. The section breaks would be left out of the unilingual documents.

I’d appreciate any help in tackling this challenge. My programming background is in PHP. I suppose the logic is the same, but I’m really outside my element with VBA.

Many thanks,
Simon

fumei
04-10-2012, 05:11 PM
Hmmm, I have been at this for two hours so far, and I am stumped. I was hoping to use the Section index number, but unfortunately both English and French are in the same Section - even though they are separated by a Section (Column) break. I do not know of a way to separate those column breaks into individual ranges...so far.

rastapopoulo
04-10-2012, 05:27 PM
Thanks fumei.

Can you use "^n" to identify the column breaks?

http://www.extendoffice.com/documents/word/640-word-remove-column-break.html

fumei
04-10-2012, 05:46 PM
Yes, sure, but you want to extract contents of a column, not remove the column breaks.

rastapopoulo
04-10-2012, 05:51 PM
I guess it doesn't work as I imagined. I thought you could load all the content in an array, and use the breaks as delimiters to split it. You could then say what's on the left of the column break delimiter goes in file a, what's on the right goes in file b.

And I don't know if it helps, but the only thing I need to keep is the text, not necessarily the images or the style (bold, underline, etc.)

fumei
04-10-2012, 07:04 PM
hmmmm. The problem is determining that THIS range is english, and THAT range is french.

fumei
04-10-2012, 08:49 PM
I thought it may work using odd numbered sections as being the english. It seems that they are...but when extracted by odd numbered sections (ignoring all even numbered ones), the english and french came out as being in the SAME section.

Talis
04-10-2012, 09:33 PM
Just a hint to get you started - :whistle:
If you do a find:

Find: ^m*^13
With Use wildcards checked and check 'Highlight all items found in...' <Main document> it will highlight all the English bits.
Similarly

Find: ^13*^m will highlight all the French bits.

rastapopoulo
04-11-2012, 09:03 AM
Great, I can select the right text with ^m*^13 and ^13*^m. I can then copy and paste those selections in separate Word documents, and then I have to remove all section breaks from those documents. I tried recording a macro, but it doesn’t seem to work. How do I automate it with a macro?

I’m also wondering if it’s possible, before selecting the French and English, to remove all text that doesn’t appear in a two-column format (for example, in the sample attached in my first post, the captions for the tables at the bottom do not appear in columns. This text could be ignored since it won’t be possible to properly extract the French and English in this case).

Thanks.

Talis
04-11-2012, 09:12 AM
So make two copies of the document.
First copy - do a Find for the French and Replace with nothing. This leaves the English. Remember to check 'Use wildcards'.
If you want to retain the section breaks use:


Find: ^13*(^m)
Replace: \1

You may want to now 'Select All' (CTRL-A) then change to single column.
Save as your English version.

Second copy...Etc.

rastapopoulo
04-11-2012, 09:21 AM
In that case, there will be a lot of manual work involved... It's not possible to automate it with a single macro that does all the steps? I will have hundreds of documents to work with so I am trying to reduce file manipulation (that can lead to mistakes if several people are working on them).

Talis
04-11-2012, 09:35 AM
You posted message #9 whilst I was posting message #10 so the thread looks a bit strange now!
In your original post #1 you stated you had a document. Now it's hundreds! :banghead:
Yes a subroutine can be written to automate everything but the coding may depend on the version of Office Word that you are using as I believe some features were removed and others added from v2003 to later versions. So whoever does the coding would need to know the version(s).

rastapopoulo
04-11-2012, 10:47 AM
This is the code that I get when I use the Record Macro tool in Word 2007. When I run the code, however, an error message tells me no selection was made.

I would also like the script to add "-ENG" or "-FRE" to the file name automatically. In this case, it's hardcoded as "sample-ENG/FRE".

Any idea what's wrong with the script?


Sub SplitBilingualDocument()
'
' SplitBilingualDocument Macro
'
'
Selection.Find.ClearFormatting
With Selection.Find
.Text = "^m*^13"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchByte = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchFuzzy = False
.MatchWildcards = True
End With
Selection.Copy
Documents.Add Template:="Normal", NewTemplate:=False, DocumentType:=0
Selection.PasteAndFormat (wdPasteDefault)
ActiveWindow.ActivePane.VerticalPercentScrolled = 0
ActiveWindow.ActivePane.View.ShowAll = Not ActiveWindow.ActivePane.View. _
ShowAll
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^m"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchByte = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchFuzzy = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
ActiveWindow.ActivePane.VerticalPercentScrolled = 0
Application.Move Left:=67, Top:=75
ChangeFileOpenDirectory "C:\My_Documents\TEMP\"
ActiveDocument.SaveAs FileName:="sample-ENG.doc", FileFormat:= _
wdFormatDocument, LockComments:=False, Password:="", AddToRecentFiles:= _
True, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:= _
False, SaveNativePictureFormat:=False, SaveFormsData:=False, _
SaveAsAOCELetter:=False
ActiveWindow.Close
Selection.Find.ClearFormatting
With Selection.Find
.Text = "^13*^m"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchByte = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchFuzzy = False
.MatchWildcards = True
End With
Selection.Copy
Application.Move Left:=39, Top:=39
Documents.Add Template:="Normal", NewTemplate:=False, DocumentType:=0
Selection.PasteAndFormat (wdPasteDefault)
ActiveWindow.ActivePane.VerticalPercentScrolled = 0
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^m"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchByte = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchFuzzy = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
ActiveWindow.ActivePane.VerticalPercentScrolled = 0
ActiveDocument.SaveAs FileName:="sample-FRE.doc", FileFormat:= _
wdFormatDocument, LockComments:=False, Password:="", AddToRecentFiles:= _
True, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:= _
False, SaveNativePictureFormat:=False, SaveFormsData:=False, _
SaveAsAOCELetter:=False
ActiveWindow.Close
End Sub

Talis
04-11-2012, 11:23 PM
Try this. You will need to put it in place of your recorded macro.
Open Word showing a blank document. Run the macro 'SplitBilingualDocs'.
Try it out on a small folder of copies, not originals. The new ENG and FRA files will be saved in this same folder.
Graphics and various junk have been removed but the Styling/Formatting is left up to you.
Sub SplitBilingualDocs()
Dim StrFnd As String
Dim intPos As Integer
Dim strFolder As String, strFile As String, wdDoc As Document
Application.ScreenUpdating = False
strFolder = GetFolder
If strFolder = "" Then Exit Sub
ChangeFileOpenDirectory strFolder
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
AddToRecentFiles:=False, Visible:=False)
intPos = InStrRev(strFile, ".")
strFile = Left(strFile, intPos - 1)
If Right(strFile, 4) = "-ENG" Or Right(strFile, 4) = "-FRE" Then GoTo DoneIt
Call ExtractIt(strFile, wdDoc, "ENG")
Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
AddToRecentFiles:=False, Visible:=False)
Call ExtractIt(strFile, wdDoc, "FRE")
DoneIt:
strFile = Dir()
Wend
Set wdDoc = Nothing
Application.ScreenUpdating = True
End Sub

Sub ExtractIt(strFile, wdDoc, Lang)
Dim Append As String
Dim SearchStr As String
Dim strDocName As String
If Lang = "ENG" Then SearchStr = "^13*^m" Else SearchStr = "^m*^13"
With wdDoc.Range
With .Find
.ClearFormatting
.Text = SearchStr
.Replacement.ClearFormatting
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
With .Find
.ClearFormatting
.Text = "^l"
.Replacement.ClearFormatting
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = False
.Execute Replace:=wdReplaceAll
End With
With .Find
.ClearFormatting
.Text = "^g"
.Replacement.ClearFormatting
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = False
.Execute Replace:=wdReplaceAll
End With
With .Find
.ClearFormatting
.Style = "-figtitle"
.Text = ""
.Replacement.ClearFormatting
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = False
.Execute Replace:=wdReplaceAll
End With
With .Find
.ClearFormatting
.Style = "+figtitle"
.Text = ""
.Replacement.ClearFormatting
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = False
.Execute Replace:=wdReplaceAll
End With
With .Find
.ClearFormatting
.Text = "^13{1,}"
.Replacement.ClearFormatting
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End With
wdDoc.PageSetup.TextColumns.SetCount NumColumns:=1
wdDoc.Paragraphs.TabStops.ClearAll
strDocName = strFile
strDocName = strDocName & "-" & Lang & ".doc"
wdDoc.SaveAs FileName:=strDocName, FileFormat:=wdFormatDocument
wdDoc.Close SaveChanges:=False
End Sub

Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function
Some parts of these subroutines were developed by senior members of this forum. Special thanks as always to Macropod and fumei.

Talis
04-13-2012, 11:06 PM
This rushed code could be cleaned up and streamlined by using ByVal arguments in the calling subroutine to parameters in the called subroutine. This would avoid the messy overload of extraneous variables in the called subroutine and also remove the need for a re-Set of wdDoc in the main subroutine.

But why bother when there's no response from the original poster.

fumei
04-14-2012, 11:54 AM
Seems a little rushed...it has only been two days.

BoatwrenchV8
04-14-2012, 07:36 PM
could the panes object be used to avoid having to create a copy of the document to extract English from one and French from the other?

fumei
04-15-2012, 12:46 AM
The OP appears to WANT new documents.
I’m trying to write a VBA script in Word that splits a bilingual document in two documents

rastapopoulo
04-26-2012, 07:44 AM
Work took me away from this for a bit but it is still something I'm working on, so thanks for your help. Indeed, I wanted to split the original doc in two new documents. The code from Talis works fairly well but irregularities in the docs (sometimes some pages are not divided in two columns) create errors. I'm finding this out as I go along.

Just a little background:
What we do with the unilingual documents is align their contents with a translation software that allow us to create a translation memory. In this case, irregularities reduce the translation software's match value to a point where it's not as helpful as hoped. The translation software doesn't work with bilingual docs - it needs two unilingual docs.

However, another option for me would be to create a TXT file if I can properly identify in the Word document which En segment/paragraph corresponds to which Fr segment/paragraph.

I noticed a pattern: English/French column sets are separated by section breaks with the styles "straddle1" or "straddle2".

In PHP, I would do the following:
Load the document in a single string, split it into an array based on the delimiters "straddle1" and "straddle2". I would then parse the array, splitting each element in two, the + style (e.g. +indent01) being the English text, and the - style (e.g. -indent01) being the French text. I could then, instead of aligning two bilingual documents as I was planning to do, reconstruct a TXT file that could be used to import the EN and FR segments into our translation memory.

I'm clueless as to how to do that in VBA, and my sense is that I'm asking for a lot without enough details. I'm sure a solution is possible, but perhaps my approach is wrong.

Thanks for all your help.