PDA

View Full Version : Reading DOCX using the Outline into sub-documents (with VBA)



steve8445
01-26-2018, 08:49 PM
(Attached — sample master DOCX)


I am trying to create multiple Word subdocx’s from each section of my master DOCX using Word VBA.


Please see the “outline” view of my attached sample master DOCX.
Example output would be:


SUBDOCX FILENAME ==============> CONTENT (rich text)

1--Heading 1.docx ==============> Shuttlefish

1-1--Heading 2.docx ==============> Modems

1-1-1—Normal.docx ==============> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur ut scelerisque risus. Nunc cursus nisl a purus commodo, et commodo libero efficitur. Donec mattis metus turpis, et vestibulum lacus tempor eu. Aliquam suscipit urna et nibh tristique, accumsan laoreet massa consectetur.
Praesent ut pulvinar nibh. Fusce a felis quis augue hendrerit pellentesque. Aliquam blandit est sit amet consequat auctor. Phasellus blandit iaculis augue vitae lacinia. Donec lectus quam, hendrerit sed malesuada dapibus, aliquet vitae ex.

1-2--Heading 2.docx ==============> Aquatic

(etc…)


The subdocx’s get the name similar to the above (including the name of the style used in the first word of the section). The content contains some rich text e.g. some words may be bolded, italicised etc.


Would appreciate comments as to how to approach this with Word VBA. Thanks.

macropod
01-26-2018, 09:30 PM
If you're thinking of using Word's "master documents" 'feature', don't. See:
https://wordmvp.com/FAQs/General/WhyMasterDocsCorrupt.htm
http://www.addbalance.com/word/masterdocuments.htm

steve8445
01-26-2018, 10:38 PM
If you're thinking of using Word's "master documents" 'feature', don't.

Thanks for the heads-up on this.

I want to loop through the DOCX and pick up each of the OUTLINE levels' RICH CONTENT and its corresponding Word STYLE (that I have assigned previously). Then to construct the subdocx's content and their file names from this info (using VBA which I presume is not the same as the corrupted "Master Doc feature" - not sure though). In effect the VBA would create my subdocx's instead of relying on the manual MS Word's corrupted "Master Doc feature".

macropod
01-26-2018, 11:55 PM
It's still not clear what you want to extract or what you mean by what you call a 'subdocx', what you intend it to contain or how, if at all, the
'subdocx' then relates to the 'master DOCX'.

You can get an idea of what's involved with getting the content associated with a particular heading via the macros at:
http://www.msofficeforums.com/word-vba/37009-how-select-text-between-chapters.html#post120486
and:
https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_archive/macro-to-compute-headings-word-counts/8bba18a7-e931-41d7-9481-1bcff364dd3a
Those macros return just message boxes/word counts, but there's no reason they couldn't be adapted to create new documents instead.

See also the code I posted in: www.vbaexpress.com/forum/showthread.php?61536-Finding-heading-of-chapters-in-docx-file-and-copying-paragraphs-to-new-docx (http://www.vbaexpress.com/forum/showthread.php?61536-Finding-heading-of-chapters-in-docx-file-and-copying-paragraphs-to-new-docx)

steve8445
01-27-2018, 05:11 AM
The code at the links was helpful. I modified it to not use FIND but instead to get an absolute heading#.

The problem I'm having is that I want to further modify it to iterate on COUNT in the code below to get each of the styles' content and to copy it into other DOCX's (to one of the subdocx's which contain text for each style in the outline - see in My Master DOCX attached).

HOWEVER the COUNT increments once per paragraph not once per the one STYLE's CONTENT (as needed).
e.g. if my NORMAL content below 1.1 Modems has 4 paragraphs they should all be obtained once and copied into 1 subdocx (not 4).


Sub Demo()
Application.ScreenUpdating = False
Dim Rng As Range
With ActiveDocument.Range

Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=7

' With .Find
' .ClearFormatting
' .Replacement.ClearFormatting
' .Text = "Curabitur"
' .Style = "Normal"
' .Format = True
' .Forward = True
' .MatchCase = True
' .Wrap = wdFindStop
' .MatchWildcards = False
' .Execute
'End With

'If .Find.Found = True Then
Set Rng = .Duplicate
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
MsgBox Rng.Text
'End If

End With
Application.ScreenUpdating = True
End Sub

steve8445
01-27-2018, 05:54 AM
To further clarify -- in the attached image (JPEG above) I would need to produce 6 subdocx's. Each subdocx would have the CONTENT for the OUTLINE LEVELS 1-3.

steve8445
01-27-2018, 03:32 PM
See also the code I posted in: LAST WEBLINK IN POST [/FONT]

I am using Mac Word 2011 and this code is freezing my MS Word consistently at 80-90% cpu usage requiring a "Force Quit" of MS Word. Not sure why.

The other versions of the similar code are not compiling properly in Mac Word VBA. It fails early on around the .Find line.

Unfortunate because this code looks very close to what I am looking for.

macropod
01-28-2018, 03:55 AM
Cross-posted at: https://stackoverflow.com/questions/48472234/how-to-read-ms-word-docx-using-its-outline-into-subdocuments-using-vba
Please read VBA Express' policy on Cross-Posting in item 3 of the rules: http://www.vbaexpress.com/forum/faq.php?faq=new_faq_item#faq_new_faq_item3

macropod
01-28-2018, 07:08 PM
Try:

Sub DemoB()
Application.ScreenUpdating = False
Dim DocSrc As Document, DocTgt As Document, Rng As Range, StrPath As String, StrName As String
Set DocSrc = ActiveDocument: StrPath = DocSrc.Path & Application.PathSeparator
With DocSrc.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = ""
.Replacement.Text = ""
.Forward = True
.Format = True
.Style = wdStyleHeading1
.Wrap = wdFindStop
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute
End With
Do While .Find.Found
Set Rng = .Duplicate.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set DocTgt = Documents.Add
With DocTgt
.Characters.Last.FormattedText = Rng.FormattedText
StrName = Split(.Paragraphs.First.Range.Text, vbCr)(0) & ".docx"
.SaveAs FileName:=StrPath & StrName, FileFormat:=wdFormatXMLDocument, AddToRecentFiles:=False
.Close SaveChanges:=False
End With
.End = Rng.End
If .End = ActiveDocument.Range.End Then Exit Do
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
Application.ScreenUpdating = True
End Sub