Transforming paragraphs in Word with a macro



Pwyll2
11-27-2013, 07:12 PM
Hello

I have documents to transform through a macro in Word, but I don't know VB at all.
There are many paragraphs in my Word documents, and these paragraphs are separated by one or several blank lines (paragraph breaks).

Here is a sample of one paragraph like those I have to transform:


\xv TEXT OF THE XV MARKER
\sfx audios/NAMEOFTHESOUNDFILE.mp3
\xfon TEXT OF THE XFON MARKER
\xinfor TEXT OF THE XINFOR MARKER
\xbrloc TEXT OF THE XBRLOC MARKER
\xtrad TEXT OF THE XTRAD MARKER
\lt TEXT OF THE LT MARKER
\nt TEXT OF THE NT MARKER

Sometimes \sfx, \xfon, \xinfor, \lt and \nt can be empty (not followed by text) or even absent; \xv, \xbrloc and \xtrad, though, are always there and are never empty.
For instance, sometimes there are paragraphs like this:


\xv blablablah
\xbrloc blablablah
\xtrad blablablah


All paragraphs should become like this (the markers become XML tags, certain lines are added, and certain markers are not in the same order).
Note that the contents of the fields stay the same, except for \sfx (which becomes FichierSon), which loses the "audios/" part:

<item id="">
<FichierSon>NAMEOFTHESOUNDFILE.mp3</FichierSon>
<image></image>
<TitreImage></TitreImage>
<lieu></lieu>
<informateur>TEXT OF THE XINFOR MARKER</informateur>
<enqueteur></enqueteur>
<decoupeur></decoupeur>
<transcripteur></transcripteur>
<relecteur></relecteur>
<machine></machine>
<FichierSource></FichierSource>
<duree></duree>
<DateEnreg></DateEnreg>
<questionnaire></questionnaire>
<QuestionPosee></QuestionPosee>
<contexte></contexte>
<BretonLocal>TEXT OF THE XBRLOC MARKER</BretonLocal>
<phon>TEXT OF THE XFON MARKER</phon>
<BretonStandard>TEXT OF THE XV MARKER</BretonStandard>
<TraducFr>TEXT OF THE XTRAD MARKER</TraducFr>
<TraducLitt>TEXT OF THE LT MARKER</TraducLitt>
<DonneesMorpho></DonneesMorpho>
<DonneesSynt></DonneesSynt>
<nt>TEXT OF THE NT MARKER</nt>
<type></type>
<theme></theme>
<MotsClesFr></MotsClesFr>
<MotsClesLocal></MotsClesLocal>
<MotsClesStand></MotsClesStand>
<RefAutreExtr></RefAutreExtr>
<commentairePerso></commentairePerso>
<DateCreation></DateCreation>
<DateMiseAJour></DateMiseAJour>
</item>


So the macro will have to:
- replace the markers (\xv BLABLA becomes <BretonStandard>BLABLA</BretonStandard>, etc.)
- remove the "audios/" part of the "FichierSon" field
- change the order of the fields
- add the tags that are absent from the first document.
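
As a minimal sketch of the first two steps (not code posted in this thread), one line of the source text can be split into its marker and its content. The name SplitMarkerLine is hypothetical, and the sketch assumes that a marker is always separated from its text by a single space and that only \sfx ever carries the "audios/" prefix.

```vba
' Hypothetical helper: splits a line such as "\sfx audios/NAME.mp3"
' into its marker ("\sfx") and its content ("NAME.mp3").
Sub SplitMarkerLine(ByVal lineText As String, _
                    ByRef marker As String, ByRef content As String)
    Dim pos As Long
    ' Drop the paragraph mark and any surrounding spaces
    lineText = Trim$(Replace(lineText, vbCr, ""))
    pos = InStr(lineText, " ")
    If pos = 0 Then
        ' Marker present but empty, e.g. "\nt"
        marker = lineText
        content = ""
    Else
        marker = Left$(lineText, pos - 1)
        content = Trim$(Mid$(lineText, pos + 1))
    End If
    ' Assumption: only the sound-file field loses its "audios/" prefix
    If marker = "\sfx" And Left$(content, 7) = "audios/" Then
        content = Mid$(content, 8)
    End If
End Sub

Sub DemoSplitMarkerLine()
    Dim m As String, c As String
    SplitMarkerLine "\sfx audios/NAMEOFTHESOUNDFILE.mp3" & vbCr, m, c
    Debug.Print m; " -> "; c
End Sub
```

An absent marker simply never produces a call, and an empty one returns an empty content string, so both cases end up as an empty XML element later on.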

I hope it is doable!
Thanks a million in advance!

fumei
11-28-2013, 11:35 PM
It may be doable. Please post a document (without any sensitive information) containing a sample of the starting text and, to make the result easy to see, another document showing how the starting document should end up.

Some points.

Markers. This is meaningless to Word. There are no markers unless you mark them. As it is there is only text.

<commentairePerso></commentairePerso>

is to Word no different, in any way, to

s;fjslfshf sfhsfsflkahdkh

They are both only text. Word sees nothing different, I mean other than different text characters.

Fields. As it seems (but we need to see an actual document to know for sure), there are no fields at all.

To make any change in order, you need to supply the full logic needed to determine what to do.

To add any "markers", "absent" has to be perfectly defined.

Pwyll2
11-29-2013, 06:53 AM
Thanks for your answer!
I've uploaded the "document after transformation", and I'll upload the "document before transformation" in my next message. Sorry for the order: I thought I could upload two documents at the same time, but one cannot, and I couldn't find how to delete the "after transformation" document.


> Markers. This is meaningless to Word. There are no markers unless you mark them. As it is there is only text.

Right, what I call a "marker" is what looks like "\xv", "\xbrloc", etc. in the document before transformation.

> They are both only text. Word sees nothing different, I mean other than different text characters.

OK. Actually these are called markers in the software I use for linguistic work, but I need to use Word's macros to transform my documents.

> Fields. As it seems (but we need to see an actual document to know for sure), there are no fields at all.

What I call "fields" is the variable text that comes after the markers (\xv etc.), and so that will be between the XML tags in the document after transformation.

> To add any "markers", "absent" has to be perfectly defined.

I mean, sometimes I will have paragraphs like this:

\xv
\sfx
\xfon
\xinfor
\xbrloc
\lt
\nt

and others where \xfon and/or \xinfor and/or \lt and/or \nt will be missing, so I guess in the macro we'll need to write something like "transform \xfon and its contents into <phon></phon> and its contents IF IT'S PRESENT", et cetera.

If something isn't clear yet, please ask me :)

Pwyll2
11-29-2013, 06:56 AM
Here is the document before transformation.
Thanks again :)

fumei
11-29-2013, 07:59 PM
Here are the steps needed to change your before to your after:

ADD a NEW paragraph <item id="">
DELETE paragraph starting \xv (store value for insertion later)
CHANGE paragraph starting \sfx (Delete audios/, change to <FichierSon>)
ADD a NEW paragraph <image></image>
ADD a NEW paragraph <TitreImage></TitreImage>
ADD a NEW paragraph <lieu></lieu>
ADD a NEW paragraph <informateur></informateur>
ADD a NEW paragraph <enqueteur></enqueteur>
ADD a NEW paragraph <decoupeur></decoupeur>
ADD a NEW paragraph <transcripteur></transcripteur>
ADD a NEW paragraph <relecteur></relecteur>
ADD a NEW paragraph <machine></machine>
ADD a NEW paragraph <FichierSource></FichierSource>
ADD a NEW paragraph <duree></duree>
ADD a NEW paragraph <DateEnreg></DateEnreg>
ADD a NEW paragraph <questionnaire></questionnaire>
ADD a NEW paragraph <QuestionPosee></QuestionPosee>
ADD a NEW paragraph <contexte></contexte>
SEARCH for paragraph starting \xbrloc, COPY contents, CHANGE \xbrloc to <BretonLocal>
CHANGE paragraph starting \xfon to <phon>
INSERT paragraph value from previous \xv, CHANGE \xv to <BretonStandard>
DELETE paragraph starting \xinfor
ADD a NEW paragraph <TraducFr>Non. Trop d-, c’était trop dangereux.</TraducFr>
Where does this value come from?
DELETE paragraph starting \xtrad
CHANGE paragraph starting \lt to <TraducLitt>
ADD a NEW paragraph <DonneesMorpho></DonneesMorpho>
ADD a NEW paragraph <DonneesSynt></DonneesSynt>
CHANGE paragraph starting \nt to <nt>
ADD a NEW paragraph <type></type>
ADD a NEW paragraph <theme></theme>
ADD a NEW paragraph <MotsClesFr></MotsClesFr>
ADD a NEW paragraph <MotsClesLocal></MotsClesLocal>
ADD a NEW paragraph <MotsClesStand></MotsClesStand>
ADD a NEW paragraph <RefAutreExtr></RefAutreExtr>
ADD a NEW paragraph <commentairePerso></commentairePerso>
ADD a NEW paragraph <DateCreation></DateCreation>
ADD a NEW paragraph <DateMiseAJour></DateMiseAJour>
ADD a NEW paragraph </item>

NOTE!!! The above does NOT include the steps needed to add the closing tags to the changes (e.g. </phon>).

Technically, yes I suppose it is possible, but as you can see the huge amount of processing will require a huge amount of development. Perhaps someone else would take it on, but I do not have the time for such a project.

I do not know where your original text comes from, but as a possible suggestion: IF the structure is in fact identical for each block, why not have the end-result document come from a standard template? In other words, you do not CHANGE \xfon to <phon>; you start with <phon> in the first place. You do not add (or move) ANY paragraphs; they would all be there to start with.
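
A minimal sketch of what such a template could look like in VBA, assuming the values have already been read from the source text. The function name FillTemplate and the {placeholder} syntax are hypothetical, and the template is shortened to the three mandatory fields; a full version would list every tag of the requested output in order.

```vba
' Hypothetical sketch: a fixed template with {placeholders} that is
' filled in once per record, instead of editing paragraphs in place.
Function FillTemplate(ByVal xv As String, ByVal xbrloc As String, _
                      ByVal xtrad As String) As String
    Dim t As String
    ' Shortened template; the real one would contain every output tag
    t = "<item id="""">" & vbCr & _
        "<BretonLocal>{xbrloc}</BretonLocal>" & vbCr & _
        "<BretonStandard>{xv}</BretonStandard>" & vbCr & _
        "<TraducFr>{xtrad}</TraducFr>" & vbCr & _
        "</item>"
    ' Assumption: the field text never contains a literal placeholder
    t = Replace(t, "{xbrloc}", xbrloc)
    t = Replace(t, "{xv}", xv)
    t = Replace(t, "{xtrad}", xtrad)
    FillTemplate = t
End Function

Sub DemoFillTemplate()
    Debug.Print FillTemplate("standard text", "local text", "translation")
End Sub
```

With this design the order of the output never depends on the order of the input lines, which is the point of the template suggestion.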

All I can say is that with the requirements as stated it would take a heck of a lot of work.

Pwyll2
11-29-2013, 09:32 PM
I see, thanks anyway!
I hope someone will have time and courage to write that macro... I'm not able at all myself!

fumei
11-30-2013, 12:22 AM
Well, I am going to keep it on my back burner, for interest's sake. I do have a question that I posted above.

ADD a NEW paragraph <TraducFr>Non. Trop d-, c’était trop dangereux.</TraducFr>
Where does this value come from?

1. I do not know where your original data comes from, and in what form. The Before does not have <TraducFr>, but your After does. OK. Putting a new <TraducFr> in is easy, but WHERE does the data (Non. Trop d-, c’était trop dangereux.) come from, if it is not in the original?

2. Please confirm that the After structure is ALWAYS (ignoring content, but in the following order):
<item id="">
<FichierSon> </FichierSon>
<image></image>
<TitreImage></TitreImage>
<lieu></lieu>
<informateur></informateur>
<enqueteur></enqueteur>
<decoupeur></decoupeur>
<transcripteur></transcripteur>
<relecteur></relecteur>
<machine></machine>
<FichierSource></FichierSource>
<duree></duree>
<DateEnreg></DateEnreg>
<questionnaire></questionnaire>
<QuestionPosee></QuestionPosee>
<contexte></contexte>
<BretonLocal></BretonLocal>
<phon></phon>
<BretonStandard></BretonStandard>
<TraducFr></TraducFr>
<TraducLitt></TraducLitt>
<DonneesMorpho></DonneesMorpho>
<DonneesSynt></DonneesSynt>
<nt></nt>
<type></type>
<theme></theme>
<MotsClesFr></MotsClesFr>
<MotsClesLocal></MotsClesLocal>
<MotsClesStand></MotsClesStand>
<RefAutreExtr></RefAutreExtr>
<commentairePerso></commentairePerso>
<DateCreation></DateCreation>
<DateMiseAJour></DateMiseAJour>
</item>

Yes? No?

And, that paragraphs with:
\xinfor
\xtrad

are ALWAYS deleted. Yes? No? If they are present, can they EVER have content?

I can state categorically no one will be able to help with any code unless they have the logic and rules for processing. These must be precise, clear and complete. There can be NO exceptions.

3. Will the Before text ALWAYS be in discrete chunks?

4. Are you looking for processing over an entire document, made of those (of whatever number) chunks? Or are you thinking about code that processes only a selected chunk, leaving other chunks alone?

Again, if you want someone to assist in coming up with a practical solution you need to come up with very precise requirements.

fumei
11-30-2013, 12:33 AM
As I stated, I will continue to play with this (it IS an interesting challenge), but it will take a lot of work (read time). There are others here who may also take it on, and have a better idea of how it could be done. I definitely think taking the original (Before) data and dumping it into a template is better than trying to do any transforming of the original paragraphs. That is the route I will be trying.

macropod
11-30-2013, 01:11 AM
So far, you've indicated 32 output tags, but only 8 input prefixes. For every tag, what is its corresponding prefix (if any)? Please provide this in the order in which the output tags are required.

Pwyll2
11-30-2013, 06:56 AM
> ADD a NEW paragraph <TraducFr>Non. Trop d-, c’était trop dangereux.</TraducFr>
> Where does this value come from

In the Before document, it is this way:

\xv
\sfx
\xfon
\xinfor
\xbrloc
\xtrad
\lt
\nt

and what comes after \xtrad should go between the <TraducFr> tags.



> 2. Please confirm that the After structure is ALWAYS (ignoring content, but in the following order):
> Yes? No?

Yes!

> And, that paragraphs with:
> \xinfor
> \xtrad
> are ALWAYS deleted. Yes? No? If they are present, can they EVER have content?

They aren't deleted: the content of \xinfor should go between the <informateur> tags, and that of \xtrad between the <TraducFr> tags.
And yes, they can have content.

> I can state categorically no one will be able to help with any code unless they have the logic and rules for processing. These must be precise, clear and complete. There can be NO exceptions.

No problem, ask me any question until everything is clear for you!


> 3. Will the Before text ALWAYS be in discrete chunks?

The Before text will always start with a \xv marker. I don't know exactly what a "discrete chunk" is (sorry).
The Before text always has these markers/fields with content (they are mandatory):

\xv
\xbrloc
\xtrad

It can also have (but not always) these markers:

\sfx
\xfon
\xinfor
\lt
\nt

so that the longest paragraph structure is:

\xv
\sfx
\xfon
\xinfor
\xbrloc
\xtrad
\lt
\nt

And there is always one or more paragraph breaks at the end of each one, before the next \xv.

> 4. Are you looking for processing over an entire document, made of those (of whatever number) chunks? Or are you thinking about code that processes only a selected chunk, leaving other chunks alone?

It should process the entire document, without leaving anything alone.
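
The chunk rule described above (each block starts at a \xv line and ends at the blank line or lines before the next \xv) could be sketched as follows. This is only an illustration working on plain text rather than on Word ranges; the name SplitRecords is hypothetical, and it assumes paragraphs are separated by vbCr, as they are in the text of a Word Range.

```vba
' Hypothetical helper: groups the lines of docText into records,
' starting a new record at every line that begins with "\xv".
' Blank lines are skipped, so one or several of them make no difference.
Function SplitRecords(ByVal docText As String) As Collection
    Dim result As New Collection
    Dim parts() As String, current As String
    Dim i As Long, ln As String
    parts = Split(docText, vbCr)
    For i = LBound(parts) To UBound(parts)
        ln = Trim$(parts(i))
        If ln <> "" Then
            ' A "\xv" line closes the record collected so far
            If Left$(ln, 3) = "\xv" And current <> "" Then
                result.Add current
                current = ""
            End If
            If current <> "" Then current = current & vbCr
            current = current & ln
        End If
    Next i
    ' Do not lose the last record of the document
    If current <> "" Then result.Add current
    Set SplitRecords = result
End Function

Sub DemoSplitRecords()
    Dim recs As Collection
    Set recs = SplitRecords("\xv a" & vbCr & "\xbrloc b" & vbCr & vbCr & "\xv c")
    Debug.Print recs.Count
End Sub
```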






> As I stated, I will continue to play with this (it IS an interesting challenge), but it will take a lot of work (read time). There are others here who may also take it on, and have a better idea of how it could be done. I definitely think taking the original (Before) data and dumping it into a template is better than trying to do any transforming of the original paragraphs. That is the route I will be trying.

Thanks!

> So far, you've indicated 32 output tags, but only 8 input prefixes. For every tag, what is its corresponding prefix (if any). Please provide this in the order in which the output tags are required.

\sfx = <FichierSon>
\xinfor = <informateur>
\xbrloc = <BretonLocal>
\xfon = <phon>
\xv = <BretonStandard>
\xtrad = <TraducFr>
\lt = <TraducLitt>
\nt = <nt>

Pwyll2
11-30-2013, 07:05 AM
And if you also need it, here is another correspondence table:


NOTHING  ->  <item id="">
\sfx     ->  <FichierSon></FichierSon>
NOTHING  ->  <image></image>
NOTHING  ->  <TitreImage></TitreImage>
NOTHING  ->  <lieu></lieu>
\xinfor  ->  <informateur></informateur>
NOTHING  ->  <enqueteur></enqueteur>
NOTHING  ->  <decoupeur></decoupeur>
NOTHING  ->  <transcripteur></transcripteur>
NOTHING  ->  <relecteur></relecteur>
NOTHING  ->  <machine></machine>
NOTHING  ->  <FichierSource></FichierSource>
NOTHING  ->  <duree></duree>
NOTHING  ->  <DateEnreg></DateEnreg>
NOTHING  ->  <questionnaire></questionnaire>
NOTHING  ->  <QuestionPosee></QuestionPosee>
NOTHING  ->  <contexte></contexte>
\xbrloc  ->  <BretonLocal></BretonLocal>
\xfon    ->  <phon></phon>
\xv      ->  <BretonStandard></BretonStandard>
\xtrad   ->  <TraducFr></TraducFr>
\lt      ->  <TraducLitt></TraducLitt>
NOTHING  ->  <DonneesMorpho></DonneesMorpho>
NOTHING  ->  <DonneesSynt></DonneesSynt>
\nt      ->  <nt></nt>
NOTHING  ->  <type></type>
NOTHING  ->  <theme></theme>
NOTHING  ->  <MotsClesFr></MotsClesFr>
NOTHING  ->  <MotsClesLocal></MotsClesLocal>
NOTHING  ->  <MotsClesStand></MotsClesStand>
NOTHING  ->  <RefAutreExtr></RefAutreExtr>
NOTHING  ->  <commentairePerso></commentairePerso>
NOTHING  ->  <DateCreation></DateCreation>
NOTHING  ->  <DateMiseAJour></DateMiseAJour>
NOTHING  ->  </item>
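
A correspondence table like this one can be expressed directly in code. The sketch below is only one possible way to do it, not the macro posted in this thread: MarkerForTag and BuildItem are hypothetical names, the record is assumed to be a Scripting.Dictionary (available on Windows) keyed by marker, and the values are assumed to be already cleaned (for example, "audios/" already removed).

```vba
' Hypothetical helper: returns the input marker that feeds an output tag,
' or "" when the tag has no source (the NOTHING rows of the table).
Function MarkerForTag(ByVal tagName As String) As String
    Select Case tagName
        Case "FichierSon":     MarkerForTag = "\sfx"
        Case "informateur":    MarkerForTag = "\xinfor"
        Case "BretonLocal":    MarkerForTag = "\xbrloc"
        Case "phon":           MarkerForTag = "\xfon"
        Case "BretonStandard": MarkerForTag = "\xv"
        Case "TraducFr":       MarkerForTag = "\xtrad"
        Case "TraducLitt":     MarkerForTag = "\lt"
        Case "nt":             MarkerForTag = "\nt"
        Case Else:             MarkerForTag = ""
    End Select
End Function

' Builds one <item> block; rec maps markers to their text.
Function BuildItem(ByVal rec As Object) As String
    ' Tag order taken from the table above
    Const TAGS As String = _
        "FichierSon,image,TitreImage,lieu,informateur,enqueteur,decoupeur," & _
        "transcripteur,relecteur,machine,FichierSource,duree,DateEnreg," & _
        "questionnaire,QuestionPosee,contexte,BretonLocal,phon,BretonStandard," & _
        "TraducFr,TraducLitt,DonneesMorpho,DonneesSynt,nt,type,theme,MotsClesFr," & _
        "MotsClesLocal,MotsClesStand,RefAutreExtr,commentairePerso," & _
        "DateCreation,DateMiseAJour"
    Dim tagList() As String, i As Long
    Dim marker As String, content As String, result As String
    tagList = Split(TAGS, ",")
    result = "<item id="""">" & vbCr
    For i = LBound(tagList) To UBound(tagList)
        marker = MarkerForTag(tagList(i))
        content = ""
        ' Absent or unmapped markers leave the element empty
        If marker <> "" Then
            If rec.Exists(marker) Then content = rec.Item(marker)
        End If
        result = result & "<" & tagList(i) & ">" & content & _
                 "</" & tagList(i) & ">" & vbCr
    Next i
    BuildItem = result & "</item>"
End Function

Sub DemoBuildItem()
    Dim rec As Object
    Set rec = CreateObject("Scripting.Dictionary")
    rec.Add "\xv", "blablablah"
    rec.Add "\xbrloc", "blablablah"
    rec.Add "\xtrad", "blablablah"
    Debug.Print BuildItem(rec)
End Sub
```

Keeping the tag order in one list and the marker mapping in one function means that a change to either table is made in a single place.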

macropod
11-30-2013, 03:50 PM
Try:

Sub Demo()
    Application.ScreenUpdating = False
    Dim Rng As Range, StrTags As String, StrOut As String, i As Long
    Dim Str1 As String, Str2 As String, Str3 As String, Str4 As String
    Dim Str5 As String, Str6 As String, Str7 As String, Str8 As String, Str9 As String
    Dim Str1A As String, Str2A As String, Str3A As String, Str4A As String
    Dim Str5A As String, Str6A As String, Str7A As String, Str8A As String
    ' Output tag names in order. The leading comma makes element 0 empty,
    ' so the first tag sits at index 1 after Split.
    ' Note: <lieu>, which follows <TitreImage> in the requested output, is not in this list.
    StrTags = ",FichierSon,image,TitreImage,informateur,enqueteur,decoupeur," & _
        "transcripteur,relecteur,machine,FichierSource,duree,DateEnreg,questionnaire" & _
        ",QuestionPosee,contexte,BretonLocal,phon,BretonStandard,TraducFr,TraducLitt," & _
        "DonneesMorpho,DonneesSynt,nt,type,theme,MotsClesFr,MotsClesLocal," & _
        "MotsClesStand,RefAutreExtr,commentairePerso,DateCreation,DateMiseAJour,/item"
    ' Str1..Str9 hold the fixed tag text that surrounds the eight variable values.
    Str1 = "<item id=" & Chr(34) & Chr(34) & ">" & vbCr & "<" & Split(StrTags, ",")(1) & ">"
    For i = 1 To 3
        Str2 = Str2 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 4 To 15
        Str3 = Str3 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 16 To 16
        Str4 = Str4 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 17 To 17
        Str5 = Str5 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 18 To 18
        Str6 = Str6 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 19 To 19
        Str7 = Str7 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 20 To 22
        Str8 = Str8 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    For i = 23 To 32
        Str9 = Str9 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
    Next
    With ActiveDocument.Range
        With .Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            ' Remove every "audios/" prefix in the document
            .Text = "audios/"
            .Replacement.Text = ""
            .Execute Replace:=wdReplaceAll
            .Wrap = wdFindStop
            ' Find the first \xv paragraph. Note: this line was mangled by the
            ' board software; the corrected pattern is posted later in the thread.
            .Text = "\\xv (file://\\xv)[!^13]@^13"
            .Replacement.Text = ""
            .Execute
        End With
        Do While .Find.Found
            ' Extend the range up to the paragraph before the next \xv
            Do While Split(.Paragraphs.Last.Next.Range.Text, " ")(0) <> "\xv"
                .MoveEnd wdParagraph, 1
                If .End = ActiveDocument.Range.End Then Exit Do
            Loop
            ' Clear the previous record's values so that absent markers stay empty
            Str1A = "": Str2A = "": Str3A = "": Str4A = ""
            Str5A = "": Str6A = "": Str7A = "": Str8A = ""
            For i = 1 To .Paragraphs.Count
                Select Case Split(.Paragraphs(i).Range.Text, " ")(0)
                    Case "\sfx"
                        Str1A = Replace(Replace(.Paragraphs(i).Range.Text, "\sfx ", ""), vbCr, "")
                    Case "\xinfor"
                        Str2A = Replace(Replace(.Paragraphs(i).Range.Text, "\xinfor ", ""), vbCr, "")
                    Case "\xbrloc"
                        Str3A = Replace(Replace(.Paragraphs(i).Range.Text, "\xbrloc ", ""), vbCr, "")
                    Case "\xfon"
                        Str4A = Replace(Replace(.Paragraphs(i).Range.Text, "\xfon ", ""), vbCr, "")
                    Case "\xv"
                        Str5A = Replace(Replace(.Paragraphs(i).Range.Text, "\xv ", ""), vbCr, "")
                    ' Note: \lt and \xtrad feed the wrong slots here;
                    ' the swap is corrected later in the thread.
                    Case "\lt"
                        Str6A = Replace(Replace(.Paragraphs(i).Range.Text, "\lt ", ""), vbCr, "")
                    Case "\xtrad"
                        Str7A = Replace(Replace(.Paragraphs(i).Range.Text, "\xtrad ", ""), vbCr, "")
                    Case "\nt"
                        Str8A = Replace(Replace(.Paragraphs(i).Range.Text, "\nt ", ""), vbCr, "")
                    Case Else
                End Select
            Next
            ' Replace the whole record with the assembled XML block
            .Text = Str1 & Str1A & Str2 & Str2A & Str3 & Str3A & Str4 & Str4A & Str5 & Str5A & _
                Str6 & Str6A & Str7 & Str7A & Str8 & Str8A & Str9 & vbCr & vbCr & vbCr
            .Collapse wdCollapseEnd
            .Find.Execute
        Loop
        ' Trim the empty paragraphs left at the end of the document
        While .Characters.Last.Previous.Text = vbCr
            .Characters.Last.Previous.Text = vbNullString
        Wend
    End With
    Application.ScreenUpdating = True
End Sub

fumei
11-30-2013, 04:06 PM
I have not tested things, but I thought Paul would be able to come up with something.

macropod
11-30-2013, 07:29 PM
I see the board software has once again munged some of the code I posted. Change:

.Text = "\\xv (file://\\xv)[!^13]@^13"
to:

.Text = "\^92xv[!^13]@^13"

FWIW, this:

For i = 16 To 16
    Str4 = Str4 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
Next
For i = 17 To 17
    Str5 = Str5 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
Next
For i = 18 To 18
    Str6 = Str6 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
Next
For i = 19 To 19
    Str7 = Str7 & "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
Next
could be reduced to:

i = 16: Str4 = "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
i = 17: Str5 = "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
i = 18: Str6 = "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
i = 19: Str7 = "</" & Split(StrTags, ",")(i) & ">" & vbCr & "<" & Split(StrTags, ",")(i + 1) & ">"
but I left it as loops in case we need to start changing string order/content...
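
Another way to keep the flexibility of the loops while removing the repetition would be a small helper, sketched below. TagJoint and TagRun are hypothetical names that do not appear in the posted macro; the sketch assumes the tag list has been split once into a string array with the same 1-based layout as StrTags.

```vba
' Hypothetical helper: closes tag i and opens tag i + 1 on the next line.
Function TagJoint(tags() As String, ByVal i As Long) As String
    TagJoint = "</" & tags(i) & ">" & vbCr & "<" & tags(i + 1) & ">"
End Function

' Concatenates the joints for positions first..last, like one of the loops.
Function TagRun(tags() As String, ByVal first As Long, ByVal last As Long) As String
    Dim i As Long
    For i = first To last
        TagRun = TagRun & TagJoint(tags, i)
    Next i
End Function

Sub DemoTagRun()
    Dim tags() As String
    ' Short stand-in for StrTags; element 0 is empty, as in the macro
    tags = Split(",phon,BretonStandard,TraducFr", ",")
    Debug.Print TagRun(tags, 1, 1)
    Debug.Print TagRun(tags, 1, 2)
End Sub
```

In the macro, a line such as Str4 = TagRun(tags, 16, 16) would then stand in for each single-pass loop, and changing the string order only means changing the two index arguments.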

Pwyll2
11-30-2013, 07:35 PM
Thanks guys
However, when I run it, Word says "Syntax Error" and highlights this line:

.Text = "\\xv (file://\\xv)[!^13]@^13"

I don't know what is wrong... Ask me if there's something I didn't explain properly

thanks

macropod
11-30-2013, 07:37 PM
See my comments in post #14 (http://www.vbaexpress.com/forum/showthread.php?48283-Transforming-paragraphs-in-Word-with-a-macro&p=301051&viewfull=1#post301051)

Pwyll2
11-30-2013, 07:45 PM
Cool, only one problem, but it should be easy for you to solve: what is in \xtrad goes into <TraducLitt> and what is in \lt goes into <TraducFr>, while it should be the other way round... :)
thanks

fumei
11-30-2013, 08:04 PM
Yes, what is with the board software "munging"?

Pwyll2
11-30-2013, 08:10 PM
I have replaced that line in the macro but it still reverses the data... :dunno
thanks

macropod
12-01-2013, 04:19 PM
> I have replaced that line in the macro but it still reverses the data... :dunno

The change I suggested in post #14/#16 was just to fix the syntax error. Obviously, I hadn't read your post #17 at that time...

Change:

Case "\lt"
Str6A = Replace(Replace(.Paragraphs(i).Range.Text, "\lt ", ""), vbCr, "")
Case "\xtrad"
Str7A = Replace(Replace(.Paragraphs(i).Range.Text, "\xtrad ", ""), vbCr, "")
to:

Case "\xtrad"
Str6A = Replace(Replace(.Paragraphs(i).Range.Text, "\xtrad ", ""), vbCr, "")
Case "\lt"
Str7A = Replace(Replace(.Paragraphs(i).Range.Text, "\lt ", ""), vbCr, "")

Pwyll2
12-01-2013, 06:29 PM
YESSSS!
looks like it works perfectly! you're magicians, guys :)

THANKS A MILLION :friends:

fumei
12-01-2013, 06:51 PM
It was not guys, it was Paul (macropod).