koryonik
10-15-2014, 04:01 AM
Hello,
I need to analyze the text of a Word document and create bookmarks on the ranges of text that my analyzer detects (much like a grammar checker). I don't want to use the Find() utility because my needs are too specific.
INFO 1: I am developing an add-in in C#, but the issue seems to be the same in VBA (I have seen a similar issue on this forum, linked at the end of this post), so I hope this is the right place to ask.
INFO 2: I have cross-posted this question on other forums:
- stackoverflow.com/questions/26380163/word-range-move-range-index-in-the-formatted-text-that-corresponds-to-the-plai
- social.msdn.microsoft.com/Forums/vstudio/en-US/70e3d2f3-00b9-4f64-b35d-15632df0d20a/wordrange-make-changes-to-the-formatted-text-that-corresponds-to-the-plain-text?forum=vsto
--------------
To do that:
1/ I retrieve the plain text of the main story of my document:
string plainText = activeDocument.Range().Text;
2/ I send it to my analyzer tool, which returns a collection of markers with positions:
For example, if I wanted to detect the pattern "my pattern" in the document text, the analyzer could return a marker such as { pattern : "my marker", start: 5, end : 14 }, where "start" and "end" are the character indexes of the pattern in the plain text that was sent.
3/ I create bookmarks from these markers
For the previous example, it would be:
// Assumes: using Word = Microsoft.Office.Interop.Word;
// Create a new range over the main story and collapse it to its start
Word.Range range = activeDocument.Range();
range.Collapse(Word.WdCollapseDirection.wdCollapseStart);
// Move character by character through the "formatted" text
range.MoveStart(Word.WdUnits.wdCharacter, marker.Start); // marker.Start = 5
// Set the length (end)
range.SetRange(range.Start, range.Start + (marker.End - marker.Start)); // marker.End = 14
4/ Results
4.1 Overall result
Everything is fine when the document's main story contains text, links, lists and headings:
The ranges are positioned correctly, and the plain text indexes match the formatted text indexes.
4.2 Issue with tables
When a document contains a table, the ranges are off by a few characters: the plain text indexes do not exactly match the formatted text indexes.
I found the reason for this (it was explained on other forums): it is due to the non-printing character char(7), a cell delimiter that is added to the plain text. We can account for these characters when computing the range position, and then everything is fine!
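As an illustration of the char(7) adjustment described above, here is a minimal sketch. It assumes that each char(7) before a given index accounts for exactly one extra character in the plain text compared with the range positions; that assumption, the class name PlainTextOffsets and the method name ToRangeOffset are mine and are not part of the Word object model. It covers tables only, not the other elements discussed in this post.

```csharp
using System;

public static class PlainTextOffsets
{
    // Hypothetical helper: converts an index in the plain text into an offset
    // usable for range positions, by discounting the cell delimiters.
    // Assumption: every char(7) ('\a') found before the index adds exactly
    // one character to the plain text that has no range position of its own.
    public static int ToRangeOffset(string plainText, int plainIndex)
    {
        if (plainText == null)
            throw new ArgumentNullException("plainText");
        if (plainIndex < 0 || plainIndex > plainText.Length)
            throw new ArgumentOutOfRangeException("plainIndex");

        int cellMarks = 0;
        for (int i = 0; i < plainIndex; i++)
        {
            // char(7) is the cell delimiter that appears in the plain text
            if (plainText[i] == '\a')
                cellMarks++;
        }
        return plainIndex - cellMarks;
    }
}
```

With this helper, the start and end of a marker would each be converted before being passed to SetRange, instead of using the plain text indexes directly.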
4.3 Issue with content controls, tables of contents, sections and others
When a document contains these elements, the ranges are also off by a few characters.
Other non-printing characters appear in the plain text, but I don't understand what they mean or how to account for them when computing the range position.
When I display Word's element markers with "Developer ribbon > Design Mode", I see 2 markers per element: shifting the plain text indexes by 2 * (number of elements) resolves the issue. That seems OK.
4.4 Issue with the cover page
I am not sure of the English term for "page de garde" (French); I believe it is "cover page" (I first thought "endpaper"): this is the first page, with its own header, footer and content controls :)
When a document contains a cover page, the ranges are also off by a few characters.
But this time, there are no non-printing markers in the plain text.
One more detail: when I display Word's element markers with "Developer ribbon > Design Mode", I do see cover page markers.
5/ Questions
- How can I detect a cover page ("page de garde") in a Word document range?
- How should I understand the fact that plain text indexes do not always match formatted text indexes, depending on which elements the Word document contains?
- Would manipulating the XML nodes be a more reliable alternative? If so, could you give me good examples of managing bookmarks or other elements in the current document with the XML API?
6/ Other resources
I found similar issues:
- stackoverflow.com/questions/3772938/correlate-range-text-to-range-start-and-range-end
- vbaexpress.com/forum/showthread.php?36710-Strange-character-on-table-range-text
I hope this message is clear enough and that you can help me understand the problem, or show me a better way to do this.
Thanks, really.