PDA

View Full Version : Find and replace text using a data file and "Save As"



Steven Gotz
09-13-2013, 07:33 AM
This is a little embarrassing. I have not needed to code anything in over 20 years. I used to hang out on forums like this answering questions like this about Visual Basic back when Windows 3.1 was the OS. But it has been too long.

Please guide me as best you can. If you could just supply the right words for the concepts in the pseudocode, that would be great. If you could code it, I would be ever so grateful. I would prefer having it work across multiple versions of Word if possible. I am hoping the code is pretty generic and will not have changed much, if at all. I am using 2003, 2007 and 2010 versions of Word, and I imagine the people I will be sharing this with will be as well.

I use Adobe Premiere Pro to edit videos. If I export a title I get a text file (XML) filled with all kinds of stuff, but the text I want to change is just plain text. So it could be <XYZ> and I could find and replace it to achieve my goal. It seems to me that it should look something like this:

I open the Data file which has multiple lines of text - anywhere from 2 to 1000 or more. This is the file where the Macro button resides.

The psuedocode:

File Open Dialog ''''find and open the text file that I will change over and over again
Search for XYZ ''''XYZ being the temporary text I use to help identify the place in the file that needs to be changed
Read first line of data file ''''usually not more than 40 characters but could be up to 255
Replace XYZ with first line
Save As filename plus -n ''''Save text file to new name - n is a number that increments from 0001 to 9999
Read 2nd line of data file
Replace 1st line with 2nd line
Save As filename - incrementing n
Repeat until EOF
EOF post a message saying "Done"

This should leave me with a directory full of individual files numbered sequentially.

In case it helps, the section of the file I am changing looks like this:

<TRString>XYZ</TRString>

The file extension is .prtl

It seems like it should be easy but the actual code is escaping me.

Thank you for taking the time to read this and if you can help, I would be appreciative. If you would be willing to code it and want credit for your work, the entire Adobe video editing community would be happy to shower you with praise. We have the pedestal all ready for you to stand on. :)

-- Steven

Jay Freedman
09-15-2013, 05:08 PM
Welcome to VBA! :hi:

Give this a whirl. Start with an empty document and save it as a Macro-Enabled (*.docm) file or as a Word 97-2003 (*.doc) file. Either will work in any version as long as the Word 2003 users have the Compatibility Pack installed; if they don't have it, or you aren't sure, then use the .doc format. This document will have to be stored in a trusted folder so the macro will be allowed to run.

In the first paragraph of the document, insert a MacroButton field with this code (use Ctrl+F9 to insert the field markers):
{ Macrobutton EditVideoFile Double-click here to start video file processing }

Then paste in the lines of data (the replacements for the XYZ).

Open the macro editor, insert a module in the document's project, and paste in this code:


Option Explicit

Sub EditVideoFile()
Dim dataFile As Document ' this data doc or template
Dim baseFile As Document ' file being modified
Dim baseFileFN As String ' path\name of file from Adobe
Dim rgData As Range ' range of data file containing replacements
Dim para As Paragraph ' paragraph in rgData
Dim rgPattern As Range ' a range used in baseFile
Dim replaceText As String ' current line from this data file
Dim iterNum As Long ' number of current line / saved file
Dim saveFN As String ' path\base name of saved file

Set dataFile = ActiveDocument

MsgBox "Select the data file (*.prtl) to be processed."
With Dialogs(wdDialogFileOpen)
.Name = "*.prtl"
If .Show = -1 Then
Set baseFile = ActiveDocument ' file just opened becomes Active
baseFileFN = baseFile.FullName ' this path\name will be used to open it again
Else
Exit Sub
End If
End With

MsgBox "Select the folder and filename (without numbers) to save as."
With Dialogs(wdDialogFileSaveAs)
If .Display = -1 Then ' just get the path\name, don't actually save
saveFN = WordBasic.FileNameInfo(.Name, 5) & WordBasic.FileNameInfo(.Name, 4) & "-"
'MsgBox saveFN
Else
Exit Sub
End If
End With

On Error GoTo Errhdl
iterNum = 1
Set rgData = dataFile.Range
rgData.MoveStart unit:=wdParagraph, Count:=1 ' exclude paragraph containing macro button
For Each para In rgData.Paragraphs
replaceText = Left(para.Range.Text, Len(para.Range.Text) - 1) ' exclude paragraph mark

' force all-caps XYZ to lower-case xyz
' to avoid inserting text from data file in all caps
Set rgPattern = baseFile.Range
With rgPattern.Find
.Text = "XYZ"
.Wrap = wdFindStop
While .Execute
rgPattern.Text = LCase(rgPattern.Text)
Wend
End With

' do the replacement
baseFile.Range.Find.Execute _
FindText:="xyz", _
MatchCase:=False, _
ReplaceWith:=replaceText, _
Replace:=wdReplaceAll

baseFile.SaveAs _
FileName:=saveFN & Format(iterNum, "0000") & ".prtl", _
FileFormat:=wdFormatText, _
AddToRecentFiles:=False

' close the modified file, then reopen the unmodified base
baseFile.Close SaveChanges:=wdDoNotSaveChanges
Set baseFile = Documents.Open(FileName:=baseFileFN, AddToRecentFiles:=False, Visible:=False)

iterNum = iterNum + 1
Next para

baseFile.Close SaveChanges:=wdDoNotSaveChanges
MsgBox "Done"
Exit Sub

Errhdl:
MsgBox "Error " & Err.Number & " at line " & iterNum & vbCr & Err.Description
End Sub


Open the document and double-click the macrobutton field to start it up.

There are a couple of interesting wrinkles in this code.

The XML file and the output files can be in any folder, and not necessarily in the same folder with each other.
If the XML file contains 'XYZ' in all caps, and the macro tried to replace that with lower-case or mixed-case text from the data file, the result would be all caps. That's a shortcoming of the Range.Find object. To avoid that, this macro first loops through all the occurrences of 'XYZ' and changes them to 'xyz'; then the later replacement preserves the case of the data.
Each time the XML file is opened, modified, and saved to another name, the macro closes it and then reopens the unmodified file to prepare for the next iteration. This opening and closing causes the status bar to flicker. If that bothers you, it can be suppressed, but I see it as a sign that the macro is still running.

Steven Gotz
09-16-2013, 07:27 AM
OK, that is simply amazing. Thank you so much. It would have taken me a month of Sundays to get that code written, and I probably would not have been anywhere near that efficient.

There is a problem with the formatting, however. For some reason it stripped out all of the bracketed information and left just the values of the parameters. It is like Word recognized the XML like structure and just got rid of it. Might it be the way the files are saved? Or the method by which they are opened?

For example, the area around the text to be changed looks like this before the macro is run:

"<txBase>173.01</txBase><XPos>42.7166</XPos><angle>0</angle><verticalText>false</verticalText><objectLeading>0</objectLeading></BaseProperties><EnclosingObjectType>block</EnclosingObjectType><Alignment>left</Alignment><TRString>xyz</TRString>"

After the replacement, the new title, with the text changed to "This is a test" looks like this:
"
173.01
42.7166
0
false
0


block
left
This is a test
"

Perhaps the brackets are confusing Word? Is there a way to tell Word that this is just text, not formatted?

Once again, thank you so much for doing this. I will start taking it apart line by line to understand it all. It will take a while, but will certainly be worth the effort.

Jay Freedman
09-16-2013, 10:24 AM
:doh: I didn't test my code with an actual XML file, but with just a text file that I threw together with some bracketed nodes, because I didn't expect a *.prtl file to have the line
<?xml version="1.0" encoding="utf-8"?>
at the top. When you use Word's Open command to open any file that starts with that line, Word automatically strips out all the XML coding. (This is a result of a patent lawsuit that Microsoft lost, http://www.zdnet.com/blog/microsoft/custom-xml-the-key-to-patent-suit-over-microsoft-word/3712.)

It took me a few false starts to figure out how to get the coding to stay. Using the Range object's InsertFile method didn't do it. I wound up using what are perhaps the oldest functions in VBA, the Open #, Line Input #, and Close # functions, in the subroutine you'll find at the end of this version. The subroutine inserts the text line by line into a new blank document. :p And I've tested this one with a real XML file renamed as a .prtl file...


Option Explicit

Sub EditVideoFile()
Dim dataFile As Document ' this data doc or template
Dim baseFile As Document ' file being modified
Dim baseFileFN As String ' path\name of file from Adobe
Dim rgData As Range ' range of data file containing replacements
Dim para As Paragraph ' paragraph in rgData
Dim rgPattern As Range ' a range used in baseFile
Dim replaceText As String ' current line from this data file
Dim iterNum As Long ' number of current line / saved file
Dim saveFN As String ' path\base name of saved file

Set dataFile = ActiveDocument

MsgBox "Select the data file (*.prtl) to be processed."
With Dialogs(wdDialogFileOpen)
.Name = "*.prtl"
If .Display = -1 Then ' just get the path\name, don't actually open
baseFileFN = WordBasic.FileNameInfo(.Name, 1) ' full path\name
Else
Exit Sub
End If
End With

MsgBox "Select the folder and filename (without numbers) to save as."
With Dialogs(wdDialogFileSaveAs)
If .Display = -1 Then ' just get the path\name, don't actually save
saveFN = WordBasic.FileNameInfo(.Name, 5) & WordBasic.FileNameInfo(.Name, 4) & "-"
'MsgBox saveFN
Else
Exit Sub
End If
End With

'On Error GoTo Errhdl
iterNum = 1
Set rgData = dataFile.Range
rgData.MoveStart unit:=wdParagraph, Count:=1 ' exclude paragraph containing macro button
For Each para In rgData.Paragraphs
replaceText = Left(para.Range.Text, Len(para.Range.Text) - 1) ' exclude paragraph mark

Set baseFile = Documents.Add(Visible:=False)
InsertAsNotXML FN:=baseFileFN, baseFile:=baseFile

' force all-caps XYZ to lower-case xyz
' to avoid inserting text from data file in all caps
Set rgPattern = baseFile.Range
With rgPattern.Find
.Text = "XYZ"
.Wrap = wdFindStop
While .Execute
rgPattern.Text = LCase(rgPattern.Text)
Wend
End With

' do the replacement
baseFile.Range.Find.Execute _
FindText:="xyz", _
MatchCase:=False, _
ReplaceWith:=replaceText, _
Replace:=wdReplaceAll

baseFile.SaveAs _
FileName:=saveFN & Format(iterNum, "0000") & ".prtl", _
FileFormat:=wdFormatText, _
AddToRecentFiles:=False

' close the modified file without saving
baseFile.Close SaveChanges:=wdDoNotSaveChanges

iterNum = iterNum + 1
Next para

MsgBox "Done"
Exit Sub

Errhdl:
MsgBox "Error " & Err.Number & " at line " & iterNum & vbCr & Err.Description
End Sub

Sub InsertAsNotXML(FN As String, baseFile As Document)
Dim aLine As String
Dim pos As Long

Open FN For Input As #1
Do While Not EOF(1)
Line Input #1, aLine

' There may be other characters at the start of the line,
' so remove them.
pos = InStr(LCase(aLine), "<?xml")
If pos > 0 Then
aLine = Right(aLine, Len(aLine) - pos + 1)
End If

baseFile.Range.InsertAfter aLine & vbCr
Loop
Close #1
End Sub

Steven Gotz
09-16-2013, 09:04 PM
Jay,

You have done me a phenomenal favor. Well, me and potentially thousands of other people who will probably want to use this macro.

I knew it was a mistake not posting a copy of the actual file. Sorry. I can't post a link since my post count is too low, but here it is for you to copy/paste. http: // stevengotz . com / files / Title.prtl (had to fool the forum software)

There are only two more small favors I must ask.

The Save As dialog starts with the default file name of "Create Titles.docm" using .docm as the Save As type. How can I get it to use something like "NameOfTitle" and use the Plain Text (*.txt) format?

I thought it was saving as the docm format but I was wrong. I must have done something else wrong. It works fine just the way it is. But I would still like to see the default name changed.

I will set about trying to figure it out myself, but I have forgotten more than I realized. My first guess is that there are parameters to the wdDialogFileSaveAs command that I need to supply. But if you want to chime in one last time, I would not be offended. :)

The second favor is simply this: Please let me know if you would like to have your name as the author of this macro show up in the beginning of the document. I would love to see you get the credit and the thanks of a grateful video editing community, but I don't want to expose you to a bunch of people asking for favors, either.

Steven Gotz
09-16-2013, 10:28 PM
OK. I thought I had it working. Then I didn't. Then I did.

Well, I fooled myself. I replaced "xyz" with "This is a test" and it failed. I tried it again with a simple "123" and it worked. I thought I was going nuts for a while there.

It finally got through my thick skull that there was a problem with the number of characters. It is being stored in a separate parameter named "Run Count". See below. I changed xyz to xyz1 and the run count of 4 crashed the program. It should be 5 as seen below. Apparently the program is counting an extra character of some sort since three characters requires a Run Count of 4.

<RunLengthEncodedCharacterAttributes><CharacterAttributes RunCount="5" StyleRef="4097" TextRef="4097" TXKerning="0" TXPostKerning="0" BaselineShifting="0" /></RunLengthEncodedCharacterAttributes><tagName>

So, now the problem is counting the number of characters. I can figure that one out. Then I have to search for the Run Count parameter. Using a modified version of the ' do the replacement section of the code, I should be able to work this out.

If this sounds OK, great! If I am heading down the wrong path, please stop me.

Thanks again.

I think I will wait until I am fresh. Doing this late at night is just asking for trouble.

Jay Freedman
09-17-2013, 07:48 AM
You just need a couple of minor tweaks.

To get the name and file type correct in the SaveAs dialog box, add these two lines after the With Dialogs(wdDialogFileSaveAs) statement:


.Name = "NameOfTitle.txt"
.Format = wdFormatText


To fix the RunCount, add this section after the existing Find/Replace:


baseFile.Range.Find.Execute _
FindText:="RunCount=" & Chr(34) & "4" & Chr(34), _
MatchCase:=False, _
ReplaceWith:="RunCount=" & Chr(34) & Len(replaceText) + 1 & Chr(34), _
Replace:=wdReplaceAll

That replaces the "4" with one greater than the length of the string used to replace "XYZ".

I did download your Title.prtl file and tested the code with it. As best I can tell without the Adobe software, it works.

Steven Gotz
09-17-2013, 08:55 PM
Jay,

You are a CHAMP!!!

Please do me one more favor. Find someone to pat you on the back for me please.

I help people every day on the Adobe forums, and I remember posting code all those many years ago, on bulletin boards and then Compuserve. But this would have taken more brain power and cost me a lot of hair pulling to get this done. You just saved me so much work, and I am going to be sharing this on LinkedIn as well as on the Adobe Premiere Pro forum.

So, as I said, I will be more than happy to give you credit for a job well done. Just let me know if it is OK to use your name. Otherwise I will just credit it to a great guy on a forum I found. That way you won't get bugged for special favors.

Thank you, thank you, thank you. I knew this was possible, but you made it look so easy. I have to tell you though, I actually understood the second version. The first one was like Greek to me. This one makes a LOT more sense to an old coder.

I will come back and post when I finish my tutorial on how to use this and put it on YouTube. I am just going to get so much use out of this.

Steven Gotz
09-17-2013, 10:45 PM
I just tested it to 9999 titles. It took 50 minutes. Not bad. Not for that much time savings. Not bad at all.

Jay Freedman
09-18-2013, 07:17 AM
Yes, you can put my name on it, with my web address, http://jay-freedman.info.

I don't mind being asked for help, if I can occasionally say no. Like you, I've been doing this since the time when dial-up bulletin boards were the only places available.

Steven Gotz
09-25-2013, 11:51 AM
Jay,

(Or anyone else who might happen to stop by...)

Everything is working great and I have a bunch of happy, English speaking beta testers.

The Spanish speakers? Not so much.

Apparently, the code can't handle the diacritical marks that Spanish and some other languages use. I assume it is because we are using code for ASCII text.

So, obviously, the question is: How do we accommodate non-ASCII text? Is it inherent in the .Text parameter or just how the String is declared?

The text that is causing the trouble looks like this:
Hablaré acerca de los factores
factores podrían
en células
Hace 45 años

Jay Freedman
09-25-2013, 02:50 PM
Steven,

I'm quite sure it isn't anything to do with ASCII vs. non-ASCII, because all of those characters (é, í, ñ, as well as ¿, ö, ü, and a lot of others) are within the ASCII range of character codes below 256. Further, when I pasted your examples of Spanish text into my data document and ran the macro, the correct characters appeared in the output files.

Since Word refuses to open the output files, I've been using a text editor (UltraEdit) to look at them. When I used Notepad the first time, it came up with some odd font that showed arrows and other strange glyphs for the accented characters, but when I switched to Arial or Times New Roman they came out as expected. Maybe Adobe Premiere or whatever program you're using to open the files doesn't like ASCII values above 128?

authorleon
03-08-2015, 05:15 PM
OH MY GOD

THANK YOU THANK YOU THANK YOU... 100 beers for this man!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

May I ask 1 little change PLEASE!!!!!!!!!!!!

Instead of tormenting the file name. Can the file name be of the test.

E.G (in the word file)

Hello world /ænd / -- Please NOTE I WILL BE USING FOR STRANGER CHARACTER - I get this message, The document may contact text content that will be lost upon conversion etc etc
Peter pan /fender/
But face /yay/

The files will be

Hello world - ænd .prtl
Peter pan - fender.prtl
But face - yay.prtl

THANK YOU SOOOO MUCH!!!!!!!!!!!!!!!



You just need a couple of minor tweaks.

To get the name and file type correct in the SaveAs dialog box, add these two lines after the With Dialogs(wdDialogFileSaveAs) statement:


.Name = "NameOfTitle.txt"
.Format = wdFormatText


To fix the RunCount, add this section after the existing Find/Replace:


baseFile.Range.Find.Execute _
FindText:="RunCount=" & Chr(34) & "4" & Chr(34), _
MatchCase:=False, _
ReplaceWith:="RunCount=" & Chr(34) & Len(replaceText) + 1 & Chr(34), _
Replace:=wdReplaceAll

That replaces the "4" with one greater than the length of the string used to replace "XYZ".

I did download your Title.prtl file and tested the code with it. As best I can tell without the Adobe software, it works.

authorleon
03-08-2015, 05:50 PM
I have fixed the special character part by


FileName:=saveFN & Format(iterNum, "0000") & ".prtl", _
FileFormat:=wdFormatText, _
Encoding:=msoEncodingUnicodeLittleEndian, _
AddToRecentFiles:=False