Save each sentence as a new file



eike33
09-16-2010, 11:07 AM
Hi,

I have little experience with VBA, and need some help! I have thousands of .txt files that are formatted like the attached sample document. The forum did not allow me to upload a .txt file, so I copied the text into a Word document for now - just to show what the text itself looks like. But my actual files are saved in .txt format.

I need to save each sentence of the speech as an individual file with a new name. It would be ideal if a delimiter was used that included both periods, question marks, and exclamation points. The name of this file is currently Speech_1_1961. What I'd like to do is save the first sentence as a new .txt file named Speech_1_1961_001, the second sentence as a new .txt file named Speech_1_1961_002, and so on.

Thanks so much in advance!

fumei
09-16-2010, 12:20 PM
Just to be clear:

Inaugural Speech, excerpt - saved as Speech_1_1961_001.txt

by John F. - saved as Speech_1_1961_002.txt (it DOES end with a period!)

Kennedy - saved as Speech_1_1961_003.txt


We observe today not a victory of party, but a celebration of freedom -- symbolizing an end, as well as a beginning -- signifying renewal, as well as change. - saved as Speech_1_1961_004.txt


For I have sworn before you and Almighty God the same solemn oath our forebears prescribed nearly a century and three-quarters ago. - saved as Speech_1_1961_005.txt

Correct?


"It would be ideal if a delimiter was used that included both periods, question marks, and exclamation points."

What do you mean by that?

fumei
09-16-2010, 12:42 PM
Assuming you either:

a) remove that initial text, OR

b) Select the text you actually want;

this is quite easy. The code below uses b). The text starting from "We observe today..." to end is Selected.
Sub MySentences()
    Dim oSentence As Object
    Dim i As Long
    i = 1
    For Each oSentence In Selection.Range.Sentences
        Documents.Add
        With ActiveDocument
            .Range = oSentence
            .SaveAs FileName:="c:\zzz\Kennedy\Speech_1_1961_00" & i & ".txt", _
                FileFormat:=wdFormatText, Encoding:=1252, _
                InsertLineBreaks:=False, LineEnding:=wdCRLF
            .Close
        End With
        i = i + 1
    Next
End Sub
For each sentence, make a new document and make its content (range) equal to the current sentence.

Save it as "c:\zzz\Kennedy\Speech_1_1961_00" plus a counter (i) plus ".txt".

Save it as text format.

Adjust for the path you want to save the files in. Notice that there is NO copying and pasting.
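One caveat on the naming: prefixing a fixed "00" only gives three-digit numbers for the first nine sentences. If the _001, _002, ... pattern from the original request matters, a minimal sketch using the built-in Format function could look like this (the Sub name PaddedNameDemo is made up for illustration):

```vba
Sub PaddedNameDemo()
    Dim i As Long
    For i = 9 To 10
        ' Fixed "00" prefix, as in the macro above
        Debug.Print "Speech_1_1961_00" & i & ".txt"
        ' Zero-padded to three digits with Format
        Debug.Print "Speech_1_1961_" & Format(i, "000") & ".txt"
    Next i
End Sub
```

In the Immediate window the fixed prefix gives Speech_1_1961_009.txt and then Speech_1_1961_0010.txt, while the Format version gives Speech_1_1961_009.txt and Speech_1_1961_010.txt.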

eike33
09-16-2010, 08:13 PM
Hi again,

Thanks for the response. What do you mean by "select the text I want"? You're right in that I want to ignore the title and author, and I just want to start the saving where the actual text begins. Is this what b.) does?

Also, how can I do this for a large set of files within a folder on my computer? Is there a way to run this script on all of the files within a folder?

Thank you so much!

eike33
09-17-2010, 09:07 AM
Hi again,

I have tested this macro on the original file - by opening the file in Word, manually selecting just the text, and running the macro - and it works great!

However, since I have thousands of these files to do, I was hoping for a more automated process. As in, I'd like to put all 6000 .txt files in one folder, and be able to run the macro to convert all of them in this manner.

Thanks in advance!

- Hetal

fumei
09-20-2010, 09:27 AM
So tell me... does every single one of those files have the first two, and ONLY two, paragraphs to be ignored?

If every single one is identical in this way, sure, it can be automated. Essentially, you would be doing the exact same actions for each file. You would use the Dir function. Like this (assuming the actioning procedure is still Sub MySentences):
Sub AllFiles()
    Dim file
    Dim path As String
    path = the path to your folder to be processed
    ' INCLUDE the "\" at the end!!!!

    file = Dir(path & "*.doc") ' assuming doc files
    ' this allows processing for ALL .doc files
    ' in the path folder

    Do While file <> ""
        Call MySentence(ActiveDocument)
        file = Dir()
    Loop
End Sub
The Call instruction sends the current ActiveDocument (each one of the .docs in the folder) to the Sub MySentence.

MySentence must be adjusted to take a document parameter AND ignore the first two paragraphs.
Sub MySentences()
    Dim r As Range
    Dim oSentence As Object
    Dim i As Long
    i = 1
    Set r = ActiveDocument.Range( _
        Start:=ActiveDocument.Paragraphs(3).Range.Start, _
        End:=ActiveDocument.Range.End)
    ' this sets a range from the third paragraph to the end of doc.

    For Each oSentence In r.Sentences
        Documents.Add
        With ActiveDocument
            .Range = oSentence
            .SaveAs FileName:="c:\zzz\Kennedy\Speech_1_1961_00" & i & ".txt", _
                FileFormat:=wdFormatText, Encoding:=1252, _
                InsertLineBreaks:=False, LineEnding:=wdCRLF
            .Close
        End With
        i = i + 1
    Next
End Sub


Again, the above will ONLY work if the files are identical in that you are ONLY processing from the third paragraph onwards.

eike33
09-20-2010, 09:19 PM
Hi again,

Thank you for this. When I run this, it gets stuck on "Call MySentences (Active Document)" and the error reads:

"Compile error

Wrong number of arguments or invalid property assignment."

Can you help with this error message?

fumei
09-21-2010, 10:29 AM
Oops, my bad. The receiving Sub needs a parameter to accept the document.

Sub MySentences(oDoc As Document)
    Dim r As Range
    Dim oSentence As Object
    Dim i As Long
    i = 1
    Set r = oDoc.Range( _
        Start:=oDoc.Paragraphs(3).Range.Start, _
        End:=oDoc.Range.End)

and set the ActiveDocument as an object to pass as a parameter
Sub AllFiles()
    Dim file
    Dim path As String
    Dim oDoc As Document
    path = the path To your folder To be processed
    ' INCLUDE the "\" at the end!!!!

    file = Dir(path & "*.doc") ' assuming doc files
    ' this allows processing for ALL .doc files
    ' in the path folder

    Do While file <> ""
        Set oDoc = Documents.Open path & file
        Call MySentence(oDoc)
        Set Odoc = Nothing
        file = Dir()
    Loop
End Sub
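The object oDoc is opened by AllFiles and passed to MySentences, which saves and closes each new sentence document; AllFiles then releases the object variable. Note that setting oDoc to Nothing does not close the source document itself.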

eike33
09-21-2010, 10:55 AM
Thanks again - just one more issue!

I get an error and when I debug it says

"Compile error

Syntax error" for this line of script:

Set oDoc = Documents.Open path & file

Thanks!

fumei
09-22-2010, 11:22 AM
Did you set the variable path correctly?

eike33
09-22-2010, 01:07 PM
Can you explain to me the proper formatting of this section of code with an example?

Do While file <> ""
Set oDoc = Documents.Open ("c:\zzz\Kennedy\Speech_1_1961_00" & i & ".txt")
Call MySentence(oDoc)
Set oDoc = Nothing
file = Dir()


What should the variable path look like, if inputted correctly?

fumei
09-24-2010, 09:59 AM
Here it is again.
Sub AllFiles()
    Dim file
    Dim path As String
    Dim oDoc As Document
    path = the path To your folder To be processed
    ' INCLUDE the "\" at the end!!!!

    file = Dir(path & "*.doc") ' assuming doc files
    ' this allows processing for ALL .doc files
    ' in the path folder

    Do While file <> ""
        Set oDoc = Documents.Open path & file
        Call MySentence(oDoc)
        Set Odoc = Nothing
        file = Dir()
    Loop
End Sub

eike33
09-28-2010, 03:33 PM
Hi again,

I still get an error that highlights "Call MySentences" and the error reads:

"Compile error

Sub or function not defined"


What can I do to solve this? Thanks so much!!!

fumei
09-28-2010, 03:42 PM
Post the entire code you are using. I am sure we can adjust things so it will work. There is some syntax incorrect, or missing.

eike33
09-28-2010, 03:51 PM
Hi again,

OK, I've gotten the code to work for a single file (the specific file is titled 1981_11_1_1), but I am having a hard time making it work for all of the files in a given folder. Here is the current code:

Sub MySentences(oDoc As Document)
    Dim r As Range
    Dim oSentence As Object
    Dim i As Long
    i = 1
    Set r = oDoc.Range( _
        Start:=oDoc.Paragraphs(3).Range.Start, _
        End:=oDoc.Range.End)
    ' this sets a range from the third paragraph to the end of doc.

    For Each oSentence In r.Sentences
        Documents.Add
        With ActiveDocument
            .Range = oSentence
            .SaveAs FileName:="C:\Users\zzz\Documents\Test macro1\1981_11_1_1_" & i & ".txt", _
                FileFormat:=wdFormatText, Encoding:=1252, _
                InsertLineBreaks:=False, LineEnding:=wdCRLF
            .Close
        End With
        i = i + 1
    Next
End Sub

Sub AllFiles()
    Dim file
    Dim path As String
    Dim oDoc As Document
    path = "C:\Users\zzz\Documents\Test macro1\"
    ' INCLUDE the "\" at the end!!!!

    file = Dir(path & "*.txt") ' assuming doc files
    ' this allows processing for ALL .txt files
    ' in the path folder

    Do While file <> ""
        Set oDoc = Documents.Open("C:\Users\zzz\Documents\Test macro1\1981_11_1_1" & i & ".txt")
        Call MySentences(oDoc)
        Set oDoc = Nothing
        file = Dir()
    Loop
End Sub




Where I'm having difficulty is in this part:

Set oDoc = Documents.Open("C:\Users\zzz\Documents\Test macro1\1981_11_1_1" & i & ".txt")


The files in this folder don't share a specific root. Is there a way to tell the code to take EVERY file in the folder and apply this logic of saving each sentence as a new file name, with a sentence number tag? So in essence each file has a unique name, and I want to take the unique name and just add a tag to it to indicate sentence number for each new file created. For example, for the file named "1981_11_1_1" (which is in the current code), each newly created sentence file is named "1981_11_1_1_1", then "1981_11_1_1_2", then "1981_11_1_1_3", so on. The next file in the folder is named "1984_32_5_1", and I want to have the same process occur for it... so on and so forth. Would this be possible? :think:

Thanks again!!

fumei
09-28-2010, 04:07 PM
"Where I'm having difficulty is in this part:

Set oDoc = Documents.Open("C:\Users\zzz\Documents\Test macro1\1981_11_1_1" & i & ".txt")"

No doubt. Look, your original question (sort of) was:

"Also, how can I do this for a large set of files within a folder on my computer? Is there a way to run this script on all of the files within a folder?"

The answer is, as given, Yes. I gave code for using the Dir function.

Set oDoc = Documents.Open path & file


Not a specific file: 1981_11_1_1" & i & ".txt"

But EACH file in the folder.

"The files in this folder don't share a specific root." If you mean what I mean by "root", oh yes they do.

"Is there a way to tell the code to take EVERY file in the folder and apply this logic of saving each sentence as a new file name,"

Yes, with what I posted. The Dir function. Why did you change it?
file = Dir(path & "*.txt") ' assuming doc files
' this allows processing for ALL .txt files
' in the path folder

Do While file <> ""
Set oDoc = Documents.Open("C:\Users\zzz\Documents\Test macro1\1981_11_1_1" & i & ".txt")
Call MySentences(oDoc)
Set oDoc = Nothing
file = Dir()
Loop


' this allows processing for ALL .txt files
' in the path folder

ALL text files. Why are you naming things?

fumei
09-28-2010, 04:10 PM
PLUS......

You are using:

C:\Users\zzz\Documents\Test macro1\1981_11_1_1" & i & ".txt

but, guess what? Your variable i is:

a) out of scope (as it is declared inside MySentences), and
b) never changes, in that MySentences always starts off with i = 1.

So using i in a filename - in another procedure - should fail.
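To see the scope problem in isolation, here is a minimal sketch. The procedure names Outer and Inner are made up for illustration, and it assumes the module has no Option Explicit line at the top:

```vba
Sub Outer()
    Dim i As Long   ' this i exists only inside Outer
    i = 5
    Inner
End Sub

Sub Inner()
    ' Without Option Explicit, this i is a brand-new, empty Variant.
    ' It has nothing to do with the i declared in Outer.
    Debug.Print "[" & i & "]"   ' prints []
End Sub
```

With Option Explicit at the top of the module, the same code does not compile at all ("Variable not defined"), which is usually the more helpful outcome.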

eike33
09-28-2010, 04:18 PM
Hi again,

I changed it because I got:

"Compile error

Syntax error" for this line of script:

Set oDoc = Documents.Open path & file


You had responded to this question by asking me if I had set the variable path correctly, so I for some reason assumed that was the variable path and changed it! Sorry about this! Could you clarify the solution to that compile error/syntax error problem?

fumei
09-29-2010, 10:34 AM
Change:

Set oDoc = Documents.Open path & file

to

Set oDoc = Documents.Open (path & file)
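The parentheses are needed because the return value of Documents.Open is being assigned to oDoc. Putting the pieces of this thread together, a minimal sketch that processes every .txt file in the folder and names each sentence file after its source file might look like the following. Several things here are assumptions rather than part of the code posted above: the extra outFolder parameter, the "sentences" subfolder (used so that newly written files are not picked up by the Dir loop), the Open options that suppress the conversion prompt, and the explicit closing of each source document.

```vba
Option Explicit

' Save each sentence of oDoc (from the third paragraph on) as
' <source name>_<n>.txt in outFolder. outFolder is a hypothetical
' extra parameter, not part of the code posted earlier.
Sub MySentences(oDoc As Document, outFolder As String)
    Dim r As Range
    Dim oSentence As Range
    Dim newDoc As Document
    Dim baseName As String
    Dim i As Long

    ' Nothing to split if the file holds only the title/author paragraphs.
    If oDoc.Paragraphs.Count < 3 Then Exit Sub

    ' "1981_11_1_1.txt" -> "1981_11_1_1" (the name always has an
    ' extension here because AllFiles only matches *.txt)
    baseName = Left(oDoc.Name, InStrRev(oDoc.Name, ".") - 1)

    Set r = oDoc.Range( _
        Start:=oDoc.Paragraphs(3).Range.Start, _
        End:=oDoc.Range.End)

    i = 1
    For Each oSentence In r.Sentences
        Set newDoc = Documents.Add
        newDoc.Range.Text = oSentence.Text
        newDoc.SaveAs FileName:=outFolder & baseName & "_" & i & ".txt", _
            FileFormat:=wdFormatText, Encoding:=1252, _
            InsertLineBreaks:=False, LineEnding:=wdCRLF
        newDoc.Close SaveChanges:=wdDoNotSaveChanges
        i = i + 1
    Next oSentence
End Sub

Sub AllFiles()
    Dim file As String
    Dim path As String
    Dim outPath As String
    Dim oDoc As Document

    path = "C:\Users\zzz\Documents\Test macro1\"   ' INCLUDE the "\" at the end
    outPath = path & "sentences\"                  ' hypothetical output subfolder

    ' Create the output folder if needed, before the main Dir loop starts.
    If Dir(path & "sentences", vbDirectory) = "" Then MkDir path & "sentences"

    file = Dir(path & "*.txt")
    Do While file <> ""
        ' Parentheses are required because the return value is assigned.
        Set oDoc = Documents.Open(FileName:=path & file, _
            ConfirmConversions:=False, ReadOnly:=True, AddToRecentFiles:=False)
        MySentences oDoc, outPath
        oDoc.Close SaveChanges:=wdDoNotSaveChanges
        Set oDoc = Nothing
        file = Dir()
    Loop
End Sub
```

Run AllFiles, not MySentences. For a source file named 1981_11_1_1.txt the output would be sentences\1981_11_1_1_1.txt, sentences\1981_11_1_1_2.txt, and so on. It is worth trying this on a copy of a handful of files before pointing it at all 6000.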