PDA

View Full Version : Convert Text to cells Sentences by Punctuation Marks



icemail
08-19-2020, 10:22 AM
Hello
I want to divide the texts I have into sentences according to punctuation marks. Each sentence will be written in a separate cell. For example, these are punctuation marks (.) (…) (!) (?). But I can add and remove later. And "TextToColumns" not working good.

text before
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse eget felis eget elit euismod dignissim a a felis... Ut imperdiet justo condimentum risus consequat facilisis? In hac habitasse platea dictumst! Integer eget ex faucibus, varius libero in, rhoncus mauris. Sed a posuere purus... Maecenas tincidunt enim sodales, blandit nunc a, pellentesque lorem. Proin dapibus ultricies ultricies. In pellentesque lectus quis dolor ultrices ultrices.

text after
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Suspendisse eget felis eget elit euismod dignissim a a felis...
Ut imperdiet justo condimentum risus consequat facilisis?
In hac habitasse platea dictumst!
Integer eget ex faucibus, varius libero in, rhoncus mauris.
Sed a posuere purus...
Maecenas tincidunt enim sodales, blandit nunc a, pellentesque lorem.
Proin dapibus ultricies ultricies.
In pellentesque lectus quis dolor ultrices ultrices.

p45cal
08-20-2020, 11:39 AM
Select the cell(s) with the original text (I've assumed they're in one column) then run this macro. New 'sentences' will appear in the cells to the right, filling as many cells as needed, but keeping the original cell(s) intact.

Sub blah()
For Each cll In Selection.Cells
bt = Application.Trim(cll.Value)
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = LBound(at2) To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
Next i
at3 = Filter(at2, "¬", False)
cll.Offset(, 1).Resize(, UBound(at3) - LBound(at3) + 1).Value = at3
Next cll
End Sub

snb
08-21-2020, 12:49 AM
Sub M_snb()
sn = Filter(Split(Replace(Replace(Cells(1), "!", "!."), "?", "?."), "."), " ")
Cells(1, 2).Resize(UBound(sn)) = Application.Transpose(sn)
End Sub

icemail
08-21-2020, 06:15 AM
Sub M_snb()
sn = Filter(Split(Replace(Replace(Cells(1), "!", "!."), "?", "?."), "."), " ")
Cells(1, 2).Resize(UBound(sn)) = Application.Transpose(sn)
End Sub
Is it possible to get the data from the text file? Thank you for transpose idea. But this macro removing "Punctuation Marks". I want to keep them.



Select the cell(s) with the original text (I've assumed they're in one column) then run this macro. New 'sentences' will appear in the cells to the right, filling as many cells as needed, but keeping the original cell(s) intact.

Sub blah()
For Each cll In Selection.Cells
bt = Application.Trim(cll.Value)
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = LBound(at2) To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
Next i
at3 = Filter(at2, "¬", False)
cll.Offset(, 1).Resize(, UBound(at3) - LBound(at3) + 1).Value = at3
Next cll
End Sub
Is it possible to get the data from the text file? And how can we add "transpose"?

p45cal
08-21-2020, 08:42 AM
Supply the text file and describe how you want the sentences arranged (especially if there are multiple original strings to split).

icemail
08-22-2020, 05:51 AM
The rules are still the same sentences will be split according to punctuation marks (.) (…) (!) (?)
It works fine when it comes to splitting your macro sentences.
In addition, reading the data from the text file
and
I need it to be written to column A in the excel file. (A1 to A1048576)

p45cal
08-22-2020, 06:52 AM
That's an Excel workbook, not a text file, and it contains the results.
Looking for the text file you were talking about.

icemail
08-22-2020, 07:24 AM
I cant upload text file here and I upload to another site
Txt file (http://www.mediafire.com/file/n4lwfit8x4pdsvc/dummy+sentences.txt)

snb
08-23-2020, 03:35 AM
You can zip every txt-file.

SamT
08-23-2020, 10:05 AM
Renamed: dummy sentences.txt.zip (454 Bytes)

Zipped: dummy sentences.zip (444 Bytes)

Contents:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse eget felis eget elit euismod dignissim a a felis... Ut imperdiet justo condimentum risus consequat facilisis? In hac habitasse platea dictumst! Integer eget ex faucibus, varius libero in, rhoncus mauris. Sed a posuere purus... Maecenas tincidunt enim sodales, blandit nunc a, pellentesque lorem. Proin dapibus ultricies ultricies. In pellentesque lectus quis dolor ultrices ultrices.

Tom Jones
08-23-2020, 11:38 AM
Sub M_snb()
sn = Filter(Split(Replace(Replace(Cells(1), "!", "!."), "?", "?."), "."), " ")
Cells(1, 2).Resize(UBound(sn)) = Application.Transpose(sn)
End Sub


snb,

The sentences from row 2 down, have a space before the sentence, and the sentences no longer have the punctuation marks at the end of the sentence.

p45cal
08-23-2020, 04:35 PM
From a file:
Sub blah()
Set Destn = ActiveSheet.Cells(1)
Set ts = CreateObject("Scripting.FileSystemObject").opentextfile("C:\Users\Public\Documents\dummy sentences.txt")
bt = Application.Trim(ts.readall)
ts.Close
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = 0 To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
Next i
at3 = Filter(at2, "¬", False)
Destn.Resize(UBound(at3) + 1).Value = Application.Transpose(at3)
'Set Destn = Destn.Offset(UBound(at3))
End Sub
There's only one line in the text file you supplied so I don't know if it will work if there are more lines.

snb
08-24-2020, 12:44 AM
That's why you have do something yourself too: adapt the code. I am not so interested in a desrciption of the results of my macro, becasue I was familiar with those before I posted it. You'd better provide the necessary information we asked for in this thread.

icemail
08-27-2020, 09:28 AM
From a file:
Sub blah()
Set Destn = ActiveSheet.Cells(1)
Set ts = CreateObject("Scripting.FileSystemObject").opentextfile("C:\Users\Public\Documents\dummy sentences.txt")
bt = Application.Trim(ts.readall)
ts.Close
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = 0 To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
Next i
at3 = Filter(at2, "¬", False)
Destn.Resize(UBound(at3) + 1).Value = Application.Transpose(at3)
'Set Destn = Destn.Offset(UBound(at3))
End Sub
There's only one line in the text file you supplied so I don't know if it will work if there are more lines.
Hello this code working well with small datas. But its giving error with big datas. i share another text file for example for big datas. İ hope you can help me. Thank you.

p45cal
08-27-2020, 10:14 AM
try:
Sub blah()
Dim at3() As String
Set Destn = ActiveSheet.Cells(1)
Set ts = CreateObject("Scripting.FileSystemObject").opentextfile("C:\Users\Public\Documents\dummy sentences.txt")
ct = ts.readall
ts.Close
dt = Split(ct, vbCrLf)
j = 0
For Each bt In dt
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = 0 To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
ReDim Preserve at3(0 To j)
at3(j) = at2(i)
j = j + 1
Next i
Next bt
at3 = Filter(at3, "¬", False)
Destn.Resize(UBound(at3) + 1).Value = Application.Transpose(at3)
End Sub

icemail
08-27-2020, 11:33 AM
try:
Sub blah()
Dim at3() As String
Set Destn = ActiveSheet.Cells(1)
Set ts = CreateObject("Scripting.FileSystemObject").opentextfile("C:\Users\Public\Documents\dummy sentences.txt")
ct = ts.readall
ts.Close
dt = Split(ct, vbCrLf)
j = 0
For Each bt In dt
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = 0 To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
ReDim Preserve at3(0 To j)
at3(j) = at2(i)
j = j + 1
Next i
Next bt
at3 = Filter(at3, "¬", False)
Destn.Resize(UBound(at3) + 1).Value = Application.Transpose(at3)
End Sub

This macro working thank you + rep

and last thing there is some character errors like this. And thay need replace. Can we handle? example

Text
—“I couldn’t, and didn’t, put it down until I’d read every last word.”
Result
—I couldn’t, and didn’t, put it down until I’d read every last word.

— = -
“ = "
” = "
’ = '

p45cal
08-27-2020, 04:48 PM
Put a small text file together with examples including all the extra characters included in several lines and also post a workbook of how you expect this small text file to appear after processing.

Tom Jones
08-27-2020, 11:51 PM
Hi,


Text
—“I couldn’t, and didn’t, put it down until I’d read every last word.”
Result
—I couldn’t, and didn’t, put it down until I’d read every last word.

— = -
“ = "
” = "
’ = '

I think those characters appear due to the font used, and / or certain special characters of a regional font.

I tested both VBA codes and both work perfectly.
With both TEXT files.

icemail
08-28-2020, 03:14 AM
I upload sample files about characters.
Thank you

p45cal
08-28-2020, 05:58 AM
The text file seems to be encoded as 65001 (unicode (UTF-8)).
For the moment, in the attached, is an alternative solution, using Power Query. As it is I've left it looking at the most recent file you attached. Before it looks at your text file on your system, you need to point the query at that file.
This picture attempts to show you what you need to do:
27017
I've also attached the picture as a zip file in case the resolution's not good enough.
After that has been done, you only need to right-click the table and choose Refresh which will update the table according to what's in the text file on your system.

I will still try and tweak the macro solution to read this 65001-encoded text file - but I'm out of time just now.

I'm interested, where does this text file come from?

p45cal
08-28-2020, 06:31 AM
I will still try and tweak the macro solution to read this 65001-encoded text file - but I'm out of time just now.
It was easier than expected:
Sub blah2()
Dim at3() As String, objStream
With CreateObject("ADODB.Stream")
.Charset = "utf-8"
.Open
.LoadFromFile ("C:\Users\Public\Documents\dummy sentences.txt")
ct = .ReadText()
.Close
End With

Set Destn = ActiveSheet.Cells(1)
dt = Split(ct, vbCrLf)
j = 0
For Each bt In dt
at = Replace(Replace(Replace(Replace(bt, "...", "…¬"), ".", ".¬"), "?", "?¬"), "!", "!¬")
at2 = Split(at, "¬")
For i = 0 To UBound(at2)
at2(i) = Application.Trim(at2(i))
at2(i) = Replace(at2(i), "…", "...") 'optional to replace an ellipsis with 3 dots.
If Len(at2(i)) = 0 Then at2(i) = "¬"
ReDim Preserve at3(0 To j)
at3(j) = at2(i)
j = j + 1
Next i
Next bt
at3 = Filter(at3, "¬", False)
Destn.Resize(UBound(at3) + 1).Value = Application.Transpose(at3)
End Sub

Still interested in where these text files come from…

p45cal
09-01-2020, 06:34 AM
Still interested in where these text files come from…