PDA

View Full Version : VBA code to change a long list of messages into a table (please help, happy to pay)



hubs
01-28-2017, 05:09 AM
I have a Microsoft Word Document which is 7,000 pages long. It is a huge transcript of messages that are exchanged between two people. I need help to place all the messages in the 7000 pages this into a three column table

The word document is all in the same format as shown below with "-----" between each message. They are sequential like this, but the "from jack" and "to jack" are not sequential i.e. there may be several messages "to jack" and no messages "from jack". The actual messages may range from one word to several sentences.

----------------------------------------------------
01/07/2014, 10:48 from Jack
Hello
----------------------------------------------------
01/07/2014, 10:58 to Jack
How are you
----------------------------------------------------
01/07/2014, 11:18 to Jack
Hello?
----------------------------------------------------
01/07/2014, 11:20 from Jack
Good, but the weather is really bad, I am not too sure what I will be doing later on today
----------------------------------------------------

etc etc etc

I want to create a table into three columns which are;



from Jack
01/07/2014, 10:48
Hello


to Jack
01/07/2014, 10:58
How are you


to Jack
01/07/2014, 11:18
Hello?


from Jack
01/07/2014, 11:20
Good, but the weather is really bad, I am not too sure what I will be doing later on today




etc

Sabuncu
01-28-2017, 11:50 AM
How do you need this program? 1) As a macro that runs in Word, or 2) as an executable that will read the Word document in question, and produce another Word document in the new (i.e. table) format you have specified? Because of the huge document size, I suggest the latter option. I know from firsthand experience that Word macros take an incredibly long time to run on large documents, and during the processing, your Word application is locked up, without ability to intervene.

Please respond; thanks.

Kilroy
01-28-2017, 04:47 PM
post a sample page

macropod
01-28-2017, 10:02 PM
How do you need this program? 1) As a macro that runs in Word, or 2) as an executable that will read the Word document in question, and produce another Word document in the new (i.e. table) format you have specified? Because of the huge document size, I suggest the latter option.
That is all quite unnecessary and a macro (also unnecessary) to implement the solution I'm posting below could do the whole lot very quickly - far more so than creating an executable program for it.

post a sample pagehubs already posted some sample data, which is ample.

hubs: This is quite easy to do if all your data are laid out as per your samples (I'm assuming the ---------------------------------------------------- lines don't actually exist, but a simple Find/Replace could delete them if they do). All you'd need to prepare the data is a wildcard Find/Replace, where:
Find = ([0-9/]{10}, [0-9:]{5}) ([!^13^l]@)[^13^l]([!^13^l]@[^13^l])
Replace = \2^t\1^t\3
then select the lot (e.g. Ctrl-A) and choose Insert|Table>Convert text to Table>AutoFit to Contents>OK.

gmayor
01-28-2017, 10:27 PM
If the examples are fully representative, I would have approached this in a similar manner to Paul, but it all falls apart if one or more of the message body texts, for the third cell in the rows, comprise more than one paragraph.

macropod
01-28-2017, 10:43 PM
it all falls apart if one or more of the message body texts, for the third cell in the rows, comprise more than one paragraph.
Even that can be worked around, provided no new paragraph within a message starts with a date.

gmaxey
01-29-2017, 09:54 AM
Paul,

You are far better as FR manipulations than I am and given a 7,000 page document, this may take awhile. However, based on the example given something like this may work and provide for dates (but not a date\time string) at the start of paragraphs within a message body.


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oRng As Word.Range
Dim oDoc As Document
Set oRng = ActiveDocument.Range
'Kill the "---" delimiters if they exist.
With oRng.Find
.Text = "-@[^13^l]"
.Replacement.Text = vbNullString
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = ActiveDocument.Range
'Create temporary general delimiters before unique date\time delimiters
With oRng.Find
.Text = "([0-9/]{10}, [0-9:]{5})"
.Replacement.Text = "~!~\1"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = ActiveDocument.Range
With oRng.Find
'Swap position date\time ~ to\from and kill leading space before to\from _
and create tab delimiters.
.Text = "(~\!~)([0-9/]{10}, [0-9:]{5}) ([!^13^l]@)([^13^l])" 'Added space between group 2 $ 3
.Replacement.Text = "~!~\3^t\2^t"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = ActiveDocument.Range
With oRng.Find
'Replace paragraphs marks and line breaks between general delimiters with temporary placeholders.
.Text = "(~\!~)(*)(~\!~)"
.MatchWildcards = True
While .Execute
With oRng
.End = .End - 3
.Characters.Last = vbCr
.Text = Replace(oRng.Text, Chr(11), "~&~")
.Text = Replace(oRng.Text, Chr(13), "~$~")
.Text = Replace(oRng.Text, "~!~", "")
.Collapse wdCollapseEnd
End With
Wend
Set oRng = ActiveDocument.Range
'Kill the general delimiters.
oRng.Text = Replace(oRng.Text, "~!~", "")
End With
'Create table in new document.
Set oRng = ActiveDocument.Range
Set oDoc = Documents.Add
oDoc.Range.Text = oRng.Text
oDoc.Range.ConvertToTable Separator:=wdSeparateByTabs
Set oRng = oDoc.Range
'Restore original paragraphs\linebreaks
With oRng.Find
.Text = "~&~"
.Replacement.Text = Chr(11)
.Execute Replace:=wdReplaceAll
End With
Set oRng = oDoc.Range
With oRng.Find
.Text = "~$~"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
End With
lbl_Exit:
Exit Sub
End Sub

Kilroy
01-29-2017, 03:22 PM
I can't figure out how to remove everything from the first comma forward in To/From column to the comments column. I ran this 1000 pages took forever. I know the "Call" for each slows it down but I'm still not sure how to do it without it..


Sub RunAll()
Call ReplaceMLBwithPM
Call RemoveDash
Call ReplaceJackHRWithJack
Call Replace2returns
Call ReplaceNumberspace
Call CreateTableFromParagraphs
Call centertextrow1
Call BorderHighLight
Call linespacing
Call centertextvertically
End Sub
Sub ReplaceMLBwithPM()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^l"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchByte = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = False
.MatchFuzzy = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub RemoveDash()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "-"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub ReplaceJackHRWithJack()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "Jack^p"
.Replacement.Text = "Jack, "
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub Replace2returns()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub ReplaceNumberspace()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([0-9\)]), "
.Replacement.Text = "\1-"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub CreateTableFromParagraphs()
Dim oSource As Document, oTarget As Document
Dim oTbl As Table
Dim oPar As Paragraph
Dim oRng As range, oCellRng As range
Dim lngRow As Long, lngIndex As Long
Dim oCell As Cell
Dim arrParts() As String
Dim bUseSpace As Boolean
Dim oDoc As Document
Set oSource = ActiveDocument
oSource.ConvertNumbersToText
With oSource.range.Find
.Text = "^p^w"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
End With
Set oTarget = Documents.Add
Set oTbl = oTarget.Tables.Add(oTarget.range, 1, 3)
With oTbl
.Columns(1).SetWidth ColumnWidth:=100, _
RulerStyle:=wdAdjustFirstColumn
.Columns(2).SetWidth ColumnWidth:=100, _
RulerStyle:=wdAdjustFirstColumn
.Columns(3).SetWidth ColumnWidth:=300, _
RulerStyle:=wdAdjustFirstColumn
.Cell(1, 1).range.Text = "Date"
.Cell(1, 2).range.Text = "To/From"
.Cell(1, 3).range.Text = "Comments"
End With
For lngRow = 1 To oSource.Paragraphs.Count
bUseSpace = False
Set oPar = oSource.Paragraphs(lngRow)
If Len(oPar.range.Text) > 1 Then
oTbl.Rows.Add
For Each oCell In oTbl.Rows.Last.Cells
oCell.range.Style = "Normal"
Next
If oPar.range.ListFormat.ListType <> wdListNoNumbering Then
oPar.range.ListFormat.ConvertNumbersToText
End If
lngIndex = 1
If InStr(oPar.range.Text, vbTab) > 0 Then
Do Until oPar.range.Characters(lngIndex) <> vbTab
lngIndex = lngIndex + 1
Loop
End If
Select Case True
Case IsNumeric(oPar.range.Characters(lngIndex))
Set oRng = oPar.range
arrParts = Split(oPar.range.Text, Chr(9))
If UBound(arrParts) = 0 Then
bUseSpace = True
arrParts = Split(oPar.range.Text, Chr(32))
End If
oTbl.Cell(oTbl.Rows.Count, 1).range.Text = Trim(arrParts(0))
oTbl.Cell(oTbl.Rows.Count, 1).range.ParagraphFormat.Alignment = wdAlignParagraphRight
oRng.End = oRng.End - 1
If bUseSpace Then
oRng.MoveStartUntil Cset:=Chr(32)
Else
oRng.MoveStartUntil Cset:=vbTab
End If
oRng.Start = oRng.Start + 1
oTbl.Cell(oTbl.Rows.Count, 2).range.FormattedText = oRng.FormattedText
Case Else
Set oRng = oPar.range
oRng.End = oRng.End - 1
oTbl.Cell(oTbl.Rows.Count, 2).range.FormattedText = oRng.FormattedText
End Select
DoEvents
End If
Next lngRow
lbl_Exit:
Exit Sub
End Sub
Sub centertextrow1()
With Selection.Tables(1)
For i = 1 To 1
.Rows(i).Select
Selection.ParagraphFormat.Alignment = wdAlignParagraphCenter
Selection.Tables(1).Rows.Alignment = wdAlignRowCenter
Selection.Cells.VerticalAlignment = wdCellAlignVerticalCenter
Selection.Shading.ForegroundPatternColor = wdColorAutomatic
Selection.Shading.BackgroundPatternColor = -603923969
Next
End With
End Sub
Sub BorderHighLight()
Selection.WholeStory
With Selection.Borders(wdBorderTop)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderLeft)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderBottom)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderRight)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderHorizontal)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderVertical)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
End Sub
Sub linespacing()
Selection.WholeStory
With Selection.ParagraphFormat
.SpaceBefore = 2
.SpaceBeforeAuto = False
.SpaceAfter = 2
.SpaceAfterAuto = False
.LineSpacingRule = wdLineSpaceSingle
.LineUnitBefore = 0
.LineUnitAfter = 0
End With
End Sub
Sub centertextvertically()
Selection.Tables(1).Select
Selection.Cells.VerticalAlignment = wdCellAlignVerticalCenter
End Sub

Kilroy
01-29-2017, 10:43 PM
Disregard last post. I should have refreshed my page before posting. This one incorporates Greg's post and I added some formatting. works Great up to 100 pages, Very slow 500 pages, computer starts to smoke at 1000, wouldn't try 7000.


Sub RunAll()
Call RemoveAndReplace
Call ScratchMacro
Call centertextrow1
Call BorderHighLight
Call centertextvertically
Call Replace
Call DeleteEmptyRows
End Sub
Sub RemoveAndReplace()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "-"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^l"
.Replacement.Text = "^l"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^p"
.Replacement.Text = "^l"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^l^l"
.Replacement.Text = "^l"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey Slightly Modified by Kilroy
Dim oRng As Word.range
Dim oDoc As Document

Set oRng = ActiveDocument.range
'Create temporary general delimiters before unique date\time delimiters
With oRng.Find
.Text = "([0-9/]{10}, [0-9:]{5})"
.Replacement.Text = "~!~\1"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = ActiveDocument.range
With oRng.Find
'Swap position date\time ~ to\from and kill leading space before to\from _
And create tab delimiters.
.Text = "(~\!~)([0-9/]{10}, [0-9:]{5}) ([!^13^l]@)([^13^l])" 'Added space between group 2 $ 3
.Replacement.Text = "~!~\3^t\2^t"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With

'Create table in new document.
Set oRng = ActiveDocument.range
Set oDoc = Documents.Add
oDoc.range.Text = oRng.Text
oDoc.range.ConvertToTable Separator:=wdSeparateByTabs
Set oRng = oDoc.range
'Restore original paragraphs\linebreaks
With oRng.Find
.Text = "~&~"
.Replacement.Text = Chr(11)
.Execute Replace:=wdReplaceAll
End With
Set oRng = oDoc.range
With oRng.Find
.Text = "~$~"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
End With
lbl_Exit:
Exit Sub
End Sub
Sub centertextrow1()
With Selection.Tables(1)
Selection.InsertRowsAbove 1
For i = 1 To 1
.Cell(1, 1).range.Text = "To/From"
.Cell(1, 2).range.Text = "Date"
.Cell(1, 3).range.Text = "Comments"
.Rows(i).Select
.Columns(1).SetWidth ColumnWidth:=100, _
RulerStyle:=wdAdjustFirstColumn
.Columns(2).SetWidth ColumnWidth:=100, _
RulerStyle:=wdAdjustFirstColumn
.Columns(3).SetWidth ColumnWidth:=300, _
RulerStyle:=wdAdjustFirstColumn
Selection.ParagraphFormat.Alignment = wdAlignParagraphCenter
Selection.Tables(1).Rows.Alignment = wdAlignRowCenter
Selection.Cells.VerticalAlignment = wdCellAlignVerticalCenter
Selection.Shading.ForegroundPatternColor = wdColorAutomatic
Selection.Shading.BackgroundPatternColor = -603923969
Next
End With
End Sub
Sub BorderHighLight()
Selection.WholeStory
With Selection.Borders(wdBorderTop)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderLeft)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderBottom)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderRight)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderHorizontal)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.Borders(wdBorderVertical)
.LineStyle = Options.DefaultBorderLineStyle
.LineWidth = Options.DefaultBorderLineWidth
.Color = Options.DefaultBorderColor
End With
With Selection.ParagraphFormat
.SpaceBefore = 2
.SpaceBeforeAuto = False
.SpaceAfter = 2
.SpaceAfterAuto = False
.LineSpacingRule = wdLineSpaceSingle
.LineUnitBefore = 0
.LineUnitAfter = 0
End With
End Sub
Sub centertextvertically()
Selection.Tables(1).Select
Selection.Cells.VerticalAlignment = wdCellAlignVerticalCenter
With Selection.Tables(1)
For i = 1 To 1
.Rows(i).Select
Selection.Font.Size = 10
Selection.Font.Bold = wdToggle
Selection.Rows.HeadingFormat = wdToggle
Next
End With
End Sub
Sub Replace()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^l"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "~!~"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub DeleteEmptyRows()
'A basic Word macro coded by Greg Maxey
Dim oTbl As Table
Dim lngIndex As Long
Set oTbl = Selection.Tables(1)
For lngIndex = oTbl.Rows.Count To 1 Step -1
'An empty cell contains a end of cell marker which has a length = 2. So,
' Change the three to a 1 in the next line to change the column(lngIndex, 3)
If Len(oTbl.Cell(lngIndex, 2).range) = 2 And Len(oTbl.Cell(lngIndex, 3).range) = 2 Then
oTbl.Rows(lngIndex).Delete
End If
Next
lbl_Exit:
Exit Sub
End Sub
Sub PageNumbers()
Dim oRng As range
Set oRng = ActiveDocument.Sections(1).Footers(wdHeaderFooterPrimary).range
On Error GoTo Err_Handler
Application.Templates(Environ("APPDATA") & "\Microsoft\Document Building Blocks\1033\" & _
Val(Application.Version) & _
"\Built-In Building Blocks.dotx").BuildingBlockEntries("Bold Numbers 3").Insert _
Where:=oRng, RichText:=True
lbl_Exit:
Exit Sub
Err_Handler:
Templates.LoadBuildingBlocks
Resume
End Sub

macropod
01-30-2017, 01:13 AM
This one .... works Great up to 100 pages, Very slow 500 pages, computer starts to smoke at 1000, wouldn't try 7000.
That's what you get for using code produced by the macro recorder on such a long document; it's very inefficient. Besides which a great deal of it may be quite unnecessary. Based on the sample data hubs posted, all that might be needed is:

Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchWildcards = True
.Text = "[\-]{5,}@[^13^l]"
.Replacement.Text = ""
.Execute Replace:=wdReplaceAll
.Text = "([0-9/]{10}, [0-9:]{5}) ([!^13^l]@)[^13^l]([!^13^l]@[^13^l])"
.Replacement.Text = "\2^t\1^t\3"
.Execute Replace:=wdReplaceAll
End With
.InsertBefore "To/From" & vbTab & "Date" & vbTab & "Message" & vbCr
.ConvertToTable Separator:=vbTab, NumColumns:=3, Format:=wdTableFormatGrid1, _
ApplyHeadingRows:=True, AutoFit:=True, AutoFitBehavior:=wdAutoFitWindow
End With
Application.ScreenUpdating = True
End Sub