PDA

View Full Version : Solved: Make one line from many lines



James Niven
11-27-2009, 10:00 AM
Hi All,

I have a word file that has a few lines per entry. I wish to convert it all to a table format.

Here is my data I am presented with:
AL Boligee 107.3 NEW AF 6kw/47m, 32-46-
33/88-03-02 (Mildred
R. Porter)
As you can see, it's over three lines, sometimes more than 3.

Here is the desired output I wish in a table format:

AL Boligee 107.3 NEW AF 6kw/47m, 32-46-33/88-03-02 (Mildred R. Porter)
I have attached one months worth of data, the highlighted piece in yellow is the deisred output and the location for the columns of the table.

The beginning of each entry starts with a US/Candian State abbreviation.

Any assistance will be appreciated, doing this manually is not the way to go.

Thanks

James Niven
Cedar Creek, TX

Tinbendr
11-28-2009, 08:30 PM
James,
We already worked this out here (http://www.vbaexpress.com/forum/showthread.php?t=28580&highlight=radio). Why the change in program? Can you not just save the Excel data as CSV file? Then open with Word and convert to table?

James Niven
12-03-2009, 06:50 AM
Hi Tinbendr,

Sure the format is very similar to the previous solution that was presented. Is it easier to solve this issue in Excel rather than word?

I want to remove the trailing sentences from below each main entry and make one sentence as indicated in in the yellow, I hope that makes sense, then I can run the other code to separte into columns.

Thanks

James Niven

Tinbendr
12-03-2009, 03:08 PM
I finally noticed this later.

I've been working on this on and off, but no success yet.

I'll keep trying.

Tinbendr
12-08-2009, 10:45 AM
Here's my solution.

Written in two subs. One replaces the vbcr and the other replaces the spaces with tabs.

You'll have to proof it between subs as the text you provided doesn't always provide consistant patterns.

TonyJollans
12-08-2009, 01:57 PM
Just for fun, I tried to do this with Find and Replace. The best I could do was:

1. Pick a text string that does not otherwise exist in the file (I chose "ChessForZebras")
2. Append this string to the end of the file

3. Press Ctrl+H to open up the Replace dialog
4. Check the "Use wildcards" checkbox (press "More" first, to see it if it is not visible)

5. In the Find what box, enter: ^13([A-Z]{2} [A-Z])
In the Replace with box, enter: ChessForZebras\1
Press "Replace All"

6. In the Find what box, enter: ^13
In the Replace with box, enter: (a single space)
Press "Replace All"

7. In the Find what box, enter: ([A-Z]{2}) (*) ([0-9]{1,3}.[0-9]) ([! ]@) (*)ChessForZebras
In the Replace with box, enter: \1^t\2^t\3^t\4^t\5^p
Press "Replace All"

It works with the posted Test File - but does not explicitly identify State abbreviations, so could mess up with the occasional record.

James Niven
12-08-2009, 06:37 PM
Tinbendr,

Thanks for your reply and solution. I tried to run your code "Parse Radio 2 Step one" and receive the following error message as "Wrong number of arguments of invalid property assignment" on this part of the line highlighted in red.

'Application.StatusBar = "PARAGRAPH NUMBER - " & A
Loop Until A >= TotalPara
bDoc.Save ThisDocument.Path & "ParseText.doc"

I am running Office 2007, not too sure if this is the issue.

James

geekgirlau
12-08-2009, 10:02 PM
Do you need to use "SaveAs" rather than "Save"?

Tinbendr
12-09-2009, 12:44 PM
Yep, GeekGirl has part of the answer! (Probably thinking to hard about it at the time.) The other is that I left the backslash off.

bDoc.SaveAs ThisDocument.Path & "\ParseText.doc"

BYW, did you try Tony's solution. Much faster than mine. Makes my head swim trying to read through it.

fumei
12-09-2009, 12:50 PM
Yeah, using wildcards like that can be a bit mind numbing. I must admit I still do not have a real handle on it. I always end up with a pool of drool on my lap when I read Tony's suggestions on wildcards...my brain gets turned into mush.

TonyJollans
12-09-2009, 01:34 PM
my brain gets turned into mush

An absolute prerequisite. Only after that has gained a certain permanence, can one navigate the wilder waters of Word :)

Tinbendr
12-09-2009, 02:29 PM
Here's Tony's solution in VBA. Runs in about 5 seconds on my Intel Duo 3.33 GHz and Word 2007.
Sub ParseWildcard()
With ActiveDocument.Range.Find
.ClearFormatting
.Text = "^13([A-Z]{2} [A-Z])"
.Replacement.Text = "ChessForZebras\1"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
With ActiveDocument.Range.Find
.ClearFormatting
.Text = "^13"
.Replacement.Text = " "
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
With ActiveDocument.Range.Find
.ClearFormatting
.Text = "([A-Z]{2}) (*) ([0-9]{1,3}.[0-9]) ([! ]@) (*)ChessForZebras"
.Replacement.Text = "\1^t\2^t\3^t\4^t\5^p"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With

End Sub

James Niven
12-09-2009, 06:51 PM
Thanks GeekGirlau for pointing out SaveAS, that got it running.

TonyJollans, I guess I am trying to get my head around what you outlined in your post, now I see the code that Tinbendr provided from your example I can now see what you a outlining...still somewhat fuzzy, but a little clearer.

Tinbendr, you are right ,Tony's code runs in a 5 seconds on my machine as well. I am going to disect your first two sub code example to learn what you have put down and teach myself.

I really appreciate all the help and assistance from y'all, this is a learning curve!!

Thanks

James

TonyJollans
12-10-2009, 03:04 AM
A little bit of explanation is, perhaps, in order.

Firstly:

Parentheses act as they do in other circumstances to group parts of an expression together.
{Square) brackets mean 'find any one of the characters inside me'.
The numbers in braces are the number of times to find the preceding (sub-)expression.
The caret ('^') and the backslash ('\') are indicators of special codes to follow. Why there are two different (NON-interchangeable) characters for this, I don't know.
There are other special characters - those used here will be described when they are met


Now, step 1 finds all instances matching this pattern: ^13([A-Z]{2} [A-Z]). This consists of two elements:

^13 The caret indicates a special code, 13, which, when wildcards are in use, means a paragraph mark.
([A-Z]{2} [A-Z]) The parenthesized expression can be further broken down as:

[A-Z]{2} Any two characters (the 2 in braces) in the range A - Z, upper case characters.
(a space) (followed by) a space.
[A-Z] (followed by) another upper case letter.



Altogether this finds two upper case characters (not necessarily a state abbreviation) followed by a space followed by another upper case character. The disambiguation provided by the space and following character worked on the test data but might need tweaking for different data.

Having found the above, all occurrences of it are replaced by ChessForZebras\1. This, again, consists ot two elements:

ChessForZebras An arbitrary string of something not in the file (so that it can be clearly identified later) - my choice is the title of a book I saw a couple of days ago that I guessed wouldn't appear. One doesn't normally have to be so inventive.
\1 This means 'the first parenthesised string in the found text, in this case the two upper case characters, the space, and the next character.


The overall effect is to just replace the paragraph mark, when it has been found in the context described above.

Step two simply replaces all remaining paragraph marks with spaces.

Step three is where it gets more complex. It looks for ([A-Z]{2}) (*) ([0-9]{1,3}.[0-9]) ([! ]@) (*)ChessForZebras Breaking this down, as before, we have:

([A-Z]{2}) Two upper case characters. This will be the (presumed) State abbreviation. The file begins with one, and each successive Find starts from the end of the previous replacement, so, as you will see, this will be where they had previously been identified and prepended with "ChessForZebras"
(a space) The State abbreviation is always followed by a space. This helps in identifying the State abbreviation, and having it explicit means that it can simply be dropped when a tab character is put in its place. This is an expression in its own right; it could be parenthesised but does not need to be as it will not be referenced later.
(*) The asterisk means 'find anything'. Rather non-specific by itself, this just tells Find to take whatever text there is that does not match what comes later.
(a space) A space. The following expression (what is really being searched for) is always preceded by a space, which will be dropped during the replacement.
([0-9]{1,3}.[0-9]) Another expression that can be further broken down:

[0-9]{1,3} One to three (the minimum and maximum in braces) occurrences of characters in the range 0 - 9, numeric digits.
. A decimal point. Explicit. Simple.
[0-9] Another single character in the range 0 - 9, a numeric digit.


Taken together this finds the numbers (radio frequencies or whatever they are) 1.5, 97.6, 102.9, etc.
(a space) Another space, serves to delimit the previous number.
([! ]@) Something else that needs breaking down ...

[! ] The exclamation mark (shriek!, bang!, whatever you want to call it) means 'NOT', anything but the following character, so this means 'not a space'
@ The at sign is another special character that means 'find some more of what you've just found'


Put together this means find a whole string of text, any text, that does not contain any spaces.
(a space) Yet another space, this time delimiting the previous string without any.
(*) Just as before, the asterisk means 'find anything'. Non-specifically, this just means take whatever you find before the next explicit string you are looking for, which is ...
ChessForZebras The string originally inserted before each (presumed) State abbreviation, in place of the paragraph mark that had been there before.

Having found what should be a complete record, it must be put back in the desired format: \1^t\2^t\3^t\4^t\5^p This might look daunting, but is straightforward:

\1 The backslash means a special code follows, in this case, the number one means the first parenthesised expression found, the State abbreviation
^t Remember the caret also means a special code follows, in this case, ^t means a tab character.
\2 The second parenthesised expression found, the non specific text between the State abbreviation and the number
^t another tab character.
\3 The third parenthesised expression found, the number
^t another tab character.
\4 The fourth parenthesised expression found, the non-blank text string
^t another tab character.
\5 The fifth, and final, parenthesised expression found, the non-specific text string before the final delimiter
^p another special character, p meaning a paragraph mark


Once the above replacement has been done, Find starts all over again looking for the next string to match. The various spaces (which were not parenthesised during the Find), and the "ChessForZebras" string, have gone - they were not included in the replacement pattern.

That turned out a little longer than I anticipated. I hope I haven't made too many typos, and it helps.

James Niven
12-10-2009, 04:44 AM
Hi TonyJollans,

Wow, what a lovely detailed explanation outlining what you have done, I now have a better understanding of your solution and you have it in plain english. Its very clear Tinbendr has a clear understanding of your example due to the code he put forth!!

I am happy I have an answer to this, thanks to all who replied.

Cheers

James

Tinbendr
12-10-2009, 07:11 AM
Wow! Tony! Great description!

For those wanting to read more, check this link (http://word.mvps.org/faqs/general/UsingWildcards.htm) out.

fumei
12-10-2009, 11:31 AM
Thank you Tony for posting a clear detailed walk through. I think quite a few people will benefit from that. I am copying it out to keep as a separate file for reference.