PDA

View Full Version : [SOLVED:] data cleanup from program



scorched03
09-21-2004, 10:48 PM
When data is retrieved from one of the programs, the data comes out in various formats.
example of several columns
John Smith
Alice Morris (Employee ID123)
Doug Martin, Matt Perry, Joey Terry
Frank Tran (Manager of Operations)
Doug Martin & Frank Warner

to format these cells into where names are separated by only commas/colons and non character giberish is removed, that would be ideal.

coworkers suggest importing into a database and then importing back into excel, but is there a way to do this in vba? or would I have to account for every name format and write a script to solve each particular case?

any suggestions would be great. thanks for reading.

TonyJollans
09-22-2004, 01:26 AM
Hi scorched03,

Welcome to VBAX!

I'm not sure I see any real advantage in using a database over Excel. You still have to parse the strings.

If your data come from a program (rather than directly from user input) are you able to say anything about the format? Are they all like the ones you have posted, which can be dealt with fairly easily? Or could they be more free format? And what exactly constitutes non-character gibberish - could it be at the beginning, at the end, or anywhere in the string?

To answer your question, of course you have to account for every format but there's only really one format in what you have posted, which is something like:

NameString <, Namestring <, NameString ... >>

where NameString is:

Name <(Info)> <& Name <(Info)>>

If you can post a little bit more, I or someone else can probably come up with a solution and/or write a quick bit of VBA to deal with it.

alimcpill
09-23-2004, 02:53 AM
Not 100% sure what you mean, Is the data you posted already in spearate columns somehow? If you mean it dumps all the data in the same column, then I've had a similar problem. You can manipulate the data quite nicely using regular expressions. Attached is a function you could use on your worksheet, if that is what you need. You will need to reference the Regular expressions library from VBA, on my computer ist is VBScript Regular Expressions 5.5 or something.



Public Function CleanUp(sInput As String) As String
Dim objRE As RegExp
Dim strReturnValue As String

Set objRE = New RegExp
objRE.Global = True

'remove anything in brackets
objRE.Pattern = "(\(.*\))"
strReturnValue = objRE.Replace(sInput, "")

'replace ampersands with commas
objRE.Pattern = "\&"
strReturnValue = objRE.Replace(strReturnValue, ",")

'replace " ," with ","
objRE.Pattern = "\ \,"
strReturnValue = objRE.Replace(strReturnValue, ",")

'replace repeated spaces with single space
objRE.Pattern = "\ \ *"
strReturnValue = objRE.Replace(strReturnValue, " ")

'remove anything that's not a letter, comma or space
objRE.Pattern = "[^a-zA-Z\ \,]"
strReturnValue = objRE.Replace(strReturnValue, "")

'finally trim any leading or trailing spaces
strReturnValue = Trim$(strReturnValue)

CleanUp = strReturnValue

End Function




If I paste the data you posted into a single column, then next to it put the function above applied to each one, this function gives:

John Smith
Alice Morris
Doug Martin, Matt Perry, Joey Terry
Frank Tran
Doug Martin, Frank Warner


However, I could be barking up completely the wrong tree!!!:p :bug: