PDA

View Full Version : check if a surname is repeated within a cell and output a simpler name



pwl2706
05-02-2013, 01:59 AM
Hi,

I have a list of several thousand items, which consist of several different names together like this:

Mr P Thompson & Mrs S Thompson & Mr A Thompson
Mr C Guy-Johnson & Mrs A Guye-Johnson & Miss J Guye-Johnson
Mrs Fuller & Ms D Fuller & Dr K U Fuller
Dr V Patel & Dr OO Patel
Mr B Burden & Mr MP Wood & Ms C Pollock
Mr PW Philips & Mrs PW Philips
Dr D Watson & S Holmes
Mr R Polanski & Mrs S Polanski
Mr S Spielberg & Miss G Spielberg & Mrs T Spielberg

Sometimes the surname is repeated within the cell, sometimes it is not.

I want to build a formula that will determine if the surname is repeated, and return a string where the Salutations/titles and inititals are concatenated with the Surname at the end, unless the surnames are different.

For example,

- Mr S Spielberg & Miss G Spielberg & Mrs T Spielberg
- Mr R Polanski & Mrs S Polanski

would become,

- Mr S & Miss G & Mrs T Spielberg
- Mr R & Mrs S Polanski

BUT:

- Mr B Burden & Mr MP Wood & Ms C Pollock
- Dr D Watson & S Holmes

would remain the same as the surnames are different

Is it possible to do that with formulas, (and not splitting the names using Text to Columns), and how would I do that please?

thanks
Philip

mdmackillop
05-02-2013, 02:37 AM
You could create a User Defined Formula (UDF), or this should work for the last name in the list (thanks to Richard Scholar)
=SUBSTITUTE(A1,TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))," ") & TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))

snb
05-02-2013, 04:52 AM
or an array formula: (ctrl-shft-Enter)

=SUBSTITUTE(A1;RIGHT(A1;LEN(A1)-MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100)));"")&RIGHT(A1;LEN(A1)-MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100)))

pwl2706
05-02-2013, 05:54 AM
or an array formula: (ctrl-shft-Enter)

=SUBSTITUTE(A1;RIGHT(A1;LEN(A1)-MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100)));"")&RIGHT(A1;LEN(A1)-MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100)))

this is great, may I ask how it works?

snb
05-02-2013, 07:37 AM
somewhat simpler:

=SUBSTITUTE(A1;MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100);"")&MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100)

You can see, the formula consists of 2 parts:

=SUBSTITUTE(A1;MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100);"")

&MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100)

with
MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100))
we get the positon of the last space in the string in A1

with
MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100)
we get the last word (after the last space in the string in A1)

with
=SUBSTITUTE(A1;MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100);"")
we remove that last word from the whole string in A1

to the resulting string we add the last word in string A1
&MID(A1;MAX((MID(A1;ROW(1:100);1)=" ")*ROW(1:100));100)

Since this formula will be treated as an array formula, every character in the string in A1 can be evaluated using
MID(A1;ROW(1:100);1)

pwl2706
05-03-2013, 08:53 AM
...(thanks to Richard Scholar)
=SUBSTITUTE(A1,TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))," ") & TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))

May I ask how that works please - I understand SUBSTITUTE, but not the part REPT plays...

mdmackillop
05-03-2013, 11:33 AM
To isolate the last word, all spaces are replaced by 50 spaces. The rightmost 50 characters will include the last word only. Trim removes the padding to leave only the text of the last word. If the final "word" comprised more than 50 characters then 50 would need to be increased accordingly.

pwl2706
05-07-2013, 02:06 AM
You could create a User Defined Formula (UDF), or this should work for the last name in the list (thanks to Richard Scholar)
=SUBSTITUTE(A1,TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))," ") & TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",50)),50))

Can you tell me, how would I modify that to handle a name like this:


Mr I Stavroulis & Miss S Stavrouli

as because it is replacing only the last word, which is included in the first surname, a trailing s is left over so I end up with:


Mr I s & Miss S Stavrouli

after replacing Stavroulis with Stavrouli

pwl2706
05-09-2013, 07:27 AM
after some tinkering I came up with this to handle the above issue of similar names (one being male, the other being female)



=SUBSTITUTE(W:W,TRIM(RIGHT(SUBSTITUTE(TRIM(W:W)," ",REPT(" ",100)),100)) & " ","")


so this forces a replacement only if the names are identical!