PDA

View Full Version : Searching for special characters



Aussiebear
05-02-2024, 10:56 PM
This question is for all the excel guru's. The following formula searches for special characters in a string.... or so the guy said. How and why does it work?



=SUMPRODUCT(--ISNUMBER(SEARCH({"!","@","#","$","%","^","&","~*","(",")","_","+","=","{","}","[","]","|","\",":",";","'","""","<",">",",",".","~?","/","~~","`"}, A1)))>0

I can picture geargiboy and P45cal salivating at the thought of getting into this one.

georgiboy
05-03-2024, 12:06 AM
The best way to understand it is to break it down and look at the results we get at each step.

Let's say we shorten the array of special characters just to make is shorter for the explanation, to:
=SUMPRODUCT(--ISNUMBER(SEARCH({"!","@","#"}, A1)))>0

Now lets say that in cell A1 is the string:
Sexy! George#

--------------------------------------------------------------------------------------------------
If you test it with the search function alone:
=SEARCH({"!","@","#"}, A1)

You will get:
5, #VALUE!, 13

This is because the characters in the array are found in position: 5 and 15, the error is due to the middle character of the array not being found.
You get an array of the results of the SEARCH function for each of the items in the array.
! is found in position 5, @ is not found, # is found in position 13


--------------------------------------------------------------------------------------------------
If you test with SEARCH and ISNUMBER:
=ISNUMBER(SEARCH({"!","@","#"}, A1))

You will get:
TRUE, FALSE, TRUE

This is because the ISNUMBER function will return TRUE, FALSE depending on whether the value is a number or not, so if you take the result (above) from the SEARCH alone, the first and last values are numbers and the middle value is an error so this is why you get TRUE, FALSE, TRUE.


--------------------------------------------------------------------------------------------------
If you test with SEARCH and ISNUMBER and the double negative:
=--ISNUMBER(SEARCH({"!","@","#"}, A1))

You will get:
1,0,1

The double negative operator (--), when applied to a logical value (TRUE or FALSE), converts TRUE to 1 and FALSE to 0. So, it effectively converts the array of TRUE/FALSE values returned by the ISNUMBER function to an array of 1s and 0s. It also converts numbers stored as text back to numerical values.


--------------------------------------------------------------------------------------------------
Adding the SUMPRODUCT or SUM:
=SUMPRODUCT(--ISNUMBER(SEARCH({"!","@","#"}, A1)))

You will get:
2

this is because quite simply: the sum of 1,0,1 is 2.


--------------------------------------------------------------------------------------------------
Adding in the >0 makes it a comparison to zero, this means if the result of the sum is zero then then final formula result is FALSE, whereas if the result of the final formula is greater than zero then the result is TRUE

Aussiebear
05-03-2024, 12:10 AM
Excellent explanation. Now I can be reasonably confident I wont get busted by ASIO for trying to launch rockets.


So.... everything between the first and last curly brackets is simply special characters being defined?

Aflatoon
05-03-2024, 12:33 AM
FWIW, you could just use COUNT instead of SUMPRODUCT(--ISNUMBER(...)) since COUNT ignores errors anyway:

=COUNT(SEARCH({"!","@","#","$","%","^","&","~*","(",")","_","+","=","{","}","[","]","|","",":",";","'","""","<",">",",",".","~?","/","~~","`"}, A1))>0

georgiboy
05-03-2024, 01:31 AM
So.... everything between the first and last curly brackets is simply special characters being defined?

Yes the below:
{"!","@","#","$","%","^","&","~*","(",")","_","+","=","{","}","[","]","|","",":",";","'","""","<",">",",",".","~?","/","~~","`"}

Is simply an array of values, the curly brackets denote an array.

Aussiebear
05-03-2024, 02:16 AM
So why do some characters need a "~" to precede them?

georgiboy
05-03-2024, 02:43 AM
In Excel formulas, the tilde is used as an escape character. It tells Excel to treat the following character as a literal character rather than as a part of a special character sequence.


The tilde is used before certain characters like "*", "?", and "~" itself. This is because these characters have special meanings in the SEARCH function. For example:


"*" is a wildcard that matches any sequence of characters.
"?" is a wildcard that matches any single character.
"~" itself is used as an escape character in SEARCH.

Aflatoon
05-03-2024, 02:52 AM
Given the nature of all the special characters, you could also use FIND instead of SEARCH and then omit the additional tildes.

georgiboy
05-03-2024, 03:09 AM
GFWIW, you could just use COUNT instead of SUMPRODUCT(--ISNUMBER(...)) since COUNT ignores errors anyway

Given the nature of all the special characters, you could also use FIND instead of SEARCH and then omit the additional tildes.

I agree, the original formula is not optimal.