PDA

View Full Version : Solved: Useless Data Because of Abbreviations?



tyndale2045
10-20-2008, 08:52 AM
I need some suggestions for handling information (poorly) compiled at a remote location in our company.

We have about two hundred files that each contain the slightly abbreviated names of (roughly) twenty farmers in column B. All of the farmers have already been assigned a unique ?Grower ID? number that should have been inserted into column A to help keep this information accessible. It was not. Thus, I would like to write a macro that could go through each of these files and insert this Grower ID number just to the left of these abbreviated names (thus, they would be placed in column A).

I already have all of these ID numbers handy in another workbook (listed right next to the farmer?s FULL name). In other words, I?m in a position to create a range that can be referred to by a VLOOKUP function (or something similar) to retrieve the corresponding ID number for each farmer (or ?grower? as the sheet calls it).

Problem: since the names on the first sheet are abbreviated, a VLOOKUP function is not going to recognize them.

Any ideas?

PS - I?m attaching an example of one of these files, with a miniature version of the workbook that lists the farmer?s full names and corresponding ID#?s.

Demosthine
10-20-2008, 09:59 AM
Good Morning.

With inconsistent data like that, I'm afraid it is going to be impossible to get something that will be perfect, but the formula below comes darn near close to it. Based on your example Workbook, it successfully matched 27 out of 28 entries. That's pretty darn good, if you ask me.

=OFFSET('Grower ID'!$A$6,MATCH("*"&SUBSTITUTE(B6," ","*")&"*",'Grower ID'!$A$7:$A$46,0),1)

Just paste the above formula into Cell B6 and fill down.

Scott

Bob Phillips
10-20-2008, 12:37 PM
Why would you use OFFSET instead of the non-volatile, more efficient INDEX

=INDEX('Grower ID'!$B$7:$B$46,MATCH("*"&SUBSTITUTE(B6," ","*")&"*",'Grower ID'!$A$7:$A$46,0))

Demosthine
10-20-2008, 09:51 PM
Why would you use OFFSET instead of the non-vaolatile, more efficient OFFSET

=INDEX('Grower ID'!$B$7:$B$46,MATCH("*"&SUBSTITUTE(B6," ","*")&"*",'Grower ID'!$A$7:$A$46,0))

Evening Bob.

I'm assuming you meant the more efficient Index, correct? As a very large portion of my knowledge is self-taught (or through poorly designed help files), I'm not sure why Index would be better than Offset. What makes Index less volatile or more efficient?

Both Index and Offset appear to provide the same result, right down to returning a #N/A error if the Match is not found. From a Range standpoint, Index adds an additional range you have to specify, 'Grower ID'!B$7:$B$46, whereas Offset uses a single-cell reference point.

Thanks.
Scott

Bob Phillips
10-21-2008, 12:17 AM
It is not that INDEX is less volatile than OFFSET, it is that OFFSET is volatile and INDEX is not volatile. When any calculation on a spreadsheet is performed, volatile functions (such as TODAY(), NOW(), and of course OFFSET) recalculate every time, regardless of whether there are in the precedent chain that Excel's smart recalculation engine uses or not. Hence they are inefficient, especially when used many times. Clearly there is no substitute for TODAY() or NOW(), but OFFSET can often be replaced by non-volatile functions.

Even though OFFSET might look more efficient as it uses a single cell reference to INDEX's range, this is what makes it slower, there is more work required to work out what range is being processed than in INDEX. That function in that spreadsheet was approx 9% slower (when directly calculated, add in all of the unnecessary recalculations, and the figure could be anything from 9% up).

If they hadn't produced the same result, the point would immaterial, one would be correct, one would not. And of course, they both return #N/A if there is no item found as they both use MATCH to do the find, and any function that processes the MATCH will choke if it is in error (except functions like ISNA, ISNUMBER and so on).

Spreadsheets should be designed for efficiency just as code should, and functions like INDIRECT should be avoided, functions such as OFFSET should be avoided where possible; the spreadsheet may start lean and tight, they have a habit of growing flabby, just like us all.

tyndale2045
10-21-2008, 04:58 AM
Demosthine,

Thanks a million for that formula. It would have taken me a week to make something that worked that good. I enormously appreciate your time.

Jim

tyndale2045
10-21-2008, 05:03 AM
XLD,

I couldn't get that formula to work. I placed it, as it appears, into cell A6 (of the example file) and I just get a #N/A error.

If I knew what I was doing, I could probably see why it is not working. But, alas, it's pretty clear I don't know what I'm doing.

Let me know if I'm just being stupid and overlooking something.

Also, thank you for your time. I am very grateful for your assistance.

Jim

Bob Phillips
10-21-2008, 05:22 AM
Well if you are being stupid I don't see how. I just pasted the exact formula into A6, and copied it down, and got exactly the same results as the OFFSET formula.

Post the workbook amended with the formula included.

tyndale2045
10-21-2008, 05:55 AM
Sure. It's attached.

Bob Phillips
10-21-2008, 06:19 AM
You haven't adjusted the formula to the real data. It is still pointing at the last row of 46, whereas the real data is 125.

tyndale2045
10-21-2008, 06:48 AM
Yup, I just tried it. Works great. Thanks for your time and patience.

Jim

tyndale2045
10-21-2008, 07:22 AM
Okay, now I've created a VBA routine that inserts this formula into those files that I mentioned in my first post. And it works great (thanks again). However, I feel certain that there's gotta be a better way to put that formula into VBA besides the way I am doing it.

I am seeking to get a formula like this in each cell

=INDEX('Grower ID'!$B$7:$B$186,MATCH("*"& SUBSTITUTE($D$4," ","*")&"*",'Grower ID'!$A$7:$A$186,0))

Here is how my tortured code looks.



For Each cell In rngGrowerNameList
If cell <> "" Then
cell.Offset(0, -1).Formula = "=INDEX('Grower ID'! $B$7:$B$186,MATCH(" & Chr(34) & "*" & Chr(34) & _
"& SUBSTITUTE(" & cell.Address & "," & Chr(34) & " " & Chr(34) & "," & Chr(34) & "*" & Chr(34) & ")&" & Chr(34) & _
"*" & Chr(34) & ",'Grower ID'!$A$7:$A$186,0))"
End If
Next cell



Again. It works, but it sure ain't pretty.

I never have figured out the best way to convert the quotation marks in a cell formula into a formula for VBA.

Am I right in thinking there must be a better way?

Thanks,
Jim

Bob Phillips
10-21-2008, 07:50 AM
I would set rngGrowerNameList to the Range A6:A33 and use



rngGrowerNameList.Formula = "=INDEX('Grower ID'!$B:$B,MATCH(""*""" & _
"&SUBSTITUTE(B6,"" "",""*"")&""*"",'Grower ID'!$A:$A,0))"

tyndale2045
10-21-2008, 11:50 AM
XLD,

That is simply beautiful.

I had no clue that you could change the formulas to all the cells in a range so that relative references (B6 in this case) would be adjusted all the way through the range (so that the cell after the first has B7 instead of B6).

I love it.

Thanks again for your help.

Jim

mdmackillop
10-21-2008, 04:21 PM
Hi Jim,
For information, the easiest way to convert a complicated formula for use in VBA is to remove the = sign and record a macro reinstating it.
eg

Sub Macro1()
'
' Macro1 Macro
'
'
ActiveCell.FormulaR1C1 = _
"=INDEX('Grower ID'!R7C2:R186C2,MATCH(""*""& SUBSTITUTE(R4C4,"" "",""*"")&""*"",'Grower ID'!R7C1:R186C1,0))"
Range("G6").Select
End Sub


This gives you all the double/triple quotes etc. and you can then adjust it to suit.

Regards
MD

tyndale2045
10-22-2008, 06:51 AM
MD,

Hey thanks! That will really help me in the future. I have always struggled with those pesky quotes (hence, all the instances of Chr(34) in my code).

Much appreciated.