PDA

View Full Version : Solved: Tricky: Check column values using formulae only



xluser2007
11-24-2008, 04:38 AM
Hi All,

I have the following simple table (as attached in the workbook).

It has 3 columns (and 3 [variables]) only A,B,C.

Column A: [State] should take values 1 and 2 only.
Column B: [FieldB] should take values 7947 to 8040 (incremented by 1), for each [State]
Column C: [FieldC] should take values 0 to 130 (incremented by 1), for each [State] and [FieldB]

As it can be seen, each column to the right cycles through variables in a very simple manner relative to the column to the left i.e. [FieldB] depends on [State] and [FieldC] depends on [State] and [FieldB].

Although I have discussed values what each variable (column) should take, could anyone show how to build a formula driven check to ensure that these are the values that are actually in the table provided. That is, for the table attached, that all variables follow the rules above and that there are no discrepnacies (missing rows etc).

I would be extremely grateful to see how such a simple (to the naked eye) check could be logically and rigorously constructed without vba, using straight array formulae or otherwise. As the rest of the spreadhseet is blank, feel free to use as many helper columns as possible.

thanks and many regards for any help.

xluser2007
11-24-2008, 04:40 AM
Workbook as per post #1:

Bob Phillips
11-24-2008, 07:03 AM
Try this

=IF(OR(A2<>A1,B2<>B1),IF(C2=0,"Ok","Should be restart"),IF(OR(C2=0,C2=C1+1),IF(C2>130,"Too big","Ok"),"Out of sequence"))

xluser2007
11-24-2008, 04:13 PM
Hi Bob,

The simplicity and accuracy of your responses never ceases to impress me.

Just to check that I have implemented correctly, please find attached the workbook.

In D2, the formula seems to give "Ok" (which is correct) when it compares the text values in row 1 and number values in row 2. Could you please explain how the formula checks this part.

I have put through summary checks below in D24631:D24633, could you please clarify that this is the way your formulae were intended to provide the final check.

Also, one other thing, if I put your formulae in column D and delete any row, I get #REF! errors, is there any way to control for this deletion and show that an error has occurred (i.e. show that there are missing entries and the process should be restarted)?

Thanks for all your help, I really appreciate it :).

Bob Phillips
11-25-2008, 02:08 AM
Hi Bob,

The simplicity and accuracy of your responses never ceases to impress me.

I am sure that the formula could be reduced but I think that it may be more obtuse rather than simpler, so I left it.


Just to check that I have implemented correctly, please find attached the workbook.

In D2, the formula seems to give "Ok" (which is correct) when it compares the text values in row 1 and number values in row 2. Could you please explain how the formula checks this part.

It starts by checking if either of the values in column A or column B is different to the previous row, if so then FieldC should be a value 0. The formula checks the A and B values, if not the same as previous, it then checks the C value for 0, if so the Ok message else Restart message.


I have put through summary checks below in D24631:D24633, could you please clarify that this is the way your formulae were intended to provide the final check.

If I understand you correctly, that was how I tested it.


Also, one other thing, if I put your formulae in column D and delete any row, I get #REF! errors, is there any way to control for this deletion and show that an error has occurred (i.e. show that there are missing entries and the process should be restarted)?

You could offset the cells that #REF out, like so

=IF(OR(A2<>OFFSET(A2,-1,0),B2<>OFFSET(B2,-1,0)),IF(C2=0,"Ok","Should be restart"),IF(OR(C2=0,C2=OFFSET(C2,-1,0)+1),IF(C2>130,"Too big","Ok"),"Out of sequence"))

but this does make the formula volatile and more complex.

xluser2007
11-25-2008, 02:41 AM
I am sure that the formula could be reduced but I think that it may be more obtuse rather than simpler, so I left it.



It starts by checking if either of the values in column A or column B is different to the previous row, if so then FieldC should be a value 0. The formula checks the A and B values, if not the same as previous, it then checks the C value for 0, if so the Ok message else Restart message.



If I understand you correctly, that was how I tested it.



You could offset the cells that #REF out, like so

=IF(OR(A2<>OFFSET(A2,-1,0),B2<>OFFSET(B2,-1,0)),IF(C2=0,"Ok","Should be restart"),IF(OR(C2=0,C2=OFFSET(C2,-1,0)+1),IF(C2>130,"Too big","Ok"),"Out of sequence"))

but this does make the formula volatile and more complex.
Bob, this is wonderful stuff, really is. Thank you for your explanations and insight.

Amusing though, a colleague made a similar check of this task by creating mountain of VLOOKUPs and a tremendous amount of replications of the original tables in various formats. The check-workbooks ended up being 50MB each!

Your solution has made a the same check in just one helper column and the workbook is only 2.5 MBs!

Although my manager will probably take my colleagues' solution as she spent a lot of time developing it (and I do respect both of them), it is funny how people don't want to ask around or consider simplifying the approach before forming a solution. Anyways, I got to learn some cool techniques from you out of it which was great.

Just thought you may have found this interesting.

Aside: Bob, just to add one more extension to the problem, last one for this case http://www.vbaexpress.com/forum/images/smilies/001.gif (this is just for my knowledge).

I have now added a column D. All that needs to be checked now is that if [FieldB] takes values 8035-8040, [fieldC] should cycle from 0-130 (which your formula already does), and [fieldD] should simply take the values 130 rows above it.

That is for [fieldC ranging from 0-130 for [field] B being 8035-8040, [fieldD], is simply a copy of the [fieldD] values when fieldB equals 8034. Basically the 8034 values copied down 6 times.

I know it sounds complicated, but I have highlighted the relevant portions to check for [State1] in the attached workbook.

If you could please help with this, I would appreciate it.

Bob Phillips
11-25-2008, 03:46 AM
Amusing though, a colleague made a similar check of this task by creating mountain of VLOOKUPs and a tremendous amount of replications of the original tables in various formats. The check-workbooks ended up being 50MB each!

Your solution has made a the same check in just one helper column and the workbook is only 2.5 MBs!

Although my manager will probably take my colleagues' solution as she spent a lot of time developing it (and I do respect both of them), it is funny how people don't want to ask around or consider simplifying the approach before forming a solution. Anyways, I got to learn some cool techniques from you out of it which was great.

The solution should not be dependent upon the amount of time spent on it, it should depend upon the quality of the results, and the quality of the solution. In my many years of experience, most computer solutions have more time spent upon maintaining them than in developing them, so it is important than maintenance is given the correct priority. If you have a 50Mb workbook with mountains of tables and VLOOKUPS, it is going to be harder to maintain.


Aside: Bob, just to add one more extension to the problem, last one for this case http://www.vbaexpress.com/forum/images/smilies/001.gif (this is just for my knowledge).

I have now added a column D. All that needs to be checked now is that if [FieldB] takes values 8035-8040, [fieldC] should cycle from 0-130 (which your formula already does), and [fieldD] should simply take the values 130 rows above it.

That is for [fieldC ranging from 0-130 for [field] B being 8035-8040, [fieldD], is simply a copy of the [fieldD] values when fieldB equals 8034. Basically the 8034 values copied down 6 times.

I know it sounds complicated, but I have highlighted the relevant portions to check for [State1] in the attached workbook.

If you could please help with this, I would appreciate it.

I am not sure I am fully getting this. Are you saying that you want to add an extra test, and that this test should check that if the [field] B value is 8035-8040, that the values in [field] D are just replications of the values in [field] D for [field] B value 8034. Does there have to a [field] B value for each of 8035-8040? Does every [field] D have to be present, and in order?

Can I ask what is the purpose of this data and this check, just wondering if there is an alternative approach.

xluser2007
11-25-2008, 04:19 AM
The solution should not be dependent upon the amount of time spent on it, it should depend upon the quality of the results, and the quality of the solution. In my many years of experience, most computer solutions have more time spent upon maintaining them than in developing them, so it is important than maintenance is given the correct priority. If you have a 50Mb workbook with mountains of tables and VLOOKUPS, it is going to be harder to maintain.


I completely agree. I thought I'd share this with you, in particular because through VBAX and MrExcel.com I am trying to learn new techniques and find it hard, despite being pro-active, to implement innovative techniques over age-old processes that are always lurking around in any company.




I am not sure I am fully getting this. Are you saying that you want to add an extra test,

Yes, I would like this to be an extra test.



and that this test should check that if the [field] B value is 8035-8040, that the values in [field] D are just replications of the values in [field] D for [field] B value 8034.


Exactly right Bob, this is the test. that i require to be added to the existing one.

Basically check that for 8035-8040, that the [fieldD] values equal the same values offset by 131 upwards, so that 8034 values carry down to 8035, which are the same as 8036 etc.


Does there have to a [field] B value for each of 8035-8040?

Yes, there has to be a [fieldB] value for each of the 8035-8040. Your current formula checks this really well at the moment (based on my testing).


Does every [field] D have to be present, and in order?

Yes this is correct, in the sense that for 8034, we have a certain order for the [fieldD] values. As 8035-8040 are just copies of the 8034 values, for each of teh corresponding fields, it should just be an ordered copy of the 8034 values, as you pointed out.



Can I ask what is the purpose of this data and this check, just wondering if there is an alternative approach.


Sure, sorry to have been vague in this front, for brevity, I didn't want the practical purpose to cloud the Excel req's.

The gist is these are assumptions that feed into a model we are using (run using SAS software which takes in these CSV's).

Basically these assumptions were run up to values of 8034 for [fieldB], a time variable, and were taken from an existing model.

We needed to extend these CSV's with initial dummy data (for forecasting purposes, from 8035-8040 for [fieldB]), which we did using a macro. The model will take this dummy data and based on the relationships will project actual values into these fields when it is run.

As we copied these assumptions from an existing model and made very ordered and logical extensions to the CSVs (which VBAX member Krishna Kumar kindly helped me with, here (http://www.vbaexpress.com/forum/showthread.php?t=23179)), we should have an independent method of checking that the macro did the right job in extending and adjusting the CSV's.

The reason why I asked for formulae solution is to keep the checks consise (memory and links-wise) and to ensure that credible data is being processed.

Though I am open to other suggestions, aside from seeing a full formula approach, I was unsure how to "check" using VBA.

Please let me know if this helps to clarify.

Bob Phillips
11-25-2008, 04:48 AM
I would add another formula altogether in a separate column, not complicate the original formula any further.

=IF(AND(B2>8034,B2<=8040),IF(D2=OFFSET(D2,-131,0),"Ok","Invalid claims sequence"),"")

Bob Phillips
11-25-2008, 04:57 AM
What is your id over at MrExcel?

xluser2007
11-25-2008, 05:12 AM
I would add another formula altogether in a separate column, not complicate the original formula any further.

=IF(AND(B2>8034,B2<=8040),IF(D2=OFFSET(D2,-131,0),"Ok","Invalid claims sequence"),"")

Great suggestion.

I've just put it through and collated a similar check down the bottom of the column. works really well.

Thanks Bob, I'll mark it Solved, but will post back if I have any more queries.

regards

xluser2007
11-25-2008, 05:13 AM
What is your id over at MrExcel?

exceluser2007, very original huh :yes ?

Bob Phillips
11-25-2008, 05:26 AM
Do you use Excel 2007 or 2003?

xluser2007
11-25-2008, 03:15 PM
Do you use Excel 2007 or 2003?
I use Excel 2003.

I joined VBAX and MrExcel late 2007, hence the name :-).

Bob Phillips
11-25-2008, 04:05 PM
I looked you up on MrExcel, and saw that we only shared one thread, one where I gave a conditional formatting solution, and you clarified it for the OP.

How do you decide whether to post here or there?

xluser2007
11-25-2008, 07:47 PM
I looked you up on MrExcel, and saw that we only shared one thread, one where I gave a conditional formatting solution, and you clarified it for the OP.

How do you decide whether to post here or there?
Yeah, as you maintain the same avatar, it's a lot easier to find you.

To be honest, I don't really have a set rule for posting here or there. Both have great experts who are willing to share.

I scan both websites almost daily.

I find though, if it is a deeper VBA problem that requires an examination of a workbook(s) before a solution is formed, it lies more in the philosophy of VBAX ("Bringing VBA to the world") and I post here. Though some people have given me great help on VBA over at mrExcel.com as well.

For formulae related problems, both sites are superb as well, I find though that MrExcel.com has experts who have a particular passion for them (barry houdini, Aladin Akyurek and many others), and I tend to post a lot of my formulae clarifications there, though not restricted to there alone (as with this thread :)).

How do you decide where to post Bob?

Bob Phillips
11-26-2008, 01:23 AM
That's easy. I post everywhere, here, MrExcel, ExcelTip, OzGrid, JMT, UtterAccess and the public newsgroups, and I do use different avatars in some.

My main forum is here, I like VBAX because it is not competitive like many of the others, and the questions are often more interesting and need developing with the poster. MrExcel, and ExcelTip, tend to have far more questions that are very basic and you can just rattle off, so I tend to answer questions there that are a bit less obvious, or where I feel I can add to the answser (or where I can take issue with something someone said :-)).

OzGrid is dying so I don't go there much anymore, and JMT and UtterAccess is low volume, so I just pop in rarely.