PDA

View Full Version : [SOLVED] Filling missing data from a table based on criteria



changeable
03-22-2005, 11:32 PM
Hi,I am a beginner in excel VBA and this job is really hard for me. I wish i can get help from those have lots of experience in VBA. i have attached the picture file as well as 2 excel file for my problem here. Please help me, it is really urgent. Thank you all!
http://img.photobucket.com/albums/v232/skyearth/Excel/replace.jpg

changeable
03-25-2005, 07:28 AM
can anyone help me on this problem?please? the replaced value should be highlighted too. is it because my question is not clear?

the attachment is the related file which i have mentioned.

changeable
03-28-2005, 07:26 AM
still cannot be done? can somebody show me some of the way? Thanks again

MWE
03-28-2005, 06:42 PM
This does not look too difficult. Let me look at the files you have posted and I will get back to you tomorrow.

MWE
03-28-2005, 09:44 PM
I have a possible solution for you.

I copied your original example data into a spreadsheet called SegReplace.xls wrote some code and did some testing. Your original historical_e.xls did not have sufficient data to complete the testing, so I used the Book1.xls data. I made a copy called HistoricalDB.xls and tweaked a few entries to ensure that all three cases you mentioned were tested. HistoricalDB.xls is ~ 2x the size of your original because I made a copy of the original "Historical" sheet. I also added another test line in SegReplace.xls to test the 3rd case, i.e., NA in Historical and not enough data around the hole to use the 3pt moving average. My solution moves down the rows and across the cols. Therefore, the only time you will likely need the "last resort" is when you are trying to fill a hold in the first 1, 2 or 3 cols and the HistoricalDB is NA

In the attached zip file are SegReplace.xls and HistoricalDB.xls. Put them in the same directory, open SegReplace.xls and click on the Yellow button.

A few odds and ends:



the yellow button's click routine actually calls a procedure called SearchAndReplace in code Module1.
In the code module called MWE_Procs are a few functions used in SearchAndReplace. SearchAndReplace will not work without those routines
You will notice that the value found for each of the 3 solution methods has a different color. You can change these colors by changing the values of ColorHistDB, Color3Pt and ColorLE in the SearchAndReplace macro. Present value are:


ColorHistDb = 3
Color3Pt = 5
ColorLE = 7


the logic flow is documented reasonably well. I used descriptive names for the variables
the code is "brute force" and could be made more elgant and more efficient, but that might also make it more difficult for you to understand.
the variable named MsgBoxFlag is presently set to 0. Set it to 1 if you wish the macro to spit out status as it finds and fills holes.

I hope that this solves a large portion of your problem. If this works, please toggle this thread to solved. If not, let me know.

I suspect that you received no earlier replies because this problem is more complicated than a few lines (even a few dozen lines) of VBA code. It is not "hard", just time consuming. You owe the forum 2 dozen donuts and a gallon of Starbuck's best.

changeable
03-30-2005, 06:39 AM
Thank you, MWE! This is really a detail explanation which i have ever receive from online help and it works great! However an error which you have mentioned (when trying to fill a hole in the first 1, 2 or 3 cols and the HistoricalDB is NA)
occured when i tried it in the another set of data (located in worksheet inout2 of historicalDB.xls). what if i suggest that if that kind of error are detected, that row of data will be highlighted with special color and the algorithm will skip that row and run on next row? This is really a time consuming job, i owe the forums a lot. Once again, thank you!

MWE
03-30-2005, 11:44 AM
let me look at the other set of data, reproduce the problem and get back to you.

MWE
03-30-2005, 08:45 PM
Using ?InOut2? has revealed several logic problem in the SearchAndReplace macro:

I assumed that there were two spreadsheets, the one with the holes and HistoricalDB. If the sheet with holes is a part of HistroicalDB, then the problem is easier. I recommend that you keep the two separate.

I assumed the sheet with the ?holes? you provided as an example was a valid example with respect to key rows and columns. That is not true and the differences are non-trivial. Changes required to accommodate InOut2:

In the original example, Seg#s were in row 1. In InOut2, they are in row 2. The macro must know where to find seg numbers. Change: SearchAndReplace now examines the first few rows and figures out which one has Seg#s. It assumes that the character string ?Segment? (not case sensitive) occurs in the 1st col for all data sets. The macro continues to assume that the Seg# row in HistoricalDB is 1

In the original example, the Seg#s in the Seg# row did not have ?SegXXX?; rather they just had the number, e.g., 4. In HistoricalDB, the Col headings are SegXXX. I originally assumed that the true dataset with holes and HistoricalDB would have the same number of cols and Seg#s would align. That was a bad assumption. Change: When a hole is found, HistoricalDB is examined for the correct row and correct col.

If either the TimeID row or Seg# col can not be matched, the user is told and can choose to bypass the current (sheet with holes) row. If bypassed the entire row is hi-lited in yellow. The color is programmable, currently NoMatchColor = 36

In the original version of the macro, the 3pt search recognized that the col# had to be >= 4, otherwise there were not enough pts to the left. But the col# really has to be >=5 because the first col is not a value at all. Change: test is now for col >=5 FYI, this change eliminated the failure you encountered.

When HistoricalDB?s value is NA and the 3pt average does not work, the macro resorts to the ?last effort?, i.e., it searches to the left to find a valid value and then to the right. Your original test suggested that there had to be values on both sides. However, since your 3pt approach only looked at values on one side, the macro?s approach only fails if it cannot find any values to the left or to the right. As I examined the logic, it dawned on me that there is a flaw in either scheme. Since the filling is top to bottom, left to right, a search may have to go several places to the right to find a number initially, but if that same search were done later, some of the holes to the right may have been filled in. Thus the ?last effort? should be done only after a complete pass has done on the entire sheet. So there should be two passes: the first fills holes with values from HistoricalDB or 3pt average. The 2nd pass fills any remaining holes with the final approach. With this approach, all holes will be filled except for a row with nothing but holes. The attached version initially displays the # holes to be filled and if any are left after the first pass, displays the number to be filled on the 2nd pass.

The attached zip file contains v3 of the solution. It seems to work fine with InOut2.

I did a little substructuring of the code to eliminate duplicate stuff, e.g., the code to actually fill a hole once a new value is determined. As mentioned before, the code is brute force. Things you might want to do:



add comments that make sense to you so things will be easier to maintain
don't search for the TimeID row in HistoricalDB for every hole in the active sheet. Only search for the right row for each new row in the active sheet
add comments that make sense to you so things will be easier to maintain
add check code to see if the "Historical" sheet is in the active workbook and adjust code accordingly. All you really need to do is test for the existence of the sheet name "Historical" and if found, then do not open HistoricalDB.xls and simply set HistxlBook to the activeworkbook.
add comments that make sense to you so things will be easier to maintain
consider alternative algorithms for the 3pt average and "last effort". An averaging approach that consider values on both side of the hole is intuitively more appealing.
add comments that make sense to you so things will be easier to maintain (are you getting the message?)
If this works and is complete, toggle the thread to Solved. Otherwise post back with issues.

Finally, you now owe the forum 4 dozen donuts, 2 gallons of Starbuck's best and 4 tickets to the finals in St Louis (send them directly to me)

changeable
04-01-2005, 06:50 AM
Hi! Thank you! I dont think i can find other VBA forum which the reply is as detail and as comprehensive then this forum. Thanks so much for the time and effort in helping me solving the problems! Yes, the problem is solved! Thanks everyone, especially MWE! The suggestion
"consider alternative algorithms for the 3pt average and "last effort". An averaging approach that consider values on both side of the hole is intuitively more appealing." are indeed suitable for my case too! Would you mind to teach me how to apply this suggestion? Thanks

MWE
04-01-2005, 08:41 AM
Hi! Thank you! I dont think i can find other VBA forum which the reply is as detail and as comprehensive then this forum. Thanks so much for the time and effort in helping me solving the problems! Yes, the problem is solved! Thanks everyone, especially MWE! The suggestion
"consider alternative algorithms for the 3pt average and "last effort". An averaging approach that consider values on both side of the hole is intuitively more appealing." are indeed suitable for my case too! Would you mind to teach me how to apply this suggestion? Thanks
I am glad that this is working for you. You are getting more detail than average becasue I suspect that your Excel and VBA skills are not fully developed.

An averaging approach should include as many points as are relevant. Except for special cases where either the "points before" or the "points" after are uniquely relevant, I prefer methods where points on both sides are considered and WEIGTED. By that I mean that the points immediately adjacent to the target point are typically "more important" than points further away.
In simple 3 point averaging, the center point is replaced with the average of it and its neighbors. X(i) = (X(i-1)+X(i)+X(i+1)/3.
In unbiased-weighted 3 point averaging, the center point would get a weight of 1 and its neighbors would get a weight of, say, 0.50.
X(i) = (0.5*X(i-1)+X(i)+0.5*X(i+1)/2
In biased-weighted 3 point averaging, the center point would get a weight of 1 and its neighbors would get a weight biased towards which side is more important. If, say, previous points are twice as important as future points, the forumula would be X(i) = (0.667*X(i-1)+X(i)+0.333*X(i+1)/2
If you were paying attention, you noticed that simple averaging is a version of unbiased-weighted averaging where all weights = 1. Note in either case, you divide by the sum of the weights. Weighting "profiles" can be anything including profiles that weight points further away more than those closer.

One must be careful that the implementation does not bias results by reusing already averaged values. The easiest solution is to have an original array and a new array, e.g., Z(i) = (0.5*X(i-1)+X(i)+0.5*X(i+1)/2

In your case, you have no "center point". Using the previous 3 values has two limitations:
1- there may not be 3 previous values
2- no consideration is given to future points
Using the next 3 points has similar limitations:
1- there may not be 3 next values
2- no consideration is given to previous points

My recommended replacement for your 3pt scheme:
1- use a weighted averaging scheme with two points on either side
2- the points on either side have weights of 1 and the next points have weights of 0.5
2- if there is only 1 point before OR 1 point after, use 1 from each side (using 2 from the side that has two biases the result in that direction)
3- if there are no points on one side, then accept the bias to the other side and use 2 points on that side

The above approach of using 1 or 2 next points, has an impact on the overall logic for the macro. Remember that the macro processes top to bottom and left to right. That means that some of the points to the right may be holes when the revised averaging scheme is used; but those holes would be filled if the scheme were applied a little later. So, just like the change from v002 to v003, using the above approach would mean that there are 3 passes:
1- try to fill holes from HistoricalDB
2- try to fill remaining holes with new averaging scheme
3- apply "last effort" to remaining holes

The use of the revised averaging scheme provides opportunities for more fine tuning and code improvement, e.g., the code does not really need to make 3 passes on the entire sheet with holes, rather it needs to make 3 successive passes on each row. That thought suggests another fundamental logic question: are the rows in any way related? Is there any value in looking at adjacent rows (or some other rows) in HistoricalDB to generate an even better hole filling algorithm?

Finally, I would use a modified version of the new averaging scheme for the last effort. Properly written, the 2nd and 3rd passes collapse into a single pass.

Let me know what you want to do and I can provide suggestions for specific code or the actual code itself. At this point I recommend the former so you learn more about VBA. I should be teaching you how to fish, not ...

That is another 6 donuts ... :devil: