PDA

View Full Version : The BIG DATA Challenge - Match cells on BIG DATA Sheet (LOOP take 4 Hrs !)



10shlomi10
01-06-2017, 01:35 AM
Hi !

need some "out of the box " solution...:help

I have excel file with three sheets.

Let's jump for a minute to the end....
meaning , the purpose of this file:
I want to find the gaps between cells location, in which after the running of code, I'll type a number on TextBox and I'll get the results that is equal or higher from that value i typed.


1st Sheet:
Has two columns of criteria contain numbers.
on each row I'm writing a letters (From column C onward) that specify which columns i want excel to write the address cell into that column.
(1st Picture)

17948

The 2nd sheet:
Is the checking sheet in which I check if criteria is match.
(2nd picture)
17949

The 3rd Sheet:
Is for calculate the gaps between each finding.
(3rd picture)
17950

I wrote a code (on attach file), which is doing a loop to compare the criteria.
It works fine BUT,
as I'm dealing with huge amounts of cells, it can be even to the end of the sheet rows (over Million rows) , this LOOP takes 4 hours to finish. something I could not live with.

I'm trying to find a much faster way to do it.:think:

I thought using loop in arrays, but could not figure out how to Match each Item in Array1 to the item in Array2 and than write into the right cell the cell address, if criteria is match.... ?

As it is possible that matches will go beyond sheets rows, I made some check on the loop and if LastRow is 1,048,576, the code will open a new sheet, copy the last finding to the the first row (of the new created sheet) , and continue the looping.
after than I'll have to figure out how to bring the Filter numbers I want to. same as I'm doing on the attached file.
(is it possible to filter from many sheets with one Text box ? >> couldn't find how to do it....:bug:)

The file I made is on attach

By the way, if you have any other efficient way to do this task, I'll be more then happy to read about it.
(I even thought of some other way.... is there is a way to move the VBA looping results, into Power Query module, as Power Query module has no Rows limit !! :*))

Any help will be appreciated !

Thanks a lot !!!

17951

Shlomi

Bob Phillips
01-06-2017, 02:47 AM
I am sure you could do it with Power Query and Power Pivot, but I am still not clear on what you are trying to do.

Rather than asking us to reverse engineer your code, tell us what the business process is, what data do you have, what do you supply, what do you want to know.

10shlomi10
01-06-2017, 03:54 AM
Dear

sorry if it is not clear....I'll try re-explain

On first step:
I want to match the criteria from Sheet: Condition and columns to Mark in every row and compair it to the creteria on sheet: Condition Check
when criteria is found, the cells on that specific row, should be filled with cell Address depending on the columns specified on sheet Condition and columns to Mark for that specific criterira.

In simple words:
If you found the criteria 1 AND 2, then see which coulmn on that criteria, and go to those columns and add the cell address to the current cell.

As part of the code is when criteria is match, the cells , with cell address, is copied to the sheet call: Sub Totals

all this step is needed to next step, which takes all those cells addresses and Sort them according to columns AND rows, and make calculation of the differeces.

I think the following picture will clear what I wrote:

17952

Then I simply want the ability to find (I use Filter in the file, but it could be with any other solution) the differences results according to what I want (as wrote >> "give me the Equal or Higher value I type on TextBox")

That's it !

as i'm dealing with huge amount of conditions and more then 600 columns , the code running very very long time, and the other problem I'm facing is that the rows on result sheet could not be enough, as it can be more then a million rows... (much more!)

any suggestions to overcome those problems ?

offthelip
01-06-2017, 08:33 AM
I am not sure what you code is doing but I looked at your ColorCellconditions code and I can why this is very slow to run.
When trying to speed up macros one of the most important things to realise is that every reference to the spreadsheet takea a finite amount of time. It is much quicker to load all of the data into an array and process it within the arrays and then output it. This is particularly important where you have a multiple variable loops. So the original code is:

For Z = 2 To LR
'Loop to define cell address when criteria is match
For i = 2 To CriteriaCheck.Cells(Rows.Count, "A").End(xlUp).row
If CriteriaCheck.Cells(i, "A").Value = Sheet2.Cells(Z, "A").Value And _
CriteriaCheck.Cells(i, "B").Value = Sheet2.Cells(Z, "B").Value Then

For j = 3 To Sheet2.Cells(Z, Columns.Count).End(xlToLeft).Column
x = Sheet2.Cells(Z, j).Value
CriteriaCheck.Range(x & i) = CriteriaCheck.Range(x & i).Address
LastRow = Sheets("Sub Totals").Cells(Rows.Count, "A").End(xlUp).row
If LastRow = 1048576 Then GoTo SheetEnd
ActiveSheet.Range("A" & LastRow + 1).Value = CriteriaCheck.Range(x & i).Address
ActiveSheet.Range("B" & LastRow + 1).Value = Range(x & i).Column
ActiveSheet.Range("C" & LastRow + 1).Value = CriteriaCheck.Range(x & i).row
Next j
End If
Next i

Next Z



Note every single line of this references the worksheet apart from the If LAST ROW line, some of this is in a triple loop, that will take a looooong time

I have done a very quick rewrite of this to show you how to speed it up by getting rid of the references to the worksheet within the loop:


LR = Sheet2.Cells(Rows.Count, "A").End(xlUp).row
cclastrow = CriteriaCheck.Cells(Rows.Count, "A").End(xlUp).row
Sheet2array = Worksheets("Conditions and columns to mark").Range(Cells(1, 1), Cells(cclastrow, 13))
CCarray = Worksheets("Condition Check").Range(Cells(1, 1), Cells(LR, 2))
Sheet2output = Worksheets("Conditions and columns to mark").Range(Cells(1, 3), Cells(cclastrow, 500))
LastRow = Sheets("Sub Totals").Cells(Rows.Count, "A").End(xlUp).row




For Z = 2 To LR
'Loop to define cell address when criteria is match
For i = 2 To cclastrow
If CCarray(i, 1).Value = Sheet2array(Z, 1).Value And _
CCarray(i, 2).Value = Sheet2array(Z, 2).Value Then

For j = 3 To 11

collet = Sheet2array(Z, j)
If collet = "" Then Exit For
' convert this letter to a number so that we can use it for addressing!! Any chance of specifying the columns by number?
Columnno = 0
For kk = 1 To Len(collet)
Columnno = Columnno * 26 + (Asc(UCase(Mid(collet, kk, 1))) - 64)
Next kk
If LastRow = 1048576 Then GoTo SheetEnd
LastRow = LastRow + 1
' CCarray(i, Columnno) = CriteriaCheck.Range(x & i).Address
Sheet2output(lastrow ,1) = CCarray(i, columno)
' ActiveSheet.Range("B" & LastRow + 1).Value = Range(x & i).Column
' ActiveSheet.Range("C" & LastRow + 1).Value = CriteriaCheck.Range(x & i).row
Next j
End If
Next iC

Next Z


Worksheets("Conditions and columns to mark").Range(Cells(1, 3), Cells(cclastrow, 500)) = Sheet2output




I am sure this will not work because I didn't really understand what you were trying to do but it should show you how to speed it up. I expect code written like this to be 50 to 100 times faster.

10shlomi10
01-06-2017, 10:19 AM
Dear offthelip

First of all >> Many many thanks for your time and efforts !!
I noticed that you made 3 Arrays. Very nice indeed !

I start working on your code and after firuring out what you done, I correct some minor mistakes (especially typing mistake, upper case etc'....)

I came to this line of code and got an Error:


If CCarray(i, 1).Value = Sheet2array(Z, 1).Value And _
CCarray(i, 2).Value = Sheet2array(Z, 2).Value Then

17956


I'm placing here the code after corrected the minor mistakes:


Sub ColorCellsConditions()

'change the cursor to hourglass
Application.Cursor = xlWait


'Use the Status Bar to inform user of the macro's progress
'Makes sure that the statusbar is visible
Application.DisplayStatusBar = True
'add your message to status bar
Application.StatusBar = "wait...under pross...."
'
Application.Calculation = xlCalculationManual
'
'Remember time when macro starts
StartTime = Timer

Application.ScreenUpdating = False

LR = Worksheets("Condition and columns to Mark").Cells(Rows.Count, "A").End(xlUp).row
cclastrow = Worksheets("Condition Check").Cells(Rows.Count, "A").End(xlUp).row


'Set an Array
Worksheets("Condition and columns to Mark").Activate
Sheet2array = Worksheets("Condition and columns to Mark").Range(Cells(1, 1), Cells(cclastrow, 13))


'Set an Array
Worksheets("Condition Check").Activate
CCarray = Worksheets("Condition Check").Range(Cells(1, 1), Cells(LR, 2))


'Set an Array
Worksheets("Condition and columns to Mark").Activate
Sheet2output = Worksheets("Condition and columns to Mark").Range(Cells(1, 3), Cells(cclastrow, 500))
LastRow = Sheets("Sub Totals").Cells(Rows.Count, "A").End(xlUp).row


Worksheets("Sub Totals").Activate

For Z = 2 To LR
'Loop to define cell address when criteria is match
For i = 2 To cclastrow
If CCarray(i, 1).Value = Sheet2array(Z, 1).Value And _
CCarray(i, 2).Value = Sheet2array(Z, 2).Value Then

For j = 3 To 11

collet = Sheet2array(Z, j)
If collet = "" Then Exit For
' convert this letter to a number so that we can use it for addressing!! Any chance of specifying the columns by number?
Columnno = 0
For kk = 1 To Len(collet)
Columnno = Columnno * 26 + (Asc(UCase(Mid(collet, kk, 1))) - 64)
Next kk
If LastRow = 1048576 Then GoTo sheetEnd
LastRow = LastRow + 1
' CCarray(i, Columnno) = CriteriaCheck.Range(x & i).Address
Sheet2output(LastRow, 1) = CCarray(i, columno)
Next j
End If
Next i

Next Z


Worksheets("Condition and columns to Mark").Range(Cells(1, 3), Cells(cclastrow, 500)) = Sheet2output

sheetEnd:
Call SortAnd


Application.ScreenUpdating = True




'Determine how many seconds code took to run
SecondsElapsed = Round(Timer - StartTime, 2)




'restore default cursor
Application.Cursor = xlDefault
' gives control of the statusbar back to the programme
Application.StatusBar = False


'Notify user in seconds
MsgBox "The code has finished and Run for: " & SecondsElapsed & " Seconds " _
& vbNewLine & vbOKOnly + vbInformation, "Mark conditions"




Application.Calculation = xlCalculationAutomatic



End Sub

offthelip
01-06-2017, 10:26 AM
When you define an array by setting it to a range of cells the array is of type Variant. Variants do not have an attribute ".value" just use the array as a value
ie change
CCarray(i,2).value to CCarray(i,2) all the way through, sorry my mistake I was just editting what was there.

10shlomi10
01-06-2017, 10:35 AM
Thanks for the explanation. and you shouldn't be sorry , you help me A LOT !

I remove the .Value as you suggested and now I get this error:

17958

is it because the arrays are not in the same dimentions ?
or it does not matter ?

offthelip
01-06-2017, 10:39 AM
Yes, to debug this you need to hover with your mouse over the line where it has stopped and move from variable to variable and you can then see which one is "out of range" . This is undoubtedly because the way I defined the arrays was just a guess because I didn't understand what you were doing.
On closer inspection I can see at least one error. CCARRAY is defined as size down to LR while the loop[ goes from 1 to lastrow
Check the indexing is corredct on the other arrays, they shojld be of the same size as the loop index.

10shlomi10
01-06-2017, 11:01 AM
corrected from this line:

LR = Worksheets("Condition and columns to Mark").Cells(Rows.Count, "A").End(xlUp).row


to this line:

LR = Worksheets("Condition Check").Cells(Rows.Count, "A").End(xlUp).row

it Runs ....till it stop here:
17959

offthelip
01-06-2017, 11:10 AM
ccarray is only 2 columns wide, columno is alweays going to be more than 2 columns because the first column is column C (3). note at this piont in your code I had no idea what to do , so this line is wrong, you need to work out what you are trying to write into the outputlines. I can't really help with that because I don't know what you are doing.

10shlomi10
01-06-2017, 03:15 PM
thank you VERY much !
really appreciated.

I'll try to work it out


all the best.

10shlomi10
01-07-2017, 04:08 AM
Dear offthelip

I understand the the last step before writing the data from Array into the sheet itself, is to define the TARGET range in which I want to enter the data into (sheet).

How can you overcome this in cases that you can't predict from beginning the TARGET range ?
as the code is running on the list (top to LastRow, on 1000 rows, in above case) and if he find a match, the code should write the row address+column address i specified on the criteria list (for that specific match).

meaning, I don't really knows how many matches I'll have, and therefore, I can not predict how many rows will be on the result sheet.

How can I return the array results to sheet, if i don't know the TARGET range I should give the array to write into it ?

offthelip
01-07-2017, 05:37 AM
What I do in these circumstance is define an output array of 1000 rows, then increment a rowcount each for line as you enter it, when you get to 1000 you write the array in to the worksheet using code such as:

worksheets("sheet2").range(Cells(1+blkcnt*1000,3),cells(cclastrow+blkcnt*1000,500)=sheet2outpu t
blkcnt=blkcnt+1
rowcount=1


As you can see you just keep writing a fixed size at successive locations down the sheet, while resetting the rowcount.
Obviously set blkcnt to zero to start

10shlomi10
01-07-2017, 09:57 AM
sorry to get back to this but.....

After many hours of trying I asking for HELP !

This line of code.....

sheet2output(LastRow, 1) = CCarray(i, Columnno)

is driven me crazy....:banghead:

all I want to do is placing a list of cells address after they being checked for criteria on two numbers (ONLY):

Hope this will explain the matter:

17961

snb
01-07-2017, 10:14 AM
The attachment is undownloadable.

offthelip
01-07-2017, 10:26 AM
I think if you want to write out the address that you have found all you need to do is:

Sheet2output(LastRow, 1) = collet & i

10shlomi10
01-07-2017, 12:38 PM
It works !

but when I try to transfer to result sheet as you suggested above with:

blkcnt = 0Worksheets("Sub Totals").Range(Cells(1 + blkcnt * 1000, 3), Cells(cclastrow + blkcnt * 1000, 500)) = sheet2output
blkcnt = blkcnt + 1
RowCount = 1


it gives me this: (500 columns X 8 rows)

Actually....
the Maximum rows on destination/result sheet , should be : LastRow number X Total of Column letters.
lets say, 8 rows on Criteria sheet multiply by the latters of each row.

7 | 7 | A,B,C
3 rows for that criteria.


17963

offthelip
01-07-2017, 03:54 PM
change CClastrow to Lastrow:

Worksheets("Sub Totals").Range(Cells(1 + blkcnt * 1000, 3), Cells(Lastrow + blkcnt * 1000, 500)) = sheet2output

10shlomi10
01-07-2017, 04:23 PM
o.k.
I'll try it.

Many Many Thanks !!!!:beerchug:

offthelip
01-07-2017, 05:06 PM
if you implement my comment 13, where you define a 1000 row array then there are some changes that you will need:
the definition of sheet2output becomes:


' this goes at the initialisation bit
Sheet2output = Worksheets("Condition and columns to Mark").Range(Cells(1, 3), Cells(1000, 500))
blkcnt=0
lastrow=1

'After the line
Lastrow=lastrow+ 1
'you need to add
if lastrow=1000 then
worksheets("sheet2").range(Cells(1+blkcnt*1000,3),cells(1000+blkcnt*1000,500)=sheet2output
blkcnt=blkcnt+1
lastrow=1
' reset output array to nulls
for kk= 1 to 1000
for jj = 1 to 497
sheet2output(kk,jj)=""
next jj
next kk


end if


' then the final output line is
worksheets("sheet2").range(Cells(1+blkcnt*1000,3),cells(1000+blkcnt*1000,500)=sheet2output

10shlomi10
01-10-2017, 11:08 AM
Thanks (and sorry for my late reply....)

I'm afraid I reached the Max limit of excel sheet rows on that Array.

Can array split into as many sheet she needs ?:dunno

Meaning....
Is this line of code :

' then the final output line is
[COLOR=#333333]worksheets("sheet2").range(Cells(1+blkcnt*1000,3),cells(1000+blkcnt*1000,500)=sheet2output

can be split into several sheets , let say, if the code see it reached to row 1048576 on destination range (the range of the Array to write to)?

offthelip
01-10-2017, 11:30 AM
Yes, very easily, use a sheet count as well, so you have a three level counter, "row count" , "Block Count" and the "Sheet Count"

Set sheet count to 2 at the start,
shtcnt=2


then every time you increment the block count check to see if block count is over 1040 , if is it reset block count to zero and increment the shet count

If blkcnt > 1040 Then
shtcnt = shtcnt + 1
blkcnt = 0
End If




These additional sheets will need to be created prior to running the macro since this doens't create the additional sheets, but that can be done automatically as well if necessary


, then reference the sheet name as follows:

worksheets("sheet" & shtcnt).range(Cells(1+blkcnt*1000,3),cells(1000+blkcnt*1000,500)=sheet2outp ut

offthelip
01-10-2017, 11:35 AM
As a matter of interest how long is your macro taking to run now?? Have you made significant improvments?

10shlomi10
01-10-2017, 12:21 PM
Yes !!!
It was a HUGE improvement.

for 50,000 the code run a little more than 1 second !
I was amazed !! :clap:

Two thing:
1. I realized that the Maximum Rows that can be are : cclastrow * LastCol (the Total Rows multiply by Total number of columns, this is the Maximum if , by chance ALL criteria are match) >> I'm attaching the file so far...

2. As I need to search some results, after above code finish running , and as I have the problem of sheet rows limit,
Can the array be save to CSV file (as I read that CSV file has no limit of rows) ?
and then I will load it into Power Query module, which can deals with millions of millions rows.
In this way I can avoid the splitting of the data into several sheets.

You'll notice on attached file that I use , for checking, only 1 criteria to check 7 & 7 on 50,000 rows.
result gave me, for some reason, 50,000 instead of 200,000 (i.e: 50,000 rows X 4 columns => 200,000)

offthelip
01-10-2017, 04:07 PM
If after generating all this huge amount data, you then need to process it, why don'y you process it within VBA and not write the data out to a spreadsheet at all?
This depends a bit on what processing you are doing, but if it is just a search then why not do the search directly on the huge array in VBA? This will overcome any problems of exceeding the number of rows on a spreadsheet. What processing are you planning to do on this long list?

10shlomi10
01-10-2017, 10:38 PM
you are absolutely right.
it will be the best thing to do. but i think it can not be done in this case and i'll explain why:

The outcome which i'm after is: Find the gap/steps between cells position, in every column.

this file is much efficient from the first one i did.
as in the beginning, i actually open every column, set the cells address every time criteria is match, in every column that is specified on the sheet that "tells" which column i want to mark/write into. and than the code fetch the the gaps between each cells, on every column.
the next step was to sort the outcome i got (per coulms and than row) and calculate the gaps.
and last step, which actually this is what i want to find , is to get the gaps that equal OR above the number I typed in text box

when I did it with a small amount of data is was great BUT
as data was grow and so as criteria , the code was running as mention for hours (3-4 hours)

I'm attaching a file which present how originally I thought to solve this.
you'll notice that on last row, on "Criteria to find" sheet, there are some cells mark in red. those cells are not mentioned on columns to mark. i did it as i need to find the gap of last row till the first mark cell above it. so I forced the code to write the cell address every time he find this Last row cell empty.
Can I find the GAPS

offthelip
01-11-2017, 03:25 AM
Your original idea of marking the cells in red which meet your requirements is only part of the processing, what are you going to do with the the cells marked in red? Or are you just using it for decoration?
An alternative to that, which is probably easier to handle when the the data set is very large is to output a list of all the cells that meet your criteria and ignore the ones that don't. This could be done entirely with VBA. You specify the "number" which you want to generate the list for at the start, then process as you are already doing, then sort the results within VBA. To do this there are a number of routines available on the web if you google "Sorting arrays in VBA". Then having sorted it you can calculate the gaps and check for which gaps are over the number first entered and only output those, then you get a nice small array to output, and never have to output your massive data array.
I am still completely in the dark as to what you are really trying to do, but at least I am pleased to have solved your orignal query as to how to speed up your routine
Best of luck.

10shlomi10
01-21-2017, 06:45 AM
Dear offthelip

sorry for my late reply.
Just want to say THANK YOU VERY MUCH ! for your time and efforts.

I'll consider your offer on #27 answer.
as it seems very logical to handle this problem with this kind of method.

ALL THE BEST !:love

Shlomi