PDA

View Full Version : Compare & Fill Data Help Needed !



mitko007
07-20-2013, 01:52 PM
Hi,
once again i really require your help guys. I am struggling with the following problem:

http://i459.photobucket.com/albums/qq314/mitko0007/macro-1.png

I have to compare the data in 2 columns - Index & Sec_Index. In case of a match it should check which Values is assigned to the Sec_Index and fill a "1" to the matching column corresponding to Index and "0" for all other Value columns (I hope the screenshot explains it better) I wrote a short macro which works good. However I have huge amounts of data - both Index columns contain at least 400000-500000 lines. This makes my code useless since it will take extreme long durations to execute.


Another option i tried was using the following excel function:

=COUNTIFS($H$2:$H$5,$B2,$I$2:$I$5,"A") - ' replace B & C

It also does the job for small data sets but is useless for large data.

Is there a way to make this work? I read about Variant arrays, but I'm not that familiar with them.

stanleydgrom
07-20-2013, 05:41 PM
mitko007,

What version of Excel are you using?

You are posting a picture. This means that if this was a problem where one needed to use your data, anyone trying to help you would have to enter the data manually. That makes no sense.

Are there only 3 values: A, B, and, C?

So that we can get it right the first time, please post a workbook with about 30 to 40 rows of your actual raw data (sensitive data changed).

To attach your workbook, scroll down and click on the Go Advanced button, then scroll down and click on the Manage Attachments button.

mitko007
07-21-2013, 12:59 AM
mitko007,

What version of Excel are you using?

You are posting a picture. This means that if this was a problem where one needed to use your data, anyone trying to help you would have to enter the data manually. That makes no sense.

Are there only 3 values: A, B, and, C?

So that we can get it right the first time, please post a workbook with about 30 to 40 rows of your actual raw data (sensitive data changed).

To attach your workbook, scroll down and click on the Go Advanced button, then scroll down and click on the Manage Attachments button.

Hi, thanks for your response. I am using Excel 2010. I'm attaching an example for you.

Basically i am comparing the Start_Splitted(UNIX) column on the "States" sheet (column L) with the Date_Time on Sheet1 (Column A). If they match than the columns on sheet1 (J:S) are filled either with 0 or 1 depending on the Index in "States" sheet (column K)

There is a macro that does the job, but it work only for small data sets. I have a huge data file, meaning my date entries are approx 400000-500000.
With the existing macro it will take extremely long to complete.

Even with the excel function mentioned above its way to long.

p45cal
07-21-2013, 02:27 AM
You could use a formula and then convert to values in situ. In J2 of the Sheet1:
=IF(INDEX(States!$K$2:$K$17,MATCH($A2,States!$L$2:$L$17,0))=J$1,1,0)
copied across and down.
Your first row of data has no match in States column L which yields an error in the cell which could also be converted to a zero by wrapping the same formula in an IFERROR:
=IFERROR(IF(INDEX(States!$K$2:$K$17,MATCH($A2,States!$L$2:$L$17,0))=J$1,1,0 ),0)

p45cal
07-21-2013, 03:25 AM
otherwise a tweak of your existing macro:
Sub FillStatesMatrix()
Dim StatesRange, DataRange, Rcount_states As Long, Rcount_data As Long, ResultsRange As Range, ResultsRangeData, ar2
Dim lngI As Long, lngJ As Long

Rcount_states = Worksheets("States").Range("L1048576").End(xlUp).Row
Rcount_data = Worksheets("Sheet1").Range("A1048576").End(xlUp).Row

StatesRange = Worksheets("States").Range("L2:L" & Rcount_states)
DataRange = Worksheets("Sheet1").Range("A2:A" & Rcount_data)
Set ResultsRange = Worksheets("Sheet1").Range("A2:A" & Rcount_data).Offset(, 9).Resize(, 10)
ResultsRange.Value = Empty '0 ' reinstate 0 to see zeroes.
ResultsRangeData = ResultsRange
ar2 = Application.Transpose(Application.Transpose(ResultsRange.Offset(-1).Resize(1)))
For lngI = LBound(StatesRange) To UBound(StatesRange)
For lngJ = LBound(DataRange) To UBound(DataRange)
If StatesRange(lngI, 1) = DataRange(lngJ, 1) Then
ResultsRangeData(lngJ, Application.Match(Worksheets("States").Cells(lngI + 1, 11).Value, ar2, 0)) = 1
exit for
End If
Next lngJ
Next lngI
ResultsRange.Value = ResultsRangeData
End Sub

p45cal
07-21-2013, 04:11 AM
You can use a macro to put the formulae (msg#4) in and convert to values:
Sub FillStatesMatrix2()
Dim ResultsRange As Range
With Worksheets("Sheet1")
Set ResultsRange = .Range("A2:A" & .Range("A1048576").End(xlUp).Row).Offset(, 9).Resize(, 10)
End With
ResultsRange.FormulaR1C1 = "=IFERROR(IF(INDEX(States!R2C11:R17C11,MATCH(RC1,States!R2C12:R17C12,0))=R1C ,1,0),0)"
ResultsRange.Value = ResultsRange.Value
End Sub
but it will probably be slower than the more complex macro in msg#5

mitko007
07-22-2013, 03:08 AM
Thanks guys for you efforts. I've tried your solutions but i break the calculations after some couple of hours of waiting time. I executed the last macro from p45cal 3 hours ago and its still running. Probably i'll need to think of something else when using such huge data sets :( Thanks anyway

p45cal
07-22-2013, 03:27 AM
We should be able to do this in a matter of minutes or seconds, not hours.
If you could make a version your file with the half-million or so length lists available on a file share site (eg. box.net amongst many) I will look at it.
If you don't want to put the file in the public domain, but you're happy for just me to see it, then Private Message me here art vbaexpress and I'll reply with my email address which you can use to send me the file directly.

mitko007
07-22-2013, 12:40 PM
the sample data file with the current macro could be found here http://www.4shared.com/file/ara6DaGp/example_large.html

p45cal
07-23-2013, 07:57 AM
I've been spending some time on this; the problem has been that just opening your largish file on my PC causes a big slow down, never mind trying to run anything! I've had numerous Excel crashes with it. So I don't know why my PC is slowed down so much, but I can't properly test a completely in-memory solution I had for you.
But it strikes me that you are looking for an awful lot of granularity (1 second intervals over several days) using not far short of a million rows on the sheet, certainly not something tha a human can look at and get useful information from quickly. This leads me to look at the problem from a bit further back:
Are the data in States!Columns L:M generated from the data in States!Columns A:B using such code as in your last thread (http://vbaexpress.com/forum/showthread.php?t=46895)?
If so, I feel it might be easier and quicker either to do this using the data in columns A:B (I guess you've deleted the States data associated with the data in Columns A:B?) or to generate your Sheet1 data at the same time as expanding the data in columns A:B.

What is it you're ultimately aiming to do? I'd be very surprised if there isn't a faster way, perhaps with fewer steps.

mitko007
07-23-2013, 09:06 AM
I've been spending some time on this; the problem has been that just opening your largish file on my PC causes a big slow down, never mind trying to run anything! I've had numerous Excel crashes with it. So I don't know why my PC is slowed down so much, but I can't properly test a completely in-memory solution I had for you.
But it strikes me that you are looking for an awful lot of granularity (1 second intervals over several days) using not far short of a million rows on the sheet, certainly not something tha a human can look at and get useful information from quickly. This leads me to look at the problem from a bit further back:
Are the data in States!Columns L:M generated from the data in States!Columns A:B using such code as in your last thread (http://vbaexpress.com/forum/showthread.php?t=46895)?
If so, I feel it might be easier and quicker either to do this using the data in columns A:B (I guess you've deleted the States data associated with the data in Columns A:B?) or to generate your Sheet1 data at the same time as expanding the data in columns A:B.

What is it you're ultimately aiming to do? I'd be very surpised if there isn't a faster way, perhaps with fewer steps. Hi, basically i am working on a fluid flow model. The granularity of 1 sec. should be kept in order to get accurate results. You are right, my two threads cover the same problem. The time ranges you helped me split in the first thread are used now in the second one. The data received from my sensors are in specific ranges depending when a sensor is triggered (example: start 00:01:00 - stop 00:10:00 getting a index A). In order to perform a numerical simulation in a 1 sec frequency i have to distribute these measurements into a 1 sec timelog covering the complete testing period (for example a week of testing) So to sum everything up - imagine i am performing a fluid flow simulation continuously for a week. Every now and then i get readings covering specific time intervals. After i collect the data i have to distribute these into my 1 week time log matrix in 1 sec freqeuncy. Thats all. I tried before with 10 sec. data and it worked for smaller data sets. With the 1 sec data its kind of tricky. And yes the data in L:M is created from columns A & B in the states Sheet.

p45cal
07-23-2013, 09:14 AM
After i collect the data i have to distribute these into my 1 week time log matrix in 1 sec freqeuncy. Thats all. I tried before with 10 sec. data and it worked for smaller data sets. With the 1 sec data its kind of tricky.OK. So what's going to use that million rows of data? Another machine? A human? Perhaps you want to chart the data?




And yes the data in L:M is created from columns A & B in the states Sheet.So could you provide a similar sheet with the States column not deleted, then I can try and work from that data.

mitko007
07-24-2013, 12:38 AM
Hi p45cal,
yes, once the index matrix is distributed with the values 0 or 1 i could use this to perform my analysis, plot data and so on...

I created a new clean example. The time stamps are in unix time format but i guess this makes it even easier in a way. I'll send you a PM with the link to the data file

p45cal
07-24-2013, 01:31 AM
I've tried to reply already but it's not showing.
I can't see Private Messages at the moment with the updated interface at vbaExpress, an error page shows. When I can see them I'll have a look at your new file, but re "use this to perform my analysis, plot data and so on..." I feel that a lot of analysis can be done without expanding the data to a million rows. Can you give more detail on the analyses you want to perform on the data?

mitko007
07-24-2013, 01:47 AM
I've tried to reply already but it's not showing.
I can't see Private Messages at the moment with the updtaed interface at vbaExpress, an error page shows. When I can see them I'll have a look at your new file, but re "use this to perform my analysis, plot data and so on..." I feel that a lot of analysis can be done without expanding the data to a million rows. Can you give more detail on the analyses you want to perform on the data?

It will be a long explanation if i go through the complete process, but in terms of flow dynamic modelling 1Hz data frequencies are typical and desirable to achieve accurate flow behavior model.

The file is uploaded here:

http://www.4shared.com/file/MBmyodI7/Example_11.html

p45cal
07-24-2013, 02:14 AM
I'll look at the file later today. In your (earlier) sheet you already have that 1Hz granularity, without the repeating rows. I realise at the moment you really want to expand the data. but we're experiencing problems with sheer size of file and it's stopping us in out tracks. Could you perhaps point me to a website which would give me a clue as to how one might/you want to analyse flow behaviour? For one, plotting the data from your unexpanded data should be perfectly feasible with an xy-scatter chart (and Excel would probably take its time to plot some 866000 points).

mitko007
07-24-2013, 02:26 AM
basically the problem is that i want to distribute my readings (Index) in to the timelog on Sheet1. I have other data in sheet1 (Columns named A, B....) which are already recorded in 1Hz. The only part i struggle with, is the one with the index problem since they aren't recorded in 1 Hz but in time ranges.

For rheological modelling there are is a lot to be found on the internet.
http://en.wikipedia.org/wiki/Power-law_fluid

As i said, i got it up to the point where i try to distribute ranged data in 1Hz Time sequences.

hiker95
07-24-2013, 09:18 AM
mitko007,

I have been having a problem with my old userid/handle stanleydgrom.

I have created a macro using arrays in memory, in response to your reply #3.

In reference to that workbook:

Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).




Option Explicit
Sub FillStatesMatrixV2()
' stanleydgrom, 07/21/2013 --> hiker95, 07/24/2013
' http://www.vbaexpress.com/forum/showthread.php?t=46902
Dim a As Variant, s As Variant, j As Variant, js As Variant
Dim lrs As Long, lra As Long, i As Long, c As Long, fr As Long, fc As Long
Application.ScreenUpdating = False
With Sheets("Sheet1")
lra = .Cells(Rows.Count, 1).End(xlUp).Row
.Range("J2:S" & lra).Value = 0
j = .Range("J1:S" & lra).Value
js = .Range("J1:S1").Value
a = .Range("A1:A" & lra)
.Range("A1:A" & lra).NumberFormat = "0"
End With
With Sheets("States")
lrs = .Cells(Rows.Count, "L").End(xlUp).Row
s = .Range("K1:L" & lrs).Value
End With
For i = 2 To UBound(s, 1)
fr = 0
On Error Resume Next
fr = Application.Match(s(i, 2), Sheets("Sheet1").Columns(1), 0)
On Error GoTo 0
If fr > 0 Then
fc = 0
For c = 1 To UBound(js, 2)
If s(i, 1) = js(1, c) Then
fc = c
Exit For
End If
Next c
If fc > 0 Then
j(fr, fc) = 1
End If
End If
Next i
With Sheets("Sheet1")
With .Range("A1:A" & lra)
.Value = a
.NumberFormat = "General"
End With
.Range("J1").Resize(UBound(j, 1), UBound(j, 2)) = j
.Activate
End With
Application.ScreenUpdating = True
End Sub




Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the FillStatesMatrixV2 macro.



I am not able to download your latest workbook in reply #15.

Can you try using the following free site:

You can upload your workbook to Box Net (http://www.box.net),
mark the workbook for sharing
and provide us with a link to your workbook.

Have a great day,
hiker95

mitko007
07-24-2013, 10:09 AM
mitko007,

I have been having a problem with my old userid/handle stanleydgrom.

I have created a macro using arrays in memory, in response to your reply #3.

In reference to that workbook:

Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).




Option Explicit
Sub FillStatesMatrixV2()
' stanleydgrom, 07/21/2013 --> hiker95, 07/24/2013
' http://www.vbaexpress.com/forum/showthread.php?t=46902
Dim a As Variant, s As Variant, j As Variant, js As Variant
Dim lrs As Long, lra As Long, i As Long, c As Long, fr As Long, fc As Long
Application.ScreenUpdating = False
With Sheets("Sheet1")
lra = .Cells(Rows.Count, 1).End(xlUp).Row
.Range("J2:S" & lra).Value = 0
j = .Range("J1:S" & lra).Value
js = .Range("J1:S1").Value
a = .Range("A1:A" & lra)
.Range("A1:A" & lra).NumberFormat = "0"
End With
With Sheets("States")
lrs = .Cells(Rows.Count, "L").End(xlUp).Row
s = .Range("K1:L" & lrs).Value
End With
For i = 2 To UBound(s, 1)
fr = 0
On Error Resume Next
fr = Application.Match(s(i, 2), Sheets("Sheet1").Columns(1), 0)
On Error GoTo 0
If fr > 0 Then
fc = 0
For c = 1 To UBound(js, 2)
If s(i, 1) = js(1, c) Then
fc = c
Exit For
End If
Next c
If fc > 0 Then
j(fr, fc) = 1
End If
End If
Next i
With Sheets("Sheet1")
With .Range("A1:A" & lra)
.Value = a
.NumberFormat = "General"
End With
.Range("J1").Resize(UBound(j, 1), UBound(j, 2)) = j
.Activate
End With
Application.ScreenUpdating = True
End Sub




Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the FillStatesMatrixV2 macro.



I am not able to download your latest workbook in reply #15.

Can you try using the following free site:

You can upload your workbook to Box Net (http://www.box.net),
mark the workbook for sharing
and provide us with a link to your workbook.

Have a great day,
hiker95

Hi hiker95,
thanks for your effort. I uploaded the latest file to the website you requested:

https://app.box.com/s/48jkr94o1eagtlwlvpl3

hiker95
07-24-2013, 10:50 AM
mitko007,

Please do not quote entire replies from your helper. When quoting follow these guidelines:
1. Quote ONLY if it is needed to add clarity or context for your reply. If so, then
2. Quote ONLY the specific part of the post that is relevant - - not the entire post.
This will keep thread clutter to a minimum and make the discussion easier to follow.

In reference to the workbook in your reply #3 vs the latest workbook:

Both worksheets, Sheet1:
we are working with column A, Date Time (Unix), and columns J thru S


Worksheet's States:
the columns are different

In the workbook reply #3:
column K, Index
and, column L, Start_Splitted (UNIX)


Are the next statements correct??
In your latest workbook, worksheet States, we are dealing with:
column A, StartTimestmp(Unix)
and column F, Index

mitko007
07-24-2013, 11:35 AM
Sorry i uploaded the wrong example. Here is the correct one.

https://app.box.com/s/9y69eowoxbghli3b0lpw

and sorry for the quoting...

hiker95
07-24-2013, 12:58 PM
mitko007,

Sorry, V2 and V3 of the macro, even running on my new Lenovo P500 Ideapad, can not handle 700,000 to over 800,000 rows of data in two worksheets.

Click on the Reply to Thread button, and just put the word BUMP in the post. Then, click on the Post Quick Reply button, and someone else will assist you.

mitko007
07-24-2013, 01:07 PM
BUMP

mitko007
07-24-2013, 01:09 PM
hiker,
thanks anyway...i will have to figure another solution to my problem. Probably try using other tools, or figure a completely new approach.

Best regards

hiker95
07-24-2013, 01:57 PM
mitko007,


i will have to figure another solution to my problem. Probably try using other tools

Good idea.

Sorry I could not solve the problem.

p45cal
07-24-2013, 02:17 PM
re: BUMP
reminder:
I'll look at the file later today. I may have an offering tomorrow.

p45cal
07-25-2013, 09:35 AM
Seems I'm not the only one with problems arising from the sheer size of the file!
There's a link to a file here:https://app.box.com/s/t88iihkzuikanv8hjmw6 which is a version of the file you linked to in msg#15 which contains the macro blah2.
At the moment, though this is probably tweakable with an adjustment to the 3rd argument of 2 of the Application.Match lines, the time for each start and end period should be exactly present in the list on Sheet1 and the dates in column A must be in order.
The solution is nearly a completely in-memory solution, but I was stymied by the limitation of Application.match to work only with an array less than 65536 elements, even in Excel 2010! Once I'd discovered this, doing the Match on a range on a sheet was not significantly slower.
This solutoin took 6 minutes on my creaky old desktop.
The macro for those inerested:
Sub blah2()
Dim ADataRng As Range, xxx As Range, ADataRngValues, xxxValues, i, j, TheStatesValues
Dim StartBlock As Variant, EndBlock As Variant, ar1, ResultsRange As Range, ResultsRangeValues, StateOffset

Application.ScreenUpdating = False
With Sheets("Sheet1")
ar1 = Application.Transpose(Application.Transpose(.Range("J1:S1").Value))
Set ADataRng = .Range(.Cells(2, 1), .Cells(.Rows.Count, 1).End(xlUp))
ADataRngValues = Empty
Set ResultsRange = ADataRng.Offset(, 9).Resize(, 10)
ResultsRange.Value = Empty '0 'change Empty to 0 to see zeroes among the results instead of blank cells.
ResultsRangeValues = ResultsRange.Value
End With
With Sheets("States")
Set xxx = .Range(.Cells(2, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(, 2)
xxxValues = xxx.Value
TheStatesValues = xxx.Offset(, 5).Resize(, 1)
For i = 1 To UBound(xxxValues)
StartBlock = Application.Match(xxxValues(i, 1), ADataRng, 0)
If Not IsError(StartBlock) Then
EndBlock = WorksheetFunction.Match(xxxValues(i, 2), ADataRng, 0)
If Not IsError(EndBlock) Then
StateOffset = Application.Match(TheStatesValues(i, 1), ar1, 0)
For j = StartBlock To EndBlock
ResultsRangeValues(j, StateOffset) = 1
Next j
End If
End If
Next i
End With
ResultsRange.Value = ResultsRangeValues
Application.ScreenUpdating = True
End Sub
Note also the comment in the code regarding seeing the 1s in the results range amongst blank cells or amongst cells with zero in them.

mitko007
07-25-2013, 11:25 PM
Hi p45cal,
I don't know how to thank you man. I really appreciate it. You saved me a lot of work (really a lot). I'll run some test during the weekend and i'll let you know what came out but as you already tested it, i'm sure everything will be fine.

Meanwhile i got also another solution from a friend. It could be also of interest for you:



Option Explicit

Sub ertert()
Dim tm!: tm = Timer
Dim x, y, i&, j&, k&, bu As Boolean, r, ubx&, lbx&
With Application
.ScreenUpdating = False: .EnableEvents = False: .Calculation = xlCalculationManual
End With
With Sheets("States")
x = .Range("K2:L" & .Cells(Rows.Count, 11).End(xlUp).Row).Value 'где ищем
End With
'MsgBox Timer - tm: tm = Timer '0.4 sec
With Sheets("Sheet1")
y = .Range("A1", .Cells(Rows.Count, 1).End(xlUp)).Value 'что ищем
With .Range("J1:S1").Resize(UBound(y))
.Offset(1).ClearContents: r = .Value
End With
End With
'MsgBox Timer - tm: tm = Timer '1.27 sec
For i = 2 To UBound(r, 1)
For j = 1 To UBound(r, 2)
r(i, j) = 0
Next j
Next i
'MsgBox Timer - tm: tm = Timer '0.67 sec
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
For j = 1 To UBound(r, 2)
.Item(r(1, j)) = j
Next j
For i = 2 To UBound(y)
lbx = 1: ubx = UBound(x)
Do
bu = False: j = (lbx + ubx) \ 2
Select Case y(i, 1)
Case Is > x(j, 2): lbx = j + 1
Case Is < x(j, 2): ubx = j - 1
Case Is = x(j, 2): bu = True: Exit Do
End Select
Loop Until ubx < lbx
If bu Then If .exists(x(j, 1)) Then r(i, .Item(x(j, 1))) = 1
Next i
End With
'MsgBox Timer - tm: tm = Timer '6.20 sec
Sheets("Sheet1").Range("J1:S1").Resize(i - 1).Value = r
With Application
.ScreenUpdating = True: .EnableEvents = True: .Calculation = xlCalculationAutomatic
End With
MsgBox Timer - tm '33.49 sec; total ~43 sec
End Sub


Thanks once again