PDA

View Full Version : Improve efficiency of VBA Code



Whisky
01-31-2018, 10:02 AM
Hi All

I have cobbled together some VBA to create new variables/data based on data in in column 3 and 4 of the attached spreadsheet. The code works but also I am sure there must be a a more efficient and quicker way to do as I repeating a few processes more than once and current solution may not be scalable to a bigger dataset

The code does the following:

1) Copies the original data in column 3 and 2 and places it in columns AB and AC. I want to ensure this part and the rest is dynamic so that if column 4 and 5 are also populated it does this for these columns as well.

2) Based on the data in columns AB and AC (and other potential columns) it creates new variables based on:
a) One set taking the difference between successive values for each column e.g. AD6 = ab6-ab5 for each new column
b) Second set lagging all of the variables from columns from AB to AE etc. by one period

Clearly there are a few efficiencies. Would it be quicker to read the original columns into memory (C to last column) and then do the calculation to compute all of the derived variables and output results into the nearest free column? Technically this is beyond my capability hence my botch job! Alternative approaches or tweaks to simplify appreciated.

Many Thanks

W
--------------
Code and steps here:


Sub ARDL_Create_Lags_TEST()
Application.ScreenUpdating = False


Dim SourceRange As Range
Dim endrow, endcol, lastx, lastx2, i, z, k, j As Integer

endrow = Range("B20000").End(xlUp).Row
lastx = Range("B5").End(xlToRight).Column '
totalvar = lastx - 2

'Step 1 copy raw range over
Range(Cells(1, 28), Cells(endrow, 27 + totalvar)).Value = Range(Cells(1, 3), Cells(endrow, lastx)).Value

'STEP2 - create differenced variables
'Create differences of y and x variables
lastx2 = 27 + totalvar
For z = 28 To lastx2 'for each col
For k = 5 To endrow 'for each row
j = z - 27 'set col reference to start from 1


Cells(1, lastx2 + j) = Header & "Dif_" & Cells(1, z)
Range(Cells(k, lastx2 + j), Cells(k, lastx2 + j)).Formula = Cells(k, z) - Cells(k - 1, z)
Next k
Next z

'3 Create lags of all variables
FOut = Range("ZB1").End(xlToLeft).Offset(0, 1).Column

Set SourceRange = Range(Cells(1, 28), Cells(endrow, FOut))
LagsReqd = 2
DestColm = FOut
'copies over each column
For Each Colm In SourceRange.Columns
Header = Colm.Cells(1).Value

For i = 1 To CLng(LagsReqd)
Cells(1, DestColm) = Header & "L" & i
Colm.Offset(1).Resize(Colm.Rows.Count - 1).Copy Cells(i + 2, DestColm)
DestColm = DestColm + 1
Next i
Next Colm


End Sub



Any improvements to speed up the code would be great. For example originally to create the differences variables I tried to insert formula into spreadsheet instead of looping through each cell (k) one at at time. Something like this but could not get it to work: Range(Cells(6, lastx2 + 1), Cells(endrow, lastx2 + 1)).Formula = Cells(6, 28) - Cells(5, 28)

offthelip
01-31-2018, 04:24 PM
Clearly there are a few efficiencies. Would it be quicker to read the original columns into memory (C to last column)
Yes it is and hopefully my code does exactly that.

One of the main reasons that Vba is slow is the time taken to access the worksheet from VBa is a relatively long time.
To speed up vba the easiest way is to minimise the number of accesses to the worksheet. What is interesting is that the time taken to access a single cell on the worksheet in vba is almost identical as the time taken to access a large range if it is done in one action.
So instead of writing a loop which loops down a range copying one row at a time which will take along time if you have got 50000 rows it is much quicker to load the 50000 lines into a variant array ( one worksheet access), then copy the lines to a variant array and then write the array back to the worksheet, ( one worksheet access for each search that you are doing),
I have a simple rule for fast VBA: NEVER ACCESS THE WORKSHEET IN A LOOP.
I have rewritten you code using variant arrays and minimising the access to the worksheet. I haven't tested it and I am not sure I fully understood what you were trying to do with the lags so there very likely to be some errors, but this should show you how to do it.


Sub ARDL_Create_Lags_Variant()
Application.ScreenUpdating = False
Dim inarr As Variant
Dim endrow, endcol, lastx, lastx2, i, z, k, j As Integer

endrow = Range("B20000").End(xlUp).Row
lastx = Range("B5").End(xlToRight).Column '
totalvar = lastx - 2

'Step 1 copy raw range over
inarr = Range(Cells(1, 1), Cells(endrow, lastx)).Value
Dim outarr As Variant
Dim lagarr As Variant

ReDim outarr(1 To endrow, 1 To lastx)
ReDim lagarr(1 To endrow, 1 To CLng(LagsReqd) * lastx)

'STEP2 - create differenced variable
'Create differences of y and x variables
For z = 3 To lastx 'for each col
For k = 2 To endrow 'for each row

outarr(1, z) = Header & "Dif_" & inarr(1, z)
outarr(k, z) = inarr(k, z) - inarr(k - 1, z)
Next k
Next z
' output differences
Range(Cells(1, 1 + lastx), Cells(endrow, 2 * lastx)) = outarr
' now do the lags on outarr


LagsReqd = 2
'copies over each column
For z = 3 To lastx
For i = 1 To CLng(LagsReqd)

lagarr(1, 2 * z + i - 5) = Header & "L" & i
For j = 1 To lastrow
lagarr(j, 2 * z + i - 5) = outarr(j - i, z)
Next j
Next i
Next z
Range(Cells(1, 1 + 2 * lastx), Cells(endrow, (CLng(LagsReqd) + 2) * lastx)) = lagarr


End Sub

Whisky
02-01-2018, 06:04 PM
Thanks very much for the code and the advise - appreciate it.

After a bit of tweaking I managed to get the first part of the output related to creating the differenced variables to work but despite a few attempts couldn't get the second part of creating the lagged variables to work. Found it a bit too complex to follow! Basically here I wanted to create lagged versions of both the original variables (columns C and D in the example) and of the new columns created from the first part of the output (columns G and H) based on the number of lags specified ("LagsReqd" parameter) . This is essentially a copy of all the data shifted 1 cell below. So in the case of where LagsReqd = 1 - then in the lagged version of the column C variable - c5 value appears in row 6, c6 in row 7 etc. This is in order to create a dataset that I can use in some modelling work. Anyway here is the amended code:


Sub ARDL_Create_Lags_Variant_verson2() Application.ScreenUpdating = False
Dim inarr As Variant
Dim Laginarr As Variant
Dim endrow, endcol, lastx, lastx2, i, z, k, j, LagsReqd As Integer

LagsReqd = 2
endrow = Range("B20000").End(xlUp).Row
lastx = Range("B5").End(xlToRight).Column '
totalvar = lastx - 2

'Step 1 copy raw range over
inarr = Range(Cells(1, 1), Cells(endrow, lastx)).Value
Dim outarr As Variant
Dim lagarr As Variant

ReDim outarr(1 To endrow, 1 To lastx)
ReDim lagarr(1 To endrow, 1 To lastx * LagsReqd + 2)

'STEP2 - create differenced variable
'Create differences of y and x variables
For z = 3 To lastx 'for each col
For k = 3 To endrow 'for each row

outarr(1, z) = Header & "diff_1 " & inarr(1, z)
outarr(k, z) = inarr(k, z) - inarr(k - 1, z)
Next k
Next z
' output differences
Range(Cells(1, 1 + lastx), Cells(endrow, 3 * lastx)) = outarr
' now do the lags on outarr

'=========================================

'copies over each column


'Header = Colm.Cells(1).Value
For z = 3 To lastx
For i = 1 To LagsReqd

lagarr(1, 2 * z + i - 5) = Header & "L" & i


For j = 1 To lastrow
lagarr(j, 2 * z + i - 5) = outarr(j - i, z)
Next j
Next i
Next z
Range(Cells(1, 1 + 2 * lastx), Cells(endrow, (LagsReqd + 2) * lastx)) = lagarr
End Sub

SamT
02-01-2018, 07:04 PM
I changed the thread Title. I just could not stand the idea that anyone would think that VBA creates data.

Sorry 'bout that.

offthelip
02-02-2018, 09:39 AM
I just tested the code I wrote and found a couple of bugs which I have fixed in the version.

For Z = 3 To lastx
For i = 1 To lagsreqd

lagarr(1, lagsreqd * Z + i - 5) = Header & "L" & i


For j = 1 + i To lastrow
lagarr(j, lagsreqd * Z + i - 5) = outarr(j - i, Z)
Next j
Next i
Next Z
Range(Cells(1, 1 + 2 * lastx), Cells(lastrow, (lagsreqd + 2) * lastx)) = lagarr



This code does do exactly what you asked for it takes the values from the column in the OUTARR aray and shifts them down a row as many times as there are lags. and then writes them out to the worksheet.
I am not sure where you wanted the results written and currently the code writes them in a different place depending on how many lags you select.
I have tested it with 2 and 3 lags.

Whisky
03-03-2018, 12:38 PM
Oh many thanks for the changes. I have not yet got around to looking the new code yet but this code will help a lot so brill! On a related note I took you're original code and changed it just read this first col into an array (ignoring headers and cells with no values) and perform the differencing and paste out. I tried a range of versions but I could only get this one to work without error: As you will see the the final array starts from 2 to the number of observations. But logically it should start from 1 to the number of observations - 1 but that variation doesn't work. Any ideas why?

Sub GetFirstDiffOneVarOnly() 'one col only, no header
Application.ScreenUpdating = False
Dim RawVar As Variant 'array to collect raw data
Dim endrow As Integer, i As Integer, z As Integer

endrow = Range("B20000").End(xlUp).Row

RawVar = Range(Cells(5, 3), Cells(endrow, 3)).Value 'Step 1 create raw data input series
obs = Application.Count(RawVar) 'No of Data points

Dim TransVar As Variant 'output array
ReDim TransVar(2 To obs, 1 To 1) 'Array for output, need to start from 2 because no diff for the first observation maybe?
'STEP2 - create difference variable

For i = 2 To obs 'for each row
' TransVar(1, 1) = ChrW(916) & "1 Header"
TransVar(i, 1) = RawVar(i, 1) - RawVar(i - 1, 1)
Next i
'obs2 = Application.Max(TransVar)
' output differences
Range(Cells(6, 8), Cells(endrow, 8)) = TransVar
End Sub

Also here one version that does NOT work but I have no idea why!

Sub test()
Application.ScreenUpdating = False
Dim RawVar As Variant 'array to collect raw data
Dim endrow, endcol, lastx, lastx2, i, z, k, j, LagsReqd As Integer

' LagsReqd = 3
endrow = Range("B20000").End(xlUp).Row
lastx = Range("B5").End(xlToRight).Column '
totalvar = lastx - 2

'Step 1 copy raw range over
RawVar = Range(Cells(5, 3), Cells(endrow, lastx)).Value
Dim TransVar As Variant
' Dim lagarr As Variant

ReDim TransVar(1 To endrow - 4, 1 To lastx)
'ReDim lagarr(1 To endrow, 1 To CLng(LagsReqd) * lastx)

'STEP2 - create differenced variable
'Create differences of y and x variables
z = 1 'for each col
For k = 2 To endrow 'for each row
TransVar(1, z) = ChrW(916) & "1 Header"

'If (kk) = "" Then
'TransVar(k, z) = "yut"
TransVar(k, z) = RawVar(k, z) - RawVar(k - 1, z)

Next k
' Next z
' output differences
Range(Cells(1, 1 + lastx), Cells(endrow, 2 * lastx)) = TransVar
' Range("F1:f32").Value = Application.Transpose(TransVar)
' Range("F1:f50").Value = TransVar
' now do the lags on TransVar

'=========================================

End Sub

Sorry please bear with me, only started out playing with these arrays and have become confused very quickly!


Thanks

W

offthelip
03-03-2018, 04:43 PM
ReDim TransVar(2 To obs, 1 To 1) 'Array for output, need to start from 2 because no diff for the first observation maybe?

This line defines an array with (obs-1) values which you have said you want index using values from 2 to obs. This explains why it only works when you index from 2 to obs, If you defined it as

Redim TransVar ( 1 to obs-1, 1 to 1)
then you would have to index from 1 to obs-1

VBa allows you to define index into an array from any number to any other number.
Try this little sub:

Sub tst()

Dim inarr() As Variant
ReDim inarr(-3 To 5, 1)
For x = LBound(inarr()) To UBound(inarr())
MsgBox x
Next x
End Sub

One other thing to notice is that I have used the two functions Lbound and Ubound to find the low and the high values on the indices. This is going to be much more reliable when you are looping through arrays that using the Application.count. because this relies on all cells being occupied
I had a look at your second "test" sub and the indexing has gone wrong. See my comments in the code: ( there may be other errors too)


Sub test()

Application.ScreenUpdating = False
Dim RawVar As Variant 'array to collect raw data
Dim endrow, endcol, lastx, lastx2, i, z, k, j, LagsReqd As Integer


' LagsReqd = 3
endrow = Range("B20000").End(xlUp).Row
lastx = Range("B5").End(xlToRight).Column '
totalvar = lastx - 2


'Step 1 copy raw range over
RawVar = Range(Cells(5, 3), Cells(endrow, lastx)).Value
Dim TransVar As Variant
' Dim lagarr As Variant


ReDim TransVar(1 To endrow - 4, 1 To lastx) ' **** this defines an aray with endrow-4 values
'ReDim lagarr(1 To endrow, 1 To CLng(LagsReqd) * lastx)


'STEP2 - create differenced variable
'Create differences of y and x variables
z = 1 'for each col
For k = 2 To endrow 'for each row '' ***** this loop goes from 2 to endrow
TransVar(1, z) = ChrW(916) & "1 Header"


'If (kk) = "" Then
'TransVar(k, z) = "yut"
TransVar(k, z) = RawVar(k, z) - RawVar(k - 1, z) ' ****this will cause an error when k gets to endrow -3 which is beyond the end of the array


Next k
' Next z
' output differences
Range(Cells(1, 1 + lastx), Cells(endrow, 2 * lastx)) = TransVar ' Transvar is only 1 to endrow - 4 so the range is the wrong size
' Range("F1:f32").Value = Application.Transpose(TransVar)
' Range("F1:f50").Value = TransVar
' now do the lags on TransVar


'=========================================


End Sub

I often define arrays starting from row, even if my data starts a few rows down. It makes indexing easier. and if the first few rows have only got text such as headers it does matter if you copy them to an array, clear the cells and then write them back. Another useful tip is to stop the macro by putting a break in it just after all the definitions and use the local window to look at all the arrays and see where the indices start and end

snb
03-04-2018, 02:55 AM
I'd use:


Sub M_snb()
sn = Cells(1).CurrentRegion

sn(1, 1) = ""
sn(2, 1) = ""
For j = 3 To UBound(sn)
sn(j, 1) = sn(j, 2) - sn(j - 1, 2)
Next

Cells(1).CurrentRegion.Offset(, 3).Resize(, 1) = sn
End Sub

NB. I removed the empty column A first.
You should always use cell A1

See: http://www.snb-vba.eu/VBA_Arrays_en.html

Whisky
03-05-2018, 04:23 AM
Hi I am trying to reply to this but am getting the message: "Post denied. New posts are limited by number of URLs it may contain and checked if it doesn't contain forbidden words." Any ideas how to overcome? I have not pasted any URLs, don' know about Forbidden words though!

werafa
03-06-2018, 01:32 AM
that post worked :]

georgiboy
03-06-2018, 11:07 AM
I am making this post on behalf of Whisky as he/she is not able to post it for some reason:

Thanks for the suggestions all. Some of the things are starting to make more sense now!

@offthelip. Orginally I did try running the code as you suggested (Redim TransVar ( 1 to obs-1, 1 to 1), and For i = 1 To obs) as this makes more sense but it generated the "subscript out of range" error (hence I was confused) on this line:


TransVar(i, 1) = RawVar(i, 1) - RawVar(i - 1, 1))

I just re-run the amended code below and checked the local windows - it correctly creates an array of 1 to 26 but all the values in the array are empty. Also getting same error if I use "For i = LBound(TransVar()) To UBound(TransVar())"


Another question. Instead of returning results to a sheet, if I want to pass on the output array (TransVar) as a data input array into another sub does it matter what type of output array solution I create? e.g Using UBound and LBound, or if my array starts from 1 instead of 2? Any thoughts?

Also I read somewhere on a tutorial that using UBound and LBound can slow down the running of the code. Could that be true?

Thanks,


W


Sub GetFirstDiffOneVarOnly() 'one col only, no header

Application.ScreenUpdating = False
Dim RawVar As Variant 'array to collect raw data
Dim endrow As Integer, i As Integer, z As Integer

endrow = Range("B20000").End(xlUp).Row

RawVar = Range(Cells(5, 3), Cells(endrow, 3)).Value 'Step 1 create raw data input series
obs = Application.Count(RawVar) 'No of Data points

Dim TransVar As Variant 'output array
ReDim TransVar(1 To obs - 1, 1 To 1)

'STEP2 - create difference variable
For i = 1 To obs 'for each row
TransVar(i, 1) = RawVar(i, 1) - RawVar(i - 1, 1)
Next i

Range(Cells(6, 8), Cells(endrow, 8)) = TransVar
End Sub

offthelip
03-06-2018, 11:46 AM
You will get Subscript out of range because your index goes 1 beyond the end of the range. if you look at my post #7 I clearly stated:

This line defines an array with (obs-1) values which you have said you want index using values from 2 to obs. This explains why it only works when you index from 2 to obs, If you defined it as

Redim TransVar ( 1 to obs-1, 1 to 1)

then you would have to index from 1 to obs-1



you state:

Originaly I did try running the code as you suggested (Redim TransVar ( 1 to obs-1, 1 to 1), and For i = 1 To obs) as this makes more sense but it generated the "subscript out of range" error (hence I was confused) on this line:

Which is clearly NOT what I suggested!!!

You also state:

but all the values in the array are empty.
I presume you are talking about the Transvar array. To debug this put a breakpoint at the start of the lop and step through it checking what you have got in each of the array variables as they are indexed.

Using Lbound ubound once or twice is not going to slow down you code significantly. One of the most effective ways of writing fast VBA code is to learn how to use variant arrays instead of Ranges, which is what I am trying to show you. Typically changing from using ranges in a loop to using variant arrays in a loop reduces the time taken by a factor between 100 and 1000. i.e. really significant


if I want to pass on the output array (TransVar) as a data input array into another sub does it matter what type of output array solution I create?
No provided your next sub deals with the array correctly

Whisky
05-10-2018, 08:41 AM
Hi

Sorry for the delay in following up with this post, I got sidelined by other stuff and now I have bit more time to focus on this so I'm back!

Thanks for comments and apologies if I misunderstood what you suggested which seems to be the case.

So for my own sanity I've created a much simpler example from scratch again (attached). The code accurately reads in the data in column c (PART1) and then creates a new array (Transvar) containing the delta and pastes the results into column E (part 2). This works fine and I understand how.



Sub test_Array() '
Dim endrow As Single, startRow As Single
Dim startCol As Integer, endcol As Integer
'Dim series

'PART 1
startRow = 5
endrow = Range("B5").End(xlDown).Row
startCol = 3
endcol = 3
raw = Range(Cells(startRow, 3), Cells(endrow, 3)).Value

Dim series As Variant
Dim z As Single
obs = Application.Count(raw) 'No of Data points

'PART2
'create new array with differences
Dim TransVar As Variant 'output array
ReDim TransVar(2 To obs, 1 To 1) 'Array for output, start from 2 because no diff for the first observation
For z = 2 To obs 'for each row
TransVar(z, 1) = raw(z, 1) - raw(z - 1, 1)
Next z

series = TransVar
Range("E6:E13").Value = series
End Sub

The new array (Transvar) starts from 2 to obs (2 to 9) which is 8 values. What I would like to do is understand how change the lower and upper bound of this array to be 1 and 8 , instead of 2 and 9 currently which is what I was attempting to do in my previous response/code but failed.

Is there an easy way to create the original Transvar array in partB of the code with upper bound (8) and lower bounds (1) or alternatively can create a new array (say Transvar2) in a new Part C re-basing the Transvar array from 2 to 9 to 1 to 8 instead?

Hope I'm making sense.

Thanks

Whisky

offthelip
05-10-2018, 09:11 AM
this should do it:


ReDim TransVar(1 To obs-1, 1 To 1) 'Array for output, start from 2 because no diff for the first observationFor z = 2 To obs 'for each row
TransVar(z-1, 1) = raw(z, 1) - raw(z - 1, 1)
Next z

Whisky
05-11-2018, 10:50 AM
Great. This is exactly what I needed and works fine.

I am now able to also pass on this data /series into another UDF. I did not work with the the original bounds that I created.

Thanks

W

Whisky
05-16-2018, 05:12 AM
Following up with this, I seem to run into a related issue.

Currently the TransVar above is defined as an array of data. I want to feed this data array into a UDF (written by someone else), but the UDF only accepts the input data as a range.

Is there a way to convert/transform a variant array into a range? Or would I have to paste the data back into a sheet and let the UDF read the data as a range from here?

Thanks

W








Thanks

offthelip
05-16-2018, 05:52 AM
The easiest way to convert a variant array to a range is to paste it back to the worksheet. However the way I would approach this is to look at rewriting the UDF to accept a variant array. There is a very good chance that the range input in the udf is only necessary so that you can use the UDF on the worksheet. Thus changing the type of the UDF so that you can use it with a variant array might be a trivial change. Obviously this variant of the udf you can only use with VBA.