View Full Version : Multiple Linear Regression, logging and VBA

CPerry
03-05-2018, 04:13 PM
Multiple Linear Regression, logging and VBA

My problem involves using close to 200 independent formulas to predict price movements for 6 separate but connected brands (greyhounds in a single betting market, pre-race).
My data comes into the spreadsheet automatically from an external programme. From there I have formulas in cells C40:H71 (6 x 32 = 192 cells); underneath, I'd like to show which of these formulas come closest to predicting the actual moving value for each of the 6 brands.

I know that my best bet will be to use multiple linear regression, but for computing-power reasons, is it best to first shorten my sample by finding each formula's Euclidean distance? The data can be shooting into my spreadsheet at up to 20m/s, so as time passes I'd like the system to 'remember' the previous 30 seconds' worth of data (and have this figure easy to adjust). Cell F5 (on the same tab, named 'Bet Angel') has a one-second interval timer to keep track of this.

The outputted data (which will occupy cells C73:H104) will then be ranked, and (please correct me if my logic is off here, and suggest alternatives) I will take the formulas with the highest R² values (for example, the top 10) and put those figures into another linear regression matrix, to display a column of figures I can use with the FORECAST function to predict the next 30 seconds' worth of movement (taking the angle of the resulting line on a graph to judge each brand's momentum).

As an alternative to using cells C73:H104, I have a new tab called 'Data' which could handle this part of the system: start from cell C5 and go along row 5, then go 30 rows down (1 per second), and take an average back on the original sheet (named 'Bet Angel'). Eventually I'd like the whole system to run solely in VBA, but for now part code and part formula will suffice.
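[Editor's sketch] To make the 'remember the last 30 seconds' part concrete, here is a minimal VBA sketch of a one-second rolling logger. The sheet names ('Bet Angel', 'Data'), the C40:H71 block and the 30-second window come from the post above; the log layout starting at Data!C5 and the Application.OnTime re-arm are illustrative assumptions, not a finished implementation:

```vb
Option Explicit

Const WINDOW_SECONDS As Long = 30   ' easy to adjust, per the post

' Snapshots the 192 formula cells once per second into one row of the
' "Data" sheet, keeping only the most recent WINDOW_SECONDS rows.
Sub LogSnapshot()
    Dim wsSrc As Worksheet, wsLog As Worksheet
    Dim vals As Variant, r As Long, c As Long, col As Long
    Dim nextRow As Long

    Set wsSrc = Worksheets("Bet Angel")
    Set wsLog = Worksheets("Data")

    vals = wsSrc.Range("C40:H71").Value          ' 32 x 6 formula results
    nextRow = wsLog.Cells(wsLog.Rows.Count, "C").End(xlUp).Row + 1
    If nextRow < 5 Then nextRow = 5              ' assumed: log starts at row 5

    ' flatten the block into one 192-cell row: one row = one second
    col = 3                                      ' column C
    For r = 1 To UBound(vals, 1)
        For c = 1 To UBound(vals, 2)
            wsLog.Cells(nextRow, col).Value = vals(r, c)
            col = col + 1
        Next c
    Next r

    ' drop the oldest row once the window is full
    If nextRow - 4 > WINDOW_SECONDS Then wsLog.Rows(5).Delete

    ' re-arm the one-second timer
    Application.OnTime Now + TimeSerial(0, 0, 1), "LogSnapshot"
End Sub
```

Running LogSnapshot once starts the loop; a companion sub calling Application.OnTime with Schedule:=False would be needed to stop it cleanly.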

If anybody has any ideas of how this can work and can provide formulas/VBA code, please let me know! I can't get my head round how this vital part will tick.

Cheers,

CPerry :)

werafa
03-06-2018, 01:55 AM
Hi,

I can't answer some of the stats problems, but I have a suggestion for managing the data.

Try putting it into a dynamic named range. You should be able to set this up to manage your 30-second data grab (DNRs make use of the OFFSET function).

Honestly, I'd chart the data and use the mk1 eyeball as my first port of call for analysis.
I'd also generate a chart of variance and look for models whose results fall within the predicted error range, to identify valid models, and do something to mark outliers or model failures. Do this by running charts off the dynamic named range, or by using intermediate tables/pivot tables (you will need to use events to manage auto-refresh of the data caches).

I have a strong suspicion that this will give you both instantaneous and overall best models more easily than single statistics will, and it won't need much VBA hacking.

CPerry
03-06-2018, 04:09 AM
21751

The problem with putting the information into a dynamic named range is that there is so much of it; if I laid each cell out lengthways I'd run out of available columns, so I'm unsure how this would work? :/

Say I used variance charts to find my most accurate indicators: how would I then have the worksheet know to use those specific formulas? I imagine it would be some sort of MATCH/INDEX formula to feed the most useful data into a multiple linear regression? Then this itself would need to be logged over the past (however many) seconds in order to have enough information to generate a FORECAST. You know what I mean? I can't get my head around the order things will need to happen in at this stage.

Attached is a blueprint of the spreadsheet I will be working with. Rows 1-20 are where data flies in from an external programme (updated at 20m/s). The yellow section underneath has a detailed explanation of its purpose, and the area below that is where I'm hoping the magic will happen. Again, if anybody could be so kind as to spell out 'this cell should use this formula to find this, then this cell should do this formula', etc., so I can go away and do some reading, that would help massively! I'm looking to learn, I just don't know where to start.

CPerry.

werafa
03-07-2018, 12:12 AM
A dynamic named range can refer to a two-dimensional table of any size, and its power is that it automatically resizes when new data is added.
You can then refer to rows or columns within the range. I believe you can set one to start at the last-added data and use a negative row offset to grab the most recent chunk of data. You would then chart whole rows or columns from the DNR.
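[Editor's sketch] As a concrete example of the negative-offset idea, a name defined in the Name Manager along these lines would do it. The 'Data' sheet, the C5 anchor, the 192-column width and the 30-row window are taken from the thread; treat the exact references as placeholders:

```
=OFFSET(Data!$C$5, COUNTA(Data!$C$5:$C$65000)-30, 0, 30, 192)
```

Assuming the log grows downward from Data!C5 with no gaps, this always returns the most recent 30 rows, and any chart pointed at the name re-aims itself as rows are appended.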

Not sure what you mean by running out of room. If you break Excel, you are either doing it wrong or truly need a database.

I'd forget trying to make the spreadsheet too smart. I think you are trying to do two jobs here: 1 is to pick the most accurate suite of models, and 2 is to pick a model in real time. Test your models separately, and only use tested and reliable models on live data.

Also, if you are using datasets that break Excel, you might find value in looking at Microsoft BI.

SamT
03-07-2018, 11:05 AM
(6 x 32) at 50/sec for 30 seconds = 288000 Data Points or Cells

Each Market requires 32 columns and ~1500 Rows

OTOH, 6 * 40 columns will fit on one sheet. I suggest that you simply use some of my previous work for you to combine the 6 rows of incoming data into one array of 6 * (32 + 8) (240 columns), and follow Werafa's idea of recording lots of data, then analysing it "offline" to determine which formulas are best.

Only after you know which formulas you are going to use can you design the project efficiently.

CPerry
03-08-2018, 02:02 PM
Hey SamT, long time no speak!

Yeah, I've had a complete redraft of ideas. Basically, instead of chasing the prices and having a genius robot notice things live, like I was trying to do, I am instead looking at identifying and defining the market's many variables, then having formulas judge the conditions of each. I've taken both of your recommendations to find a workable system first and then just let it run, instead of trying to be too clever, and I've spent most of the day typing up the nearly 100 long formulas ready to crunch.

Now my immediate problem is putting it all into a multiple linear regression. I have decided to go across the 'Data' tab, recording around 100 formula cells each second, and then I was hoping to use this moving data to watch how the multiple linear regression's output changes as these short markets run. The problem is that if I use the Regression option in the Analysis ToolPak, I get a single 'load' of this data for one second, whereas I wanted to explore what happens each second.

Therefore my question is: is there a way around this using the Analysis ToolPak? Or could somebody point me in the right direction for coding it myself? From what I can gather, I will only need: R², adjusted R², Significance F, coefficients (inc. intercept) and p-values.
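[Editor's sketch] For what it's worth, LINEST called with stats=TRUE already returns most of that list, and the rest can be derived from its output. In the sketch below, the ranges, the 30-row window and the 10 X-columns are placeholder assumptions; the adjusted R² and two-tailed p-value formulas are the standard ones:

```vb
Sub RegressionStats()
    Dim V As Variant, n As Long, k As Long, j As Long
    Dim rsq As Double, adjRsq As Double, fStat As Double
    Dim yVals As Range, xVals As Range

    Set yVals = Worksheets("Data").Range("B5:B34")   ' 30 seconds of Y (placeholder)
    Set xVals = Worksheets("Data").Range("C5:L34")   ' 10 candidate formulas (placeholder)
    n = yVals.Rows.Count
    k = xVals.Columns.Count

    ' LinEst with stats returns a 5 x (k+1) array:
    ' row 1: coefficients (in reverse column order; the last is the intercept)
    ' row 2: their standard errors
    ' row 3: R-squared and se(y);  row 4: F and residual df;  row 5: SS values
    V = Application.WorksheetFunction.LinEst(yVals, xVals, True, True)

    rsq = V(3, 1)
    fStat = V(4, 1)
    adjRsq = 1 - (1 - rsq) * (n - 1) / (n - k - 1)

    ' two-tailed p-value for each coefficient from its t statistic
    For j = 1 To k + 1
        Debug.Print "coef", V(1, j), "p", _
            Application.WorksheetFunction.T_Dist_2T(Abs(V(1, j) / V(2, j)), n - k - 1)
    Next j
    Debug.Print "R2", rsq, "adjR2", adjRsq, "F", fStat
End Sub
```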

I have been doing plenty of reading, SamT, and can now do basic VBA chores myself, which I never thought I'd be able to do (really simple things). I occasionally look over the stuff you helped create and read back through the old thread to continue learning. I'm enjoying these new challenges I keep giving myself. How have you been?

CPerry :)

SamT
03-08-2018, 02:38 PM
Analysis ToolPak
Sorry, guy. I've never looked at it and am clueless about it.

I am drifting away from programming and only come here to kill time. Sometimes I see something really interesting or really simple and kill more time, but that doesn't happen all that often.

offthelip
03-08-2018, 04:56 PM
Hi CPerry, again!!
Did you know that you can use the analysis package from VBA (provided you load the add-in in Excel Options and tick the box in Tools > References)? This will allow you to calculate the linear regression whenever you want to in your macro. It has the advantage that you can control what gets calculated and when, rather than relying on Excel's recalculation. Here is a subroutine that uses the LinEst routine. You can choose which parameters you want to display by indexing into the V array.

Sub calcslope()
    ' Rolling 20-point linear fit down column E, writing the slope and
    ' R-squared of each window to columns AC:AD (29:30).
    Dim yarr(1 To 20) As Double
    Dim inarr As Variant, outarr As Variant, V As Variant
    Dim i As Long, y As Long

    Worksheets("Sheet1").Select

    inarr = Range(Cells(1, 5), Cells(274, 5))
    Range(Cells(15, 29), Cells(274, 30)) = ""
    outarr = Range(Cells(1, 29), Cells(274, 30))

    For i = 34 To 274
        ' load the 20 most recent values into the window
        For y = 1 To 20
            yarr(y) = inarr(i - 20 + y, 1)
        Next y
        ' LinEst with stats=True: V(1, 1) is the slope, V(3, 1) is R-squared
        V = Application.WorksheetFunction.LinEst(yarr, , , True)
        outarr(i, 1) = V(1, 1)
        outarr(i, 2) = V(3, 1)
    Next i

    Range(Cells(1, 29), Cells(274, 30)) = outarr
End Sub

OTL

CPerry
03-11-2018, 04:53 PM
Hi there OTL!!! How are you? What’s new?

I vaguely remember you saying a while ago that Excel had a bunch of formulas built into VBA, but I had no idea linear regression was one of them; that will be so handy!

I tried out the code from the previous message but I get:
Run-time error '13': Type mismatch.
When I debug it, the line yarr(y) = inarr(i - 20 + y, 1) is highlighted yellow.
Hovering over the highlighted area, I see yarr(y) = 0. Is that simply stating that the y value is 0? Because some formulas do read 0 when money hasn't filled the entire market yet, so for a portion of time some cells may indeed be 0.

I tried calling Sub calcslope() in the middle of the 'record_data' sub, which seemed to work fine up until the point the type mismatch error popped up again. I already know the problem is going to be having too many formulas for the horizontal area (255 cells) to handle the way I'm doing it now.

Is there a way of calculating the average of the formula results appearing in cells C40:H79 live, as they appear in those cells? I will then need to calculate 6 multiple linear regressions (a lot more when I start on the final version)! At the moment, column C holds the X variables for Brand 1, column D for the second brand, etc.

(This is because around 40 formulas will be examined for each brand, independently of one another.) Same gist as last time: up to a 20m/s refresh rate, operating as quickly as possible to execute commands faster than a person reacting to the market, but spending long stretches just watching the markets tick slowly.

Any ideas at this stage just throw them at me.

Cheers,
CPerry.

offthelip
03-12-2018, 10:48 AM
Hi CPerry,
I haven't worked through your complicated workbook to get any idea of where you are thinking of applying the linear regression. The sub I posted does work OK; you just need to set the input and output arrays correctly. There is no error checking, mainly because I wrote it for myself, and if it goes wrong I just correct the error. I have attached a file that shows you how to use it, with a demonstration data file.
Reading about what you are trying to do, I get the impression you are trying to develop a system that will optimise itself in real time. That might seem a good idea in theory, but I feel you are going to run into problems very quickly: firstly with processing power, secondly with repeatability of results, and thirdly, it is not the quick way to develop a system.
For example, you might find that on one day two of your equations come up with fantastic results. How are you going to tell whether this is due to chance, or whether those two equations together really do have an edge?
I personally feel that development of a system should be done in stages:
1: Capture a lot of real-time data to form a database to test your systems against.
2: Split the database into two sets: firstly the development set, secondly the verification set.
3: Try each of your equations against the development set and check whether any has an advantage. While doing this, record the values of the other equations and check in the results whether there is any correlation between winners and the other equations. If there is, this can be used to try to develop combinations of the equations to improve the results.
4: Iterate around this, trying to find a combination of the equations that seems to give good results, and lots of them (100 to 1000).
5: Finally, when you have a combination that seems to work, try it on the verification set and see if the results are any good at all. Usually the system will fail when you try it on the verification results; this is because you have "tailored" your system to the development data set.
6: Occasionally, if your ideas are good, you will come up with a system that works with both sets. Only then is it worth trying to develop it into a real-time system.

One of the real problems of testing on a real-time system is that it is too slow!!! You can't speed up time, so each test of a 15-minute race will take 15 minutes. It is much faster to capture the data and run each test offline; you can run hundreds of tests in 15 minutes offline!

CPerry
03-14-2018, 03:15 PM
21825

Hi OTL,

Thanks for the example document! I had a little play with it and now see how I can make it work for my setup! I think I agree with you with regard to processing power (I'm planning on finding my edge and then looking at ways of implementing it more efficiently later; I've heard the program R is meant to be great).

With regard to repeatability, and this not being a quick way of developing a system: I think I have already come across a problem. At this stage I am simply looking to create what you could call a 'momentum checker' bot, and then test, test and test again like you suggested, throwing in new equations and discarding old ones as I power through logged information. To do that, though, I wanted to create all of the formulas first so they can be scrutinised.

I thought that these formulas, combined into this simple multivariate linear regression, would give me a good sense of live momentum within the market, based on the most recent 'state' of each market it is analysing. I was then going to use TREND or FORECAST to extend the line, take the angle of this line, and use it as a momentum reading (even turn it into a % to create a more traditional momentum indicator, like an RSI or something).

This being said, I have a problem. I have attached a single sheet that contains the 40 formulas (X variables) and the 'Last Traded Price' (Y variable). Underneath, you can see that I have used the LINEST function on this 30-second test (hoping to explore the movement of the strength of each individual coefficient), but I get ERRORS. I assume this is because of the slow-moving market (in fact, the last traded price stays stationary for this very short period).

So I ask: do you think a correlation or a regression is the next path to explore, or something else? I am fascinated by this and love exploring these complex numbers; I just keep running into dead ends and having to recalibrate myself often.

Cheers,
CPerry.

CPerry
03-14-2018, 03:45 PM

I suppose what I'm looking to do is summarised in this video. I'll be looking to swing trade these short markets (10/15 minutes for the horses, remember), though with football it may be the majority of a whole day. I don't believe I'm giving too much away when I say my formulas involve analysing the characteristics of the other brands against Brand 1 to get the first linear regression calculation, against Brand 2 to get the second linear regression, and so forth.

I think that is the avenue I'm looking to find success in: swing trading these short-term markets with advanced, customised momentum indicators. Sometimes it's good to reflect and confirm what it is you think you're doing, I believe, otherwise I'll just forever walk around lost :banghead::bug::rofl:

offthelip
03-14-2018, 05:19 PM
I had a look at your spreadsheet and I think you are interpreting your results incorrectly. You get #N/A in cells C35 to AL37; this is correct and exactly what the LINEST function produces.
If you look at the diagram in this link, you can see that LINEST only writes 5 rows for the first two columns; after that it writes only two rows. This explains all the #N/A.
You also get #NUM! in A36, which is the F statistic. I suspect this is because your data is not very good data for this function: I notice that all the Y values are the same value, which I think means that instead of a line you have got a dot!!
Personally, I never use LINEST in this format; I always use it from VBA, so that I can control when it gets recalculated. This is purely to keep my spreadsheets running fast. Array formulae on the spreadsheet can really slow things down, and the problem is they recalculate every time you enter anything in any of the workbooks. (OK, I know Excel is fairly smart about recalculating only what it needs, but I still help it by designing to minimise the time.)
I have done a quick analysis of your data by working out the percentage change in each value from the starting point (see rows 39 onwards). The variation between the columns is HUGE: column B varies by about 1%, while column AA varies by such a large percentage that I don't even know if it is one billion or 1000 billion!! I am not surprised that LINEST couldn't work out the F statistic. I can't work out what the slope and standard error of these would mean, let alone how to use them, so I have no idea what you are trying to do and can't advise on your next step.

offthelip
03-14-2018, 05:37 PM
I have had further thoughts about a way to analyse the horse races.
If you have two horses in a race, you have a simple linear problem, rather like a child's see-saw: it must balance in the middle at the end.
With a three-horse race you have something like a three-sided see-saw, where once again it must balance in the middle, but the length of each arm varies as the price varies.
In a 12-horse race you have got a dodecagon. So I am thinking about using simple geometry to work out the position of the centroid of this dodecagon and track which way the centroid is moving!!
Once I have done that, I would look at whether this gives any indication of which horse is going up or going down.
Just an idea!!

CPerry
03-15-2018, 07:20 AM
21832

Yeah, I understand. In a 6-horse race I imagine a hexagon balancing on a centre point, each point having some force underneath trying to push up (so the price for that horse rises) and another force pushing down (so the price falls). I was using formulas for Brand 1 such as (Brand 1 BACK + 1/2 Brand 2 BACK + 1/2 Brand 3 BACK ... 1/6 Brand 6 BACK) to give me one variable, then the same for the LAY money on each brand, and trying to compare that with the linear regression. Obviously that has turned out to be a dead end, and so, from my reading yesterday, has looking for a correlation. Because of that, I am extremely glad you replied so swiftly; otherwise I'd have spent most of today thinking about what to try next.

So, thinking out loud: in a hexagonal see-saw, the force on each of its six points will give me an angle (seen from the centroid) in 3D space. From there I could work out the angle from each point to the 'ground', to show second by second how much momentum is affecting that specific brand individually.

If you take a look at the image I have created (it was the only 6-sided see-saw I could find, lol; imagine it can now move from the centroid in any direction, not just around the 180-degree plane), Brand 1 would be affected by the other 5 brands' upforce (money at their closest BACK price) and their downforce (money at their closest LAY price). From there it would give me a reading, which I can interpret using trigonometry, showing me the momentum as a single % that I could use as a main indicator.

So now... erm... I need to find a way of using geometry within Excel to give me the foundations for the see-saw, and then I can worry about the individual up/downforce formulas, and then the trigonometry to give me a momentum %. I shall do some reading while I await your reply, but any ideas where to start when exploring this sort of geometry-based approach in Excel?

CPerry :)

SamT
03-15-2018, 09:35 AM
How well does the set of all Bets correlate to Wins?

IOW, how good are the crowd's predictions?

As I see it, you only have two strategies, 1) guess the winners, and 2) determine the best investments to maximise profits over costs.

In the stock markets, prices reflect actual values. In sports, bets reflect the Crowd's hopes (unless the Crowd is psychic, which it may be :dunno: ).

offthelip
03-15-2018, 09:41 AM
I am not familiar with the terminology of horse racing and I don't have any sort of "feel" for it. However, I do realise that the sum of all the odds in a race must add up to less than 100%, which is why one horse's price affects another horse's price. This is why I was thinking about the polygon and, hopefully, using the summation of prices. My idea was to use it as a 2-dimensional device rather than a 3-dimensional device. I think your idea of using it as a 3-dimensional calculation might well be the best in the end; however, the calculations are a bit more complicated (but exactly the sort of calculations I have done in my career). Do you know anything about vector maths?? These calculations are quite easy to do using dot and cross products of vectors.
To use the 2-dimensional device, I had thought of an irregular polygon with the vertices evenly distributed around the centre but at different distances from it, the distance from the centre being proportional to the price. It is then fairly easy to calculate the centre of gravity of the polygon and find the distance and angle between the centre and the centroid. I think this could be a useful indicator of what is happening to prices across the whole market.
This calculation doesn't need vector maths, just Pythagoras.
Geometry in Excel is very easy because all the SIN, COS and TAN functions are already there.
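[Editor's sketch] One way the 2-D polygon could be sketched in VBA. The vertex layout (angles evenly spaced, radius proportional to price) follows the description above; the function name, the shoelace centroid formula and the Array return are illustrative choices, not from the attached workbook:

```vb
' Returns Array(distance, angle) of the polygon's centroid from the centre.
' prices: array of n odds; vertex i sits at angle 2*pi*(i-1)/n at a
' distance from the centre proportional to its price.
Function CentroidShift(prices As Variant) As Variant
    Dim n As Long, i As Long, j As Long, lb As Long
    Dim x() As Double, y() As Double
    Dim a As Double, cx As Double, cy As Double, cross As Double
    Const PI As Double = 3.14159265358979

    lb = LBound(prices)
    n = UBound(prices) - lb + 1
    ReDim x(1 To n): ReDim y(1 To n)

    For i = 1 To n
        x(i) = prices(lb + i - 1) * Cos(2 * PI * (i - 1) / n)
        y(i) = prices(lb + i - 1) * Sin(2 * PI * (i - 1) / n)
    Next i

    ' shoelace formula for the signed area and centroid of the polygon
    For i = 1 To n
        j = i Mod n + 1
        cross = x(i) * y(j) - x(j) * y(i)
        a = a + cross
        cx = cx + (x(i) + x(j)) * cross
        cy = cy + (y(i) + y(j)) * cross
    Next i
    a = a / 2
    cx = cx / (6 * a): cy = cy / (6 * a)

    ' distance is plain Pythagoras; the bearing uses Excel's ATAN2
    CentroidShift = Array(Sqr(cx ^ 2 + cy ^ 2), _
                          Application.WorksheetFunction.Atan2(cx, cy))
End Function
```

Tracking the two returned numbers second by second would give the "which way is the centroid moving" indicator.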

CPerry
03-15-2018, 01:54 PM
@ SamT: It does not matter what the outcome of the event is; it is the backing and selling movements beforehand that matter, so I'm just exploring ways of identifying momentum. Yes, the market is led by the 'wisdom of the crowd', as they say, but that difference in opinion is what causes fluctuations and opportunities; it's all about finding those moments, as far as I'm aware.

@OTL: Yes, the sum of all the odds will add up to <=100%. I know the basics of vector maths from when I studied A-level Maths recently; I was quite good at it. I have brainstormed the different variables present and imagined them within this 'see-saw' idea. I'm spending tonight refreshing myself on this area and coming up with some equations. I'll get back to you tomorrow with what I have and see if I can think of the logical next move. I have also messaged a few equally sad losers, such as myself, who find maths fun (I need to get out more, I know) to help me account for all the variables and explain them well...

The only thing I am unsure of at the moment is the angle each brand is separated by... like I said, I'll have a think and get back to you tomorrow :)

Cheers,
CPerry.

SamT
03-16-2018, 07:15 AM
@ SamT: It does not matter what the outcome of the event is; it is the backing and selling movements beforehand that matter, so I'm just exploring ways of identifying momentum. Yes, the market is led by the 'wisdom of the crowd', as they say, but that difference in opinion is what causes fluctuations and opportunities; it's all about finding those moments, as far as I'm aware.

That sounds like you are trying to make the best series of bets that will maximize your profit over loss ratio regardless of which dog/horse/team Wins/Places/Shows at the end of the contest.

My advice would be to keep at the forefront of your mind that the Crowd only controls the movement of the odds; it cannot predict the outcome of the contest.

A sporting contest's outcome is not like a stock market day's end in any way, regardless of the fact that Bet Angel can be used on a stock market.

To avoid any mental confusion, I would stop using any "market" terms in the mental and code design process, and restrict myself to using only "betting" or "contest" terminology. OTOH, if using Bet Angel on a stock market, I would reverse the language requirements.

Since Bet Angel uses Excel spreadsheet ranges, not labels and headers, I would go as far as changing them on the spreadsheet to use only "contest" terms, reinforcing the desired mental paradigm as much as possible.

CPerry
03-16-2018, 02:15 PM
That sounds like you are trying to make the best series of bets that will maximize your profit over loss ratio regardless of which dog/horse/team Wins/Places/Shows at the end of the contest. CORRECT!

Yes, I will remember the 'wisdom of the crowd' and only go by the trends I see in front of me; it is purely about the odds movement, not the final destination. Yeah, these markets, as they are properly labelled, do not have any 'close' points with after-hours trading or anything; they are purely constant, short-term markets. For all their similarities, this is probably the biggest difference, and it affects all calculations.

I am still looking over my equations using this see-saw imagery; it has proven more difficult than initially expected, but a fun little puzzle for the evening, so I'm not complaining. I'll ponder over it more this weekend (maybe even some volume/volatility secondary formulas too, as I imagine this see-saw will account for both trend and momentum simultaneously). I'll have a think, log some events and perfect my equations; I just didn't want to go tonight without replying in here. Enjoy the weekend, gents. Speak soon!

CPerry :)

CPerry
03-20-2018, 05:15 AM
OK, I've had a good think, spent time learning, and have concluded that the hexagon on a centroid won't work, because the nearest points will make the angle of the line of the brand I am trying to find skew across the perpendicular dimension, among other reasons. However, the same principle should work in theory with a simpler see-saw design.

5 brands are on one side of the see-saw, each exerting an upforce and a downforce at various lengths along the left side. The brand in question exerts an up and a down pressure on the right side, and its total elevation will be the strength of its momentum. I have spent the last few days calculating this equation and I think I am near. I also have past logs that I can use with this equation to generate some tests, which I will post on here once complete.

I would appreciate it if you could look at this and see whether one of my several equation theories seems to work. Just going out for a few hours now, but I'll get back on it tonight and get it done ASAP!

Just keeping you updated and maintaining this 'diary' of a thread (not sure if this is the correct use of the forum, but I assume people with similar ideas can see where I have failed and take a shortcut to learning).

As always, Cheers!
CPerry :)

offthelip
03-20-2018, 06:45 AM
I think my idea was much simpler, so I thought I would quickly put down the calculations; see the attached workbook. My idea was to calculate the position and angle of the centroid and then track the movement of that position. This will move depending on all of the brands, not just the strongest. One problem I can foresee is that the position of the centroid depends on which brand is placed where on the hexagon, so I think the order of the brands needs to be sorted, possibly around the hexagon or possibly right to left. Another possibility is an irregular hexagon where the angle is proportional to the price rather than the length!! Or even both length and angle!!

CPerry
03-21-2018, 02:07 PM
One problem I can foresee is that the position of the centroid depends on which brand is placed where on the hexagon

This was my biggest concern when visualising it: why does the nearest point's weight bring the whole thing towards the brand being examined? Which brand should fit where? Why should they go clockwise in order of the odds? It didn't make sense to me, which is why I've spent most of today looking over all of the variables required. I believe I have now accounted for every aspect, and with any luck I will have my calculations complete for the first race at least by tomorrow lunchtime. I will then post my findings on here and begin calculating numbers for 2 further races, so if any hint of an edge emerges it can be tested on other races ASAP.

I've taken many ideas from the hexagon example and it's really helped me picture how all of the variables weigh on each other; thank you very much for your help thus far! I really think I'm getting close to finding something very useful! Anyway, back to the spreadsheet; more to come tomorrow!

CPerry :)

CPerry
03-22-2018, 06:31 AM
I have just accidentally proven, and found a way to express, that the BACK side of Brand 1 is equal to the opposite force created by the LAY side of the other 5 brands. Not sure how this helps me at the moment; analysing further now...

CPerry
03-26-2018, 01:13 PM
21925 21926

So I have come up with two different theories (indicators) that take into account a bunch of different aspects from around the market. In the attached document I have created two different formulas for each of the 12 horses. As you can see from the attached picture, the black line (indicator) correctly moves ahead of the blue line (last traded price) a few times for Horse 1, which I'm hoping is a positive sign. This was achieved with a heavily smoothed moving average (something mad like a 150-period MA).

OTL, could you please take a look and see whether there are any good ways of applying this indicator with trading methods you're familiar with? Crossover MAs, or some more advanced stuff that I'm not yet familiar with.

Much appreciated, and sorry it's taken a few days; these formulas are the most detailed and complex I have created to date!
Cheers,
CPerry.

offthelip
03-26-2018, 02:37 PM
Unfortunately I couldn't open the file. I am using Excel 2007 on Win 7 and Win 10.
(I never upgraded because the later versions of Excel never offered me anything I wanted!!)

CPerry
03-27-2018, 01:28 AM
21928

Try this one! The files are big, so I've had to save it as a .csv, but it is just 12 x 3 columns (last traded price, Theory 1 for the 12 horses, Theory 2 for the 12 horses), so it should work just fine like this. If not, what's the best way to send it across to you?

P.S. I forgot to mention in my last message that some cells show a #NUM! error because the formula takes a number and uses the ASIN function, but occasionally the number slips above 1 or below -1??? For now, just treat anything that goes over as ASIN(1) and anything that goes below as ASIN(-1); you can tell, as the cells above and below are nearing each of those extremes.
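[Editor's note] Rather than patching the #NUM! cells by hand, the clamp described above can be built into the formula itself; with the raw ratio in, say, C5 (a placeholder reference):

```
=ASIN(MIN(1, MAX(-1, C5)))
```

Floating-point noise in the preceding arithmetic is the usual reason a value meant to be exactly ±1 drifts just past it.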

offthelip
03-27-2018, 03:31 AM
Hi Cperry,
I have been having a look at your data, and I have added some macros and some calculations which hopefully will show you the sort of approach that I would take with this. I have added some calculations ( including a exponential moving average function) to do a moving average cross over. ( I like EMA because it is easier to change the period) I then generated a buy and sell triggers from these. I then have a subroutine (stdema) that runs down the data and calculates the percentage profit from these buy and sell triggers the results of these are displayed at the top cells AZ2 to BC2. What I then do by using the index function in columns An to AP I run through each of the horses in turn. What I then do is put a double loop around this inner loop which will vary the values of the EMA to try out different values of the two ema to see if this makes any difference. The results of these three loops is stored in a variant array which writes the results out to the worksheet "results".
I then write out some equations in rows 2 to 7 of the results sheet to summarize the results.
I have put a filter on the results sheet (make sure you deselect it before running the macro, otherwise the results are all garbage).
Hopefully this will give you some ideas about how to test the possible trading signals.
One thing that I notice from the tests I have run is that horse 1 seems to be profitable all the time but most of the other horses aren't. This would concern me: how did you select horse 1? Would this be repeated in another race?
If I had more data covering multiple races I would add another loop round the outside, looping over each race.
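offthelip's actual crossover macros live in the attached workbook as VBA; as a hedged illustration only, the core EMA and crossover-trigger logic described above can be sketched in Python like this (the function names and the small test periods are mine, and stdema's profit calculation is not reproduced):

```python
def ema(prices, period):
    """Exponential moving average with smoothing factor alpha = 2/(period+1),
    seeded from the first price."""
    alpha = 2.0 / (period + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def crossover_signals(prices, fast=25, slow=50):
    """Return (index, 'BUY'/'SELL') pairs where the fast EMA crosses the slow one:
    BUY on an upward cross, SELL on a downward cross."""
    f, s = ema(prices, fast), ema(prices, slow)
    signals = []
    for i in range(1, len(prices)):
        if f[i - 1] <= s[i - 1] and f[i] > s[i]:
            signals.append((i, "BUY"))
        elif f[i - 1] >= s[i - 1] and f[i] < s[i]:
            signals.append((i, "SELL"))
    return signals
```

The 25/50 defaults mirror the periods mentioned later in the thread; because only the period changes, swapping EMA lengths for testing is a one-argument change, which is exactly why offthelip prefers EMAs here.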

CPerry
03-29-2018, 02:33 PM
Evening OTL!

I have had to keep away from my laptop for a few days with an eye problem, but I've had a quick look at your number crunching (I will properly explore the macros in the morning) and I already love the idea of using EMA crossovers, as I am familiar with these!

Looking at the positives, I'm glad Brand 1 is always profitable. Now I have something concrete to work from. I think I know why it is different from the other brands, too: my formulas work like a bell shape (imagine a standard-deviation bell). Brand 1 is far left, so all the other brands step down nicely, 1/2(Brand 2) + 1/3(Brand 3) + 1/4(Brand 4) etc., except I've used smarter calculations. Now I guess I just need to find a suitable way of placing the other brands on this bell and I should be on to a winning edge.
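A minimal sketch of the naive step-down weighting mentioned above (1, 1/2, 1/3, 1/4, ...), normalised so the weights sum to 1. This is purely illustrative, since the real formulas are described only as "smarter" versions of this step-down, and the function name is mine:

```python
def bell_weights(n):
    """Hypothetical step-down weights: Brand 1 gets weight 1, Brand k gets 1/k,
    then all weights are normalised to sum to 1."""
    raw = [1.0 / k for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]
```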

I've just come up with another few theories; the 3 new ones come after the two existing ones you've already looked at (Theory 3 is for Brand 1 only, as it is super lengthy). At this stage could you give your opinion on which you think will be most accurate overall, please? I've supplied the numbers for the first 4 brands.

This would concern me because how did you select horse 1?

I ranked the horses in order of odds (smaller odds come first, so Brand 1 in this case, which is most often the most traded selection).

Tomorrow I will use these same 5 theory formulas on another 3 or so 12-horse races and a few 6-selection markets, maybe even an 18/19-selection market, so you will have more to analyse. If you could just let me know which you think works best, please, I can focus on typing these up, as the formulas I am using are complex and not easy to just copy and paste (results fit on a bell so have different weights, and I've not figured out a clever way of having the computer work this out itself yet). As soon as I can narrow down the winning formula I can create a master app that chooses the strategy based on the number of runners (A LOADDD OF TYPING), but then run this for a whole day and have dozens of markets to analyse! Looking forward to that part already :D

Anyways, I shouldn't be spending too much time on this screen tonight so I'm going to turn off now. More tomorrow,
Cheers,
CPerry :)

21946

offthelip
03-29-2018, 03:31 PM
I had a quick look at your latest file, and the first thing I notice is that the Price data is identical to the previous file. So there is not much point in me looking any further at that.
I think you're making a big mistake in trying to develop a system by looking at data from a single race. You need hundreds of races to be able to develop a reliable system. With the race data you have sent me, there are probably hundreds of different equations you could use to select horse 1 at particular times and create a winner. This is because the price of horse 1 in this race is trending up most of the time; if you bought at the beginning and sold at the end you would be in profit. You can't expect this to happen all the time.
I know nothing about horses, but I know a little about markets (which this is, just buying and selling bets). Markets are sometimes trending and sometimes reverting to the mean. If a market is trending and the price goes up, it is a good strategy to buy on the expectation that the price continues upwards. If the price is reverting to the mean and the price goes up, it is a good strategy to sell on the expectation that the price will fall back to the mean. Any strategy you are testing that works in a trending market should work on this race, but what happens if the next race is mean-reverting? How is your strategy going to cope? The only way you can test this is to capture a lot of data, so that you have a large sample of each type of race (trending, mean-reverting, and combinations of the two where it swaps mid-race). It is very important to understand where your strategy will fail and what level of losses you can expect when it does; hopefully these will be less than the winners.
So I suggest you forget the development of fantastically complicated equations and concentrate on capturing enough suitable race data to build up a database you can use to run tests that give statistically significant results.

SamT
03-30-2018, 06:58 AM
concentrate on capturing enough suitable race data to build up a data base which you can use to run tests to allow you to get statistically significant results.

Ditto that.

I would guess at least ~a hundred races for preliminary inspiration and development, and ~a thousand for real testing.

I would also include the bookmaker's preliminary odds when computing Risk Factors... if you actually let a bet ride to the finish.

Right up until I read OTL's last post, I thought you were merely betting, not buying and selling bets.

CPerry
03-31-2018, 04:53 PM
https://www.dropbox.com/s/0e22efmbdvsyp7x/Greyhound1SecLogOne.xls?dl=0

Ok, I've taken your advice on board, and today I have logged around 130 greyhound races. I chose these because there are always 6 runners, so it was easier to code just one set of rules at this stage, and the mini markets happen more frequently, so it was easier to mine enough data in a single day to start testing my immediate theories. In the attached link I have the same 4 proposed theories as before; I was hoping you could check which would 'perform' better using the buy/sell triggers you already have coded, please, to narrow down which works best so I know which to move forward with?

Greyhound races are spaced every few minutes throughout the whole day; because of this they trade differently (though the same efficient balancing codes are in place) from horses and football. The money can go from £0 up to a few thousand in two minutes, which is all I've captured for each race. I'm hoping to have ultra-fast 'get in' and 'get out' moments flag up, to accumulate across several dogs perhaps and get a small profit over the course of each race, each day (refreshing at 20ms but captured every second for EMA crossovers and the like).

I have also included the 'Average Weighted Price' in columns R:W, which is the price weighted by the previously traded volume at each price. This means it can be used as an average to check for convergence to and divergence from the mean for each brand. Also included are the BACK and LAY %'s, to see how unmatched money on each side affects things, and I have included the whole market volume and the volume on each brand, which can be used as additional indicators (volume and volatility) since the formulas are mainly trend- and momentum-themed.
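To be concrete about what that column holds, a volume-weighted average price of this kind can be sketched as follows (a hedged Python illustration of the idea, not the actual worksheet formula):

```python
def weighted_avg_price(prices, volumes):
    """Average price where each traded price is weighted by the volume
    matched at that price, i.e. a running volume-weighted average."""
    total_vol = sum(volumes)
    if total_vol == 0:
        return None  # nothing traded yet, so no meaningful average
    return sum(p * v for p, v in zip(prices, volumes)) / total_vol
```

Comparing the last traded price against this value is one simple way to flag divergence from (or convergence back to) the volume-weighted mean for each brand.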

Race details are in column A, and the dogs are ranked by their odds (smallest first). I think that's everything for now. I will log even more races tomorrow, and maybe even some football early in the week for even more markets. Hopefully that Dropbox link works (as the .xls/.xlsm/.xlsb/.xlsx attachment limits are all way too restrictive).

Cheers,
CPerry :)

offthelip
04-03-2018, 12:43 AM
I am away at the moment so I will respond next week.

CPerry
04-04-2018, 12:10 AM
Ok mate :) Have a good trip!

offthelip
04-07-2018, 04:22 PM
Hi Cperry,
I have just downloaded your latest spreadsheet. The data you have captured has a totally different nature to the single race that you sent before. The original "horse" race had 1792 lines of data, so using moving averages of 25 and 50 periods was quite reasonable. In this new data you have captured 130 different races, each of which has only 119 or 117 lines of data; using the same 25- or 50-period moving averages is not going to work, because half the race is needed just to stabilise the moving averages. This sort of data needs a different approach.
The way the data is organised does not make it at all easy to analyse: any moving average has to be reset whenever a race starts.
Also, you have quite happily been calculating your Theory 1 to 4 values, which is all very well, but what you have really got to do is calculate some buy and sell triggers from these values. I gave you the software for doing a very basic buy and sell test when I posted my analysis of the last race.
I think you might find that the data you capture for each type of race (greyhounds, horses, football teams, pigeons?) is going to be very different, so you are going to need a different analysis technique for each type of race, and quite likely a different strategy too.
You need to look at the VBA macros that I posted in post #28, so that you can modify them to do your own testing of your theories. Once you have grasped how they work you can then get your data capture and organisation to fit in with the strategy testing.

There are a number of separate stages that you need to cover:
1: data capture ( done as far as I can see)
2: data organisation to make it easy to run a standard test template
3: Develop indicators for testing ( done)
4: create triggers from the indicators
5: run standard template (done by offthelip)
6: loop through standard template 100+ times using different race data
7: Capture the results of the testing (done by offthelip)
8: Analyse the results of the testing
9: Improve triggers to hopefully improve results
10: go round the process again from item 3
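As a hedged sketch of how stages 5 to 7 might hang together once the data is organised per race (all names here are hypothetical placeholders, not offthelip's actual macros), the key point being that indicator state starts fresh at every race boundary:

```python
def backtest_all_races(races, make_signals, evaluate):
    """Run a standard test template over every race and collect the results.

    `races` maps a race id to that race's price series; `make_signals` builds
    buy/sell triggers from one race's prices (indicators are recomputed from
    scratch per race, so nothing carries over a race boundary); `evaluate`
    turns prices + signals into a result such as percentage profit.
    """
    results = {}
    for race_id, prices in races.items():
        signals = make_signals(prices)   # indicator state resets here
        results[race_id] = evaluate(prices, signals)
    return results
```

The results dictionary then feeds stage 8 (analysis), and anything learned there loops back into stage 3 via better indicators and triggers.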

CPerry
04-11-2018, 04:16 AM
Hi OTL :)

Instead of capturing just the final two minutes of every race (which got me the most data as fast as possible), I have now captured 5 minutes' worth of data for 93 races so far (I will type up the 'theory' formulas tonight and attach a new document before I go to bed), so there is enough data for each race. This new data should be much more like the previous data that worked with the EMA crossovers: 5 minutes out, most races only have a trickle of money coming in, so by the time the race hits a target amount, e.g. £2,000, the EMA readings should already have an emerging 'image' of the market. I think this will work better. Each of the 93 races just needs to be broken up so the moving averages don't have to be reset mid-series, and then it can be analysed from there; I will properly take a look at how to do this tomorrow, along with the software you sent across recently.

With regard to buy/sell or back/lay triggers, the formulas will simply be something along the lines of backing when the line crosses above a specific value, much like an RSI, but I will need to see these numbers visually before I decide what values to use. The 'theory' readings are all percentages ranging from -100% to 100%, so the most accurate theory will need to be tested somehow?
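A hedged sketch of what such RSI-style crossing triggers could look like on a theory reading in the [-100, 100] range (the 60/-60 levels are placeholders only, since CPerry says the real levels must be chosen after inspecting the numbers):

```python
def threshold_triggers(values, back_at=60.0, lay_at=-60.0):
    """Fire 'BACK' when the theory reading crosses up through `back_at`,
    and 'LAY' when it crosses down through `lay_at`. Only the crossing
    moment fires, not every reading beyond the level."""
    triggers = []
    for i in range(1, len(values)):
        if values[i - 1] < back_at <= values[i]:
            triggers.append((i, "BACK"))
        elif values[i - 1] > lay_at >= values[i]:
            triggers.append((i, "LAY"))
    return triggers
```

Testing "which theory is most accurate" then reduces to running each theory's readings through triggers like these and comparing the resulting profit per race, as in offthelip's template.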

So that accounts for items 2 and 4 on your list; when I can properly look at the results tomorrow I will work on steps 6 to 10!

Message again tonight :)
CPerry.