Having a new - and, it seems, generally superior - way to calculate Bookmaker Implicit Probabilities is like having a new toy to play with. Most recently I've been using it to create a family of simple Margin Predictors, each optimised in a different way.
RECOGNISING FAMILY MEMBERS
By combining the log odds, formed as the log of the Home team Implicit Probability divided by the Away team Implicit Probability, with a MARS Ratings ratio, formed as the ratio of the Home team's and the Away team's MARS Ratings, I'm able to construct a family of Margin Predictors of the form:
Predicted Margin = k × ln(Odds^λ1) × MR^λ2

where Odds is the Home team probability divided by the Away team probability, MR is the Home team's MARS Rating divided by the Away team's MARS Rating, and λ1 and λ2 are both non-negative.

One way to think about this functional form is that it allows the bookmaker probabilities to determine the sign of the prediction via the log odds term, and the other components to stretch the log odds term out. Larger values of λ1 make predicted margins more extreme as probabilities vary from 50%, and larger values of λ2 make predicted margins more extreme as the ratio of the teams' MARS Ratings differs more from 1.
It's a fairly simple functional form - it involves only 3 parameters and requires only knowledge of the Bookmaker's market prices and the MARS Ratings of the two teams - but it's flexible enough to be optimisable for a range of purposes, as we'll see.
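For concreteness, the family can be sketched in a few lines of Python (the function and variable names are mine, for illustration only, and aren't drawn from the analysis itself):

```python
import math

def predicted_margin(p_home, p_away, mars_home, mars_away,
                     k=21.0, lambda1=1.0, lambda2=0.0):
    """Family of Margin Predictors: k * ln(Odds^lambda1) * MR^lambda2."""
    odds = p_home / p_away        # ratio of Bookmaker Implicit Probabilities
    mr = mars_home / mars_away    # ratio of MARS Ratings
    return k * math.log(odds ** lambda1) * mr ** lambda2

# With the defaults this is Model A (k = 21, lambda2 = 0 drops the Ratings
# term): a 60/40 Home favourite with evenly Rated teams maps to roughly
# an 8.5-point predicted Home win.
margin = predicted_margin(0.60, 0.40, 1000.0, 1000.0)
```

Raising lambda2 above zero then lets a MARS Ratings mismatch stretch the prediction further, in the manner described above.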
The investigation of the functional form just described came as a result of some idle, mostly undirected, have-something-running-in-the-background-while-I-do-something-else tinkering I was doing with Eureqa, seeing whether it preferred to use Implicit Probabilities derived via the Risk-Equalising or the Overround-Equalising approach when trying to model game margins. I included probabilities from both approaches, as well as a number of the standard ratios that can be formed using them, such as log odds. I also tossed in MARS Ratings, since it's always fun to see if they can elbow out bookmaker-derived variables in a model, and then came up with the idea of creating ratios with the Ratings too.
Eureqa's current default error metric is the Absolute Error (AIC) metric, which favours simple solutions with integer parameters. In one run it spat out "21 x log odds" as a predictor, which turns out to be surprisingly good, with an MAPE across seasons 2007 to 2012 of just 29.22 points per game.
It also spat out other solutions of the form k times log odds times the MARS Rating ratio raised to some integer power, which also proved to be competent margin predictors, as shown in the following table.
Model A is the simplest model of all and is:
Predicted Margin = 21 x log odds ratio
This is the model I mentioned earlier with the 29.22 all-season MAPE. In the table I've also shown the model's accuracy in terms of predicting line betting outcomes - in other words, of predicting whether or not the favourite will cover the handicap it's been given by the bookmaker. Model A correctly predicts 51.7% of all line betting outcomes, a rate that is much better than chance (i.e. 50%) but not high enough to make it profitable if wagering in the line market at $1.90. It shows some variability in line betting performance, registering a low of 49% in its worst season and a high of 57.5% in its best.
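Both of the metrics quoted in the table are straightforward to compute. Here's a minimal sketch, where the sign convention is my assumption: margins are Home-team-relative, and the points start given to the Home team is added to its score.

```python
def mape(predicted, actual):
    """Mean absolute prediction error, in points per game."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def line_accuracy(predicted, actual, starts):
    """Share of games where the predicted margin falls on the same side of
    the bookmaker's line as the actual margin. starts is the points start
    given to the Home team (negative when the Home team is the favourite)."""
    hits = sum((p + s > 0) == (a + s > 0)
               for p, a, s in zip(predicted, actual, starts))
    return hits / len(predicted)
```

A predictor only needs to land on the correct side of the line, not near the actual margin, which is why a model can have a mediocre MAPE and still be a competent line betting tipster.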
Model B is the simplest model to incorporate the MARS Rating Ratio and is clearly superior to Model A. Its MAPE is 0.12 points per game lower and its line betting accuracy is 2.3 percentage points higher, although its line betting accuracy does show greater variability, ranging from a low of 48.8% to a high of 60.8%.
Model C is similar to Model A in that it does not include the MARS Rating Ratio, but it allows k, the overall multiplier, to be non-integer. It outperformed Model A as measured by Eureqa's error metric on the holdout sample that Eureqa created for the purpose, but that superiority is not reflected in the results above, which are for the entire set of games across the 2007 to 2012 seasons and not just those in the Eureqa holdout sample.
Next is Model D, which is the same as Model B but with a non-integer k. Model D proved superior to Model B using Eureqa's metric on a holdout sample but, as in the cases of Models A and C above, the superiority of Model D is not borne out in my summary table, which is based on all games.
The last of the models built using Eureqa is Model E, which sports a non-integer k and which squares the MARS Rating Ratio, thereby exacerbating the impact on the predicted margin of any mismatch in team strengths as reflected in the MARS Ratings. It's the best model we've seen so far and produces an MAPE of 29.09 points per game and a line betting accuracy of 54.9% across all six seasons. It selects line betting winners at levels better than chance in every season.
USING EXCEL'S SOLVER FUNCTION
It was only a few years ago that I discovered Excel's Solver functionality, which has been significantly improved in Excel 2010.
I used it to optimally select k, lambda 1 and lambda 2 for a number of models, defining optimality with a different metric for each. Excel offers three different optimisation algorithms, two of which I used for this blog:
- The default, GRG Nonlinear engine, which I used when the metric to be optimised related to MAPE. This algorithm works well on such problems because the metric to be optimised tends to move smoothly as we traverse within the same neighbourhood of the parameter space.
- An Evolutionary algorithm, which I used when the metric to be optimised related to line betting accuracy because the use of such metrics, in contrast with those using MAPE, tended to produce very "non-smooth" problems - that is, where small changes in the parameters lead to large and discontinuous changes in the metric being optimised and the direction of the change in the metric can reverse within a small range of values for a single parameter.
In using either algorithm I left any tuning parameters at Excel's default settings.
Lastly, I set upper and lower bounds for all of the parameters - a requirement in using the Evolutionary algorithm but not in using GRG Nonlinear - with k constrained to lie in the interval (0.01, 100), and the two lambdas constrained to lie in the interval (0.1, 25).
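For anyone replicating this outside Excel, the same two-algorithm split maps naturally onto scipy: a gradient-based method for the smooth MAPE objective, and differential evolution for the jumpy line accuracy objective. The sketch below runs on simulated stand-in data - every array and value in it is assumed for illustration, none of it is the actual 2007 to 2012 data.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Simulated stand-in for the games data (all values assumed).
rng = np.random.default_rng(0)
n = 500
log_odds = rng.normal(0.0, 0.5, n)        # ln(Home prob / Away prob)
mr = np.exp(rng.normal(0.0, 0.01, n))     # MARS Ratings ratio
actual = 21 * log_odds * mr ** 2 + rng.normal(0, 36, n)  # actual margins
starts = -21 * log_odds                   # notional bookmaker points start

def predictions(params):
    k, l1, l2 = params
    # k * ln(Odds^l1) * MR^l2, using ln(Odds^l1) = l1 * ln(Odds)
    return k * l1 * log_odds * mr ** l2

def mape_objective(params):
    """Smooth objective: mean absolute prediction error in points."""
    return np.mean(np.abs(predictions(params) - actual))

def neg_line_accuracy(params):
    """Non-smooth objective: negated share of correct line predictions."""
    pred = predictions(params)
    return -np.mean((pred + starts > 0) == (actual + starts > 0))

bounds = [(0.01, 100), (0.1, 25), (0.1, 25)]

# GRG Nonlinear analogue: a gradient-based search suits the smooth MAPE.
res_mape = minimize(mape_objective, x0=[21.0, 1.0, 2.0], bounds=bounds)

# Evolutionary analogue: differential evolution copes with the accuracy
# surface, where tiny parameter moves flip individual games' outcomes.
res_line = differential_evolution(neg_line_accuracy, bounds,
                                  seed=1, maxiter=40)
```

The accuracy objective only changes value when a game flips sides of the line, so its surface is a staircase of flats and jumps - exactly the "non-smooth" behaviour that defeats a gradient-based engine and motivates the evolutionary one.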
MODELS PRODUCED WITH THE SOLVER
By choosing to optimise across all games in the 2007 to 2012 period, I do run the risk of overfitting. If I had my time over again, I would probably have set up random test and training samples and sought to shield myself a little from this risk. But, no matter - it'll be interesting to see at the end of the 2013 season how these models have fared and which have shown signs of being overfitted. The potential for such overfitting should be borne in mind in interpreting the results provided here.
To the models then, as overfitted as they might be ...
Models that Optimise All-Season Performance
Model F was constructed to minimise the all-season MAPE, which eventually came in at 29.08 points per game, just a smidge below Model E's. Model F is a slightly poorer line betting tipster than Model E, but its 54.5% is still very creditable, and its best single-season result of 65.6% is the equal of any model constructed for this blog.
It's interesting to note how, given a search over a much wider and more granular parameter space, the optimal parameters for Model F turned out to be very similar to those for Model E. As one consequence of this, the correlation between the margin predictions of the two models turns out to be +0.9995.
Model G is a model constructed with more obvious wagering intent. Its parameter values were chosen to optimise its line betting accuracy across the six seasons, which is best achieved by selecting parameter values that tend to exacerbate differences in team strengths. This is manifested, for example, in the standard deviation of the predicted margins which, at over 51 points per game, is almost double the standard deviation of the predicted margins of any of the models so far considered.
This increase in the variance of the predicted margins, while optimising the model's overall line betting accuracy, comes at a fairly steep cost in terms of MAPE: Model G's MAPE is 5.50 points per game, or almost 20%, worse than Model F's.
The process of finding Model G made me very aware of the optimisation challenge posed whenever line betting accuracy is the metric. My initial solution for Model G was very different from, and inferior to, the one you see here. It wasn't until I was building Model M (discussed below) that I discovered that a solution with an all-season line betting accuracy of 57.0% was possible. That made me re-optimise Model G with different starting parameters, which produced the model you see now, which is very different from Model M and from the original solution I had for Model G.
Now in terms of overall line betting accuracy, Model M is the equal of Model G and could stand in its stead, but Model M has a value of k that's about one-third of Model G's, a value of λ1 that's almost three times as large, and a value of λ2 that's about 50% higher. These two very distant points in parameter space produce identical all-season line betting performances.
Optimising all-season performance can result in predictors that display undesirable worst-season performances. Empirically, this seems to be more of an issue when optimising for line betting accuracy, however, than when optimising MAPE. As evidence for its being less of an issue when optimising MAPE, I note that Model F, the model built to minimise overall MAPE, also has a worst-season MAPE that's superior to every model considered in this blog except Model H.
Models that Optimise Worst-Season Performance
Model H is superior on this metric because it's the basis for its optimisation: its parameters were selected to produce a predictor with the best possible worst-season MAPE. That worst-season performance is 29.38 points per game, which is only 0.18 points per game worse than its all-season MAPE of 29.20. It's also a reasonably competent line betting predictor, correctly selecting over 54% of winners, with single-season results ranging from 49.7% to 58.1%.
Model I is the equivalent of Model H but with parameters chosen to maximise worst-case single-season line betting accuracy. It's a very strong, very consistent line betting predictor, with a single-season low of 54.6% and an all-season average of 55.7%. Note though that minimising the downside comes at the price of minimising the upside too: in its best season, Model I correctly predicts only 59.1% of games, a rate that is considerably lower than the best-season results for most of the models already discussed, even those optimised for MAPE rather than for line betting.
Protecting the downside in line market accuracy also comes at a significant cost in terms of MAPE, which for Model I is 38.67 points per game across all seasons, and 32.79 points per game even in its best season.
Where Models H and I have been designed to minimise worst-case performance, Models J and K have been designed to maximise best-case performance. This is not an approach I'd recommend for general use if you're planning to employ the resulting model in any practical setting because it's an approach virtually predestined to overfit. When you allow a model to be constructed on the "There Was a Little Girl ..." principle, there's a risk that when it's good it's very, very good, and when it's bad, it's horrid.
In our situation here, however, this risk doesn't really materialise.
Models that Optimise Best-Season Performance
Model J, which optimises the best-case single-season MAPE, manages to produce a sublime 2009 season with an MAPE of 27.54 but still returns an all-season MAPE of 29.61 points per game, which, though far from optimal, isn't embarrassing. Similarly, Model K (also Model F, by the way), which maximises the best-case single-season line betting accuracy, cranks out a 65.6% line betting season in 2010 yet still manages a worst-season performance that's better than chance at 50.8%.
A Model to Optimise Worst Single-Game Performance
To motivate Model L, imagine a situation where your loss is related to the size of your largest absolute prediction error in any single game.
Model L's parameters were chosen to minimise this maximum error. Though not shown in the table, the maximum APE for any single game was 136.13 points, which was the APE for Model L in the Round 6 Richmond versus Geelong clash where the Cats handed out a 222-65 thrashing. In that game the Tigers were receiving only 13.5 points start, so the model can, I think, be forgiven for finding it hard to do any more than predict a Cats win by 21 points.
Aside from minimising one's maximum blushes, there's little else that this model is good for, as its all-season MAPE is 34.19 points per game, and its all-season line betting accuracy is just 50.7%.
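The minimax objective behind Model L is easy to state in code and, being even less smooth than line accuracy (the max operator means only the single worst game matters at any given parameter setting), it is another natural fit for an evolutionary search. Again, everything below is simulated stand-in data, assumed purely for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Simulated stand-in data (all values assumed).
rng = np.random.default_rng(2)
log_odds = rng.normal(0.0, 0.5, 300)
mr = np.exp(rng.normal(0.0, 0.01, 300))
actual = 21 * log_odds * mr ** 2 + rng.normal(0, 36, 300)

def max_abs_error(params):
    """Worst single-game absolute prediction error - the Model L objective."""
    k, l1, l2 = params
    pred = k * l1 * log_odds * mr ** l2   # k * ln(Odds^l1) * MR^l2
    return np.max(np.abs(pred - actual))

res = differential_evolution(max_abs_error,
                             [(0.01, 100), (0.1, 25), (0.1, 25)], seed=3)
```

Swapping the mean in the MAPE objective for a max is the entire difference: the optimiser then spends all its effort reining in the handful of extreme games, which is why the resulting model's average performance suffers so badly.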
Models to Optimise All-Season and Worst-Season Performance From Round 6 Onwards
The final two models were also constructed with a wagering intent and with an acknowledgement that MARS Ratings are not always at their predictive best in the early portions of seasons. Accordingly, these two models seek to optimise only for games played after Round 5 in any of the seasons.
Model M is optimised for overall line betting accuracy across all six seasons and correctly predicts 57.4% of games from Round 6 onwards and, as noted earlier, matches Model G in terms of line betting accuracy across all games, including those from the first 5 rounds of a season. Its worst-case single-season performance (for games played in Round 6 or thereafter) is only 51.9% though, so it would not have been profitable in every season.
Model N maximises the worst-case single-season line betting accuracy for games played in Round 6 or after. That worst-case performance is a profit-making 55.1%, which is only 1.3 percentage points below Model M's all-season after-Round-5 performance.
USE IN THE WILD
If I were to recommend the serious use of any of these predictors in season 2013, I'd opt for:
- Model E or Model F if my goal was to have a model with the best chance of producing a competitive MAPE for the season. Between the two, while Model F has a slightly superior historical MAPE, it is more complex in its parameter choice (assuming you share the intuition that integer parameters are in some sense "simpler" than non-integer ones, though quite why football results should care about this is a challenging argument to prosecute). So I'd probably select Model E ahead of Model F. Model E also has a superior all-season line betting performance, and so might be considered as a more general-purpose algorithm.
- Model N if my intent was to wager in the line betting market, especially if I could restrain myself from wagering in the first 5 rounds of the season. There's a very good chance though that Model N is chronically overfitted. The other candidate model would be Model I, which is modestly superior to Model N in terms of its worst-case single-season performance, but looks no less overfitted than Model N and has an all-season line betting record that's modestly inferior. So, I wouldn't recommend switching to it, even if the intent was to wager on line betting in every round of the season.
In summary then: Model E for margin prediction and Model N for line wagering (with absolutely no promises).