The Predictability of 2013

July 16, 2013 Tony Corke

Friend of MAFL, Michael, e-mailed me earlier to ask about my claim that 2013 was on track to be the most predictable MAFL season ever, pointing out, quite correctly, that bookmaker favourites have been winning at about the same rate - perhaps even a slightly higher rate - as they were at the same point last year.

That led me to clarify my own thinking - never a bad thing - about my use of the term "predictability", and in particular to recognise a couple of things about that notion:

It makes little sense for me to talk about "predictability" in the abstract. What I should be saying is that a specific aspect of the season has been especially predictable - say the winners of games, or their margins of victory. It's entirely possible for one aspect to have been more predictable in Season A than in Season B, while another aspect was more predictable in Season B than in Season A.
Predictability is in the eye of the predictor. An event that might be very unpredictable to me could be much more predictable to someone with access to better information or to a better model for translating that information into predictions. Unless an event is truly and completely unpredictable - say, the timing of the decay of a radioactive particle - predictors can be ranked in terms of their predictive abilities.

A FEW PREDICTIVE MODELS

To illustrate these issues, and to quantify the relative predictability of specific game outcomes (who wins and by how much) across recent seasons, I'm going to build three predictive models, each fitted to actual game margins over the period 2006 to 2013.

The first, the Bookmaker-Only model, uses as its only input the TAB Bookmaker's head-to-head prices. Specifically, it uses a transformation of the Risk-Equalising Implicit home team Probability, the transformation serving to convert this Implicit Probability into a notional margin prediction by treating it as a cumulative probability from a Normal distribution with mean 0 and a standard deviation of about 35.2 points per game. I'll come back to this transformation step a little later.
The second, the MARS-Only model, uses as its only inputs the pre-game MARS Ratings of the two teams.
The third, the Enhanced-MARS model, is the same as the second, except that I add Interstate Status as an input.

Because of the potentially distorting impacts of large game margins, I've decided to use robust methods for fitting these three models. In particular, I've used rlm (Robust Linear Regression) from the MASS package in R.
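
For readers who'd like a feel for what robust fitting does, here is a minimal sketch in Python rather than R. It is not the rlm code itself: it is a hand-rolled Huber-type iteratively reweighted least squares for a single predictor, which is my understanding of roughly what rlm does by default (the tuning constant 1.345 and the MAD-based scale are assumptions on that basis). The margins in the example are made up purely for illustration.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_line_fit(x, y, k=1.345, iterations=50):
    """Huber-type IRLS: large residuals are down-weighted rather than discarded."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(iterations):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = median([abs(r) for r in resid]) / 0.6745  # robust scale estimate
        if scale == 0:
            break
        # Weight 1 for small residuals, k*scale/|r| for large ones
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Hypothetical data: predicted margins (x) and actual margins (y), with one blowout
x = [-30, -20, -10, 0, 10, 20, 30, 40]
y = [-27, -19, -9, 3, 11, 23, 31, 120]

ols_a, ols_b = weighted_line_fit(x, y, [1.0] * len(x))
rob_a, rob_b = huber_line_fit(x, y)
print("OLS slope:", round(ols_b, 3), "Robust slope:", round(rob_b, 3))
```

In this toy data the single blowout drags the ordinary least squares slope well above 1, while the robust fit stays closer to the slope implied by the other seven games.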

The first fitted model is:

Bookmaker-Only Model: Game Margin = 2.0824 + Transformed Bookmaker Implicit Probability

That coefficient of 1 on Transformed Bookmaker Implicit Probability is no accident. Before deciding on the particular transformation to use, I fitted models using the Implicit Probability untransformed and I fitted models using a number of other transformations. Eventually, recollecting previous analyses where I'd shown empirically that handicap-adjusted margins were Normally distributed, I thought I'd try something similar, which is to say that I treated the Implicit Probability as a cumulative probability from a Normal distribution with zero mean and then calculated the inverse. The quality of the model seemed to be impervious to my choice of standard deviation for this Normal distribution (as you'd expect, since changing the standard deviation merely rescales the transformed variable and the fitted coefficient adjusts to compensate), so I selected a standard deviation to make the coefficient in the model equal to 1.

Let me make that inverse process a little more tangible by way of example:

Consider an actual game from 2006 where the head-to-head prices are $1.48 for the home team, and $2.50 for the away team
These give an Implicit Risk-Equalising probability for the home team of 1/1.48 - (1/1.48 + 1/2.50 - 1)/2, or 0.638
If we assume that's a cumulative probability from a Normal distribution with mean 0 and standard deviation 35.1575, it equates to a predicted margin of about 12.4 points. (As one form of confirmation of the sanity of this procedure, the handicap in that actual game was -12.5 points.)
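
The three steps above can be sketched in a few lines of Python (the modelling itself was done in R; this is just the arithmetic, using only the standard library). The function names are mine, chosen for illustration.

```python
from statistics import NormalDist


def risk_equalising_probability(home_price, away_price):
    """Home team's implicit probability, with the overround split equally between the teams."""
    overround = 1 / home_price + 1 / away_price - 1
    return 1 / home_price - overround / 2


def transformed_probability(prob, sd=35.1575):
    """Inverse of a Normal CDF with mean 0 and the given SD: a notional margin in points."""
    return NormalDist(mu=0.0, sigma=sd).inv_cdf(prob)


prob = risk_equalising_probability(1.48, 2.50)
margin = transformed_probability(prob)
print(round(prob, 3), round(margin, 1))  # about 0.638 and 12.4, as in the example above
```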

So, what the first model gives us is a simple way of converting Implicit Probabilities to margin forecasts: we just apply to the Implicit Probability the inverse transformation that I've just described and add about 2.1 points to the result. (The fact that the intercept is not zero reflects the bias, already chronicled in MAFL, in the TAB Bookmaker's probability assessments if we adopt the Risk-Equalising approach. It represents the likely victory margin of a home team rated a 50% chance on the TAB.)

We can use the predictions from this model in two ways to estimate different aspects of game predictability.

We can select a winning team on the basis of the sign of the fitted value. Positive values imply a home team win, negative values an away team win. The accuracy of predictions made in this way gives us a measure of the predictability of game winners.
We can measure the standard deviation of the residuals of the fitted model, which gives us a measure of the unpredictability of the actual game margin.

If we do this, separately, for each season, we can estimate the relative predictability of each season.
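
As a rough sketch of that season-by-season calculation, assuming we already have each game's fitted and actual margin from the home team's perspective, something like the following Python would do. The games listed are entirely hypothetical and exist only to show the mechanics; draws (an actual margin of zero) are counted as incorrect predictions.

```python
from collections import defaultdict
from statistics import stdev


def season_summary(games):
    """games: iterable of (season, fitted_margin, actual_margin) tuples."""
    by_season = defaultdict(list)
    for season, fitted, actual in games:
        by_season[season].append((fitted, actual))

    summary = {}
    for season, rows in by_season.items():
        # A tip is correct only when fitted and actual margins share a sign,
        # so a drawn game (actual margin 0) never counts as correct
        correct = sum(1 for fitted, actual in rows if fitted * actual > 0)
        residuals = [actual - fitted for fitted, actual in rows]
        summary[season] = {
            "accuracy": correct / len(rows),
            "residual_sd": stdev(residuals),
        }
    return summary


# Hypothetical (season, fitted margin, actual margin) rows
games = [
    (2012, 12.4, 40), (2012, -5.0, -45), (2012, 20.0, 60), (2012, 3.0, 0),
    (2013, 12.4, 10), (2013, -5.0, 2), (2013, 20.0, 18), (2013, 3.0, -4),
]
summary = season_summary(games)
for season in sorted(summary):
    print(season, summary[season])
```

In this made-up data the second season has fewer correct tips but a much smaller residual standard deviation, which shows how the two measures of predictability can move in opposite directions.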

The results for the model we've been discussing so far are those shown in the column headed "Bookmaker Model". They show that this model has been less accurate in predicting winners this year than in either of the two previous years, though all three years have been, in historical terms, very predictable. (I have, by the way, treated draws as incorrect predictions.)

They also show, however, that the standard deviation of the residuals for this model is at an historical low this season, and over 4 points per game smaller than last season's. In other words - and somewhat loosely speaking, statistically - the typical amount by which the actual game margin differed from the fitted value provided by the model has been about 4 points smaller this season than last.

This result is consistent with the much smaller mean absolute prediction errors we're seeing from the MAFL Margin Predictors this season compared to last season.

In summary, if you're measuring predictability through the eyes of someone who's using the fitted model described here, you can say that:

The winners of games have been less predictable this season than last
The game margins have been more predictable this season than in any other since at least 2006

The other two columns in the tables show that the same can be said of anyone using either of the two other models I fitted, which, for completeness' sake, are:

MARS-Only Model: Game Margin = 86.7798 + 0.7310 x Home MARS Rating - 0.8102 x Away MARS Rating

Enhanced-MARS Model: Game Margin = 72.8504 + 0.7402 x Home MARS Rating - 0.8103 x Away MARS Rating + 8.6383 x Interstate Status
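
To make those two equations concrete, here is a small Python sketch that simply evaluates them. The Ratings of 1,000 are hypothetical, and the coding of Interstate Status as +1 (with 0 meaning no interstate factor) is an assumption made for this example, as the coding isn't spelled out in this post.

```python
def mars_only_margin(home_rating, away_rating):
    """Fitted home team margin from the MARS-Only Model."""
    return 86.7798 + 0.7310 * home_rating - 0.8102 * away_rating


def enhanced_mars_margin(home_rating, away_rating, interstate_status):
    """Fitted home team margin from the Enhanced-MARS Model.

    interstate_status: assumed here to be +1 when the interstate factor
    favours the home team and 0 when it doesn't apply (an assumption).
    """
    return (72.8504 + 0.7402 * home_rating - 0.8103 * away_rating
            + 8.6383 * interstate_status)


# Two hypothetical teams, both Rated 1,000
print(round(mars_only_margin(1000, 1000), 2))
print(round(enhanced_mars_margin(1000, 1000, 0), 2))
print(round(enhanced_mars_margin(1000, 1000, 1), 2))
```

For two equally-rated teams, then, the MARS-Only Model builds the whole of the home team's advantage into the intercept and the Ratings coefficients, while the Enhanced-MARS Model attributes much of it to Interstate Status.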

Notice, however, that while all three models agree that this season's winners have been less predictable and this season's margins more predictable than last season's, they disagree about the absolute levels of predictability. Indeed, this season the MARS-Only Model has found game margins more predictable than has the Bookmaker-Only Model, while the opposite was true last season.

It's entirely possible that a reasonable (not obviously over-fitted) model could be devised whose fitted values imply that this season is, in fact, less predictable than last season, both in terms of who's won and by how much - but that's unlikely, I'd suggest, at least using the inputs I've used here (counter-examples welcomed).

If we believe, however, that the TAB Bookmaker's prices incorporate all available knowledge and that the transformation I've applied here is about as good a way as any of expressing that knowledge in terms of game margins, then most if not all of the unpredictability that remains in game margins cannot be predicted by any means. That is, what remains is randomness. If that's true then maybe we can legitimately claim, in some absolute sense, that last season's game margins were inherently less predictable than this season's.

Everything, of course, is far more predictable in hindsight, but what statistical modellers strive for is models that make the future, not just the past, more predictable. That's why I'm still doing this for AFL after 7 years.