In this last of a series of posts on estimating teams' chances of winning portions of an AFL game, I'll be comparing a statistical model of the Home team's probability of winning 0, 1, 2, 3 or all 4 quarters with the heuristically-derived model used in the most recent post.
That model, which we saw fitted the available data quite well, was based on the following simplifying assumptions:
- that a team's probability of winning a quarter (p) was fixed for each quarter of a game
- that the TAB Bookmaker's pre-game prices could be used to derive a well-calibrated estimate of either team's victory probability (V)
- that a team winning 0 or 1 quarters would lose every game, that a team winning 3 or 4 quarters would win every game, and that a team winning exactly 2 quarters would win half the time and lose half the time. This last assumption allowed us to express V as a function of p and, by empirical means, to then express p as a function of V and therefore estimate the probability that a team would win 0, 1, 2, 3 or 4 quarters.
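The link between V and p implied by these assumptions can be sketched in a few lines of Python (not the language used for the original analysis). This is a minimal sketch under some assumptions of mine: quarters are treated as independent, so the number won is binomial with 4 trials; p is recovered from V by numerical bisection rather than by the empirical fit used in the earlier post; and the Implicit Probability is taken to be the Away price divided by the sum of the two prices. All function names are illustrative.

```python
from math import comb

def quarter_distribution(p):
    """P(win k of 4 quarters), k = 0..4, assuming independent quarters with fixed p."""
    return [comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(5)]

def victory_probability(p):
    """V implied by p: always win with 3 or 4 quarters, win half the time with exactly 2."""
    d = quarter_distribution(p)
    return d[3] + d[4] + 0.5 * d[2]

def quarter_probability(v, tol=1e-10):
    """Invert victory_probability by bisection (V is increasing in p on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if victory_probability(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def implicit_probability(home_price, away_price):
    """ASSUMED conversion: normalise the two head-to-head prices to remove the overround."""
    return away_price / (home_price + away_price)

# Worked example: Home team at $10.00, Away team at $1.05.
v = implicit_probability(10.00, 1.05)
p = quarter_probability(v)
dist = quarter_distribution(p)
print(round(v, 3), round(p, 3), [round(x, 3) for x in dist])
```

Under these assumptions a Home team at $10.00 against $1.05 comes out at about a 43% chance of losing all four quarters, which agrees with the figure quoted for the heuristic model later in this post.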
In developing this heuristic approach I ignored drawn quarters; I've done the same in creating the statistical model.
A MULTINOMIAL LOGIT
The statistical model I've enlisted for the task is called a multinomial logit, which I fitted using the multinom function from the nnet package of R. The target variable was the number of quarters won by the Home team - which takes on only the values 0, 1, 2, 3 or 4 since I've ignored games with drawn quarters - and the lone regressor was the Home team's Implicit Probability, derived from the TAB Bookmaker's pre-game head-to-head prices. The data set was all games from seasons 2006 to 2012.
This modelling approach is quite parsimonious, requiring the estimation of just 8 parameters - 4 intercepts and 4 coefficients on the Implicit Probability variable. It provides the following fitted probabilities:
The red line shows the fitted probability of the Home team winning 0 quarters as its Implicit Probability of victory ranges from 0% to 100%. The olive green line provides the same probability trajectory for the Home team winning 1 quarter, the lime green line for 2 quarters, the blue line for 3 quarters, and the dark pink line for all 4 quarters.
Presented in this way, the chart highlights a number of features of the underlying statistical model:
- The most likely outcome for the Home team is that it will win 1 quarter if its Implicit Probability is less than about 28%, 2 quarters if it's less than about 60%, 3 quarters if it's less than about 91%, and all 4 quarters otherwise
- No matter what the Home team's Implicit Probability, it is never most likely to lose all 4 quarters
- Even with an Implicit Probability as high as 75%, a Home team is more likely to lose 1 or even 2 quarters than it is to win all 4 quarters
- As long as the Home team's Implicit Probability is 50% or higher, losing all 4 quarters is the least likely outcome for the Home team
One of the advantages that the statistical model has over the heuristic model is that it can accommodate the slight Home-team bias in the TAB Bookmaker prices. You can see that the statistical model has done this by noting that when the Home team's Implicit Probability is exactly 50%, the model estimates the Home team as being more likely to win 3 quarters than to win just 1.
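To make the 8-parameter structure concrete, here is a minimal Python sketch of how a multinomial logit turns an Implicit Probability into five fitted probabilities. It assumes that 0 quarters won is the reference category (as I understand it, multinom uses the first outcome level as its baseline). The parameter values are hypothetical placeholders, not the fitted estimates, so the printed numbers will not match the chart.

```python
from math import exp

def multinomial_logit_probs(x, intercepts, slopes):
    """Fitted P(Home team wins k quarters), k = 0..4, at Implicit Probability x.

    Category 0 is the baseline with its linear predictor fixed at 0; the four
    (intercept, slope) pairs belong to categories 1..4, giving 8 parameters.
    """
    scores = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    m = max(scores)                      # subtract the max for numerical stability
    weights = [exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# HYPOTHETICAL parameter values, for illustration only; not the fitted estimates.
intercepts = [0.5, 0.3, -0.6, -2.0]
slopes = [1.0, 2.5, 4.5, 6.5]

for x in (0.25, 0.50, 0.75):
    probs = multinomial_logit_probs(x, intercepts, slopes)
    most_likely = max(range(5), key=lambda k: probs[k])
    print(x, [round(q, 3) for q in probs], most_likely)
```

Because each outcome has its own intercept, nothing forces the probabilities at an Implicit Probability of 50% to be symmetric around 2 quarters, which is how a model of this form can absorb a Home-team bias.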
COMPARISON WITH THE HEURISTIC MODEL
If you refer back to the earlier blog post you'll see that the heuristic model fitted least well for extreme probabilities - specifically, where the Home team was the heavy underdog. The statistical model does not exhibit this behaviour to the same extent, if at all.
In fact, the statistical model fits admirably across the entire range of Implicit Probabilities for the Home team: the difference between the expected and actual average number of quarters won never strays outside the range (-0.2, +0.2).
Aggregating across all 1,329 games the statistical model's fit is very good, as evidenced by a comparison of the totals at the foot of the Expected and Actual sections of the table.
It's apparent then, from visual inspection of this table, that the statistical model fits the actual data better than the heuristic model.
A crude way of quantifying the superiority is to determine the relative accuracy of the two models by calculating how often each model correctly predicts the number of quarters won by the Home team. The heuristic model does this for 448 of the 1,216 games (36.8%) in which no quarters were drawn; the statistical model does this for 468 of the 1,216 games (38.5%).
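This accuracy measure can be sketched as follows: for each game, take the number of quarters that the model rates most probable and count how often it matches the actual number. The function name and the toy inputs are mine and purely illustrative; only the two counts in the final line come from this post.

```python
def hit_rate(predicted_probs, actual_quarters):
    """Share of games where the most probable quarter count equals the actual count."""
    hits = 0
    for probs, actual in zip(predicted_probs, actual_quarters):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        hits += predicted == actual
    return hits / len(actual_quarters)

# Toy inputs (HYPOTHETICAL, not the post's data): one row of probabilities per game.
toy_probs = [
    [0.10, 0.30, 0.35, 0.20, 0.05],
    [0.05, 0.15, 0.30, 0.35, 0.15],
    [0.30, 0.35, 0.20, 0.10, 0.05],
]
toy_actual = [2, 4, 1]
print(hit_rate(toy_probs, toy_actual))

# The percentages quoted above follow from the stated counts.
print(round(100 * 448 / 1216, 1), round(100 * 468 / 1216, 1))
```

Note that this measure rewards only the single most probable outcome, so it ignores how well the rest of each predicted distribution is calibrated, which is one reason to call it crude.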
PRICE-SETTING WITH THE STATISTICAL MODEL
Given its demonstrated superiority, we might prefer to use the statistical model for framing a market on the number of quarters the Home team will win. Note that, unlike the heuristic model, the statistical model provides different probability predictions for Home teams at a given pre-game price compared with Away teams at the same price. Consequently, this table is longer than the equivalent table in the previous blog post.
You can gauge the extent to which the statistical model produces different estimates for Home teams and Away teams with the same Implicit Probabilities by comparing a row from towards the top of the table with the equivalent row from towards the bottom with the prices reversed.
For example, consider the third row of the table, which applies when the Home team is priced at $10.00 and the Away team at $1.05. For such a game the statistical model puts the Home team's probability of losing all 4 quarters at just under 25%.
In the matching game from the lower section of the table, where the Home team is at $1.05 and the Away team is at $10.00, the statistical model puts the Home team's probability of winning all 4 quarters at over 36%.
The heuristic model would have the Home team as a 43% chance to lose all 4 quarters in the first example, and as a 43% chance to win all 4 quarters in the second.
This example highlights another difference between the statistical and the heuristic models, namely the statistical model's tendency to be more conservative in its estimates of the more extreme results, that is, those where a team wins none or wins all 4 quarters.
Even a $15 Home favourite is rated by the statistical model as only about a 40% chance to win all 4 quarters. The heuristic model would rate such a team as a better than even-money chance to sweep the quarters.
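For anyone wanting to turn model probabilities into a market, the basic arithmetic is that a fair decimal price is the reciprocal of the probability. Here is a minimal Python sketch; the 10% overround and the probabilities are hypothetical values of mine, not figures from the table, and spreading the margin proportionally across outcomes is just one simple choice.

```python
def fair_prices(probs):
    """Decimal price with zero margin for each outcome: 1 / probability."""
    return [1.0 / q for q in probs]

def market_prices(probs, overround=1.10):
    """Shorten every price proportionally so the implied probabilities sum to `overround`."""
    return [1.0 / (q * overround) for q in probs]

# HYPOTHETICAL probabilities for 0..4 quarters won; not taken from the post's table.
probs = [0.05, 0.20, 0.35, 0.30, 0.10]
print([round(x, 2) for x in fair_prices(probs)])
print([round(x, 2) for x in market_prices(probs)])
```

On this arithmetic, an outcome rated at 25% would be a fair $4.00 chance, and any price offered above that would represent value if the model's estimate were right.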
Though the heuristic model is appealing in the simplicity of its derivation, if I were taking on the bookmakers in attempting to predict the number of quarters that a team was most likely to win, I think I'd be using the statistical model.