Projecting the Final Aggregate Score of an AFL Game In-Running
It's quarter time, you've an Unders bet with a 180.5 threshold (i.e. you're betting that the final aggregate score will be 180 points or fewer) and you've just seen 40 points kicked in the 1st Quarter. How comfortable should you feel with your wager?
I've been facing these types of questions quite a lot this season now that I'm making Unders and Overs bets, so I decided to build a few simple statistical models to give me an insight into how accurately the final total score might be projected at the end of each quarter (well, not the end of the 4th Quarter - I'm pretty sure that's a projection fairly easy to make even without a statistical model).
For the purposes of modelling I've chosen to use the data from all of the games played between 2000 and 2015 and, in the end, I've settled for ordinary least squares regression as my model choice. Thinking that non-linearities might be important, I also had a look at random forests, but they didn't seem to offer much better fit and they're far more complicated to pop the hood on.
Three models were fitted, the first allowing projection of the final total score based only on the score at the end of the 1st Quarter, the second allowing projection at the end of the 2nd Quarter, and the third (I'm guessing you've spotted the pattern by now, but we've come this far, so let's tap in that last nail) allowing projection at the end of the 3rd Quarter.
Details of each of those models appear in the table at right.
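For anyone wanting to reproduce this kind of fit, here is a minimal sketch of a one-variable ordinary least squares regression in plain Python. The function name `fit_simple_ols` and the four-game toy data set are made up purely for illustration; they are not the 2000 to 2015 sample, and the models in this post were not necessarily fitted this way.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: aggregate score at quarter time and at full time
q1_totals = [30, 40, 50, 60]
final_totals = [160, 185, 190, 215]

intercept, slope = fit_simple_ols(q1_totals, final_totals)

# Residuals are actual minus fitted totals, as used later in the post
residuals = [y - (intercept + slope * x) for x, y in zip(q1_totals, final_totals)]
print(intercept, slope, residuals)
```

The same function could be applied to half-time or three-quarter-time totals to produce the other two models.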
The End Q1 Model is the one we can use at the end of the 1st Quarter and it tells us that our best estimate of the final total score is equal to 119.55 points plus 1.47 times the points scored in the 1st Quarter. So, every extra point scored in the 1st Quarter adds itself plus a little less than half a point more to the final expected total score.
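As a small sketch, the End Q1 Model can be written as a one-line function. The coefficients are the rounded values quoted above; the function name `project_final_total` is just an illustrative choice.

```python
# Rounded End Q1 Model coefficients as quoted in the text
INTERCEPT = 119.55
SLOPE = 1.47


def project_final_total(q1_points):
    """Fitted final aggregate score given the aggregate score at quarter time."""
    return INTERCEPT + SLOPE * q1_points


# Example: 40 points kicked in the 1st Quarter
fitted_total = project_final_total(40)
print(round(fitted_total, 2))  # 178.35
```

Because the quoted coefficients are rounded, results from this function can differ in the first decimal place from figures computed with the unrounded model.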
As we'd expect, there's considerable variability around our fitted total scores; we explain only about one-third of the variability of those total scores across our sample data.
One way of practically quantifying our uncertainty is to look at the model's residuals - the difference between the Actual Total and the model's fitted total. We see from the bottom set of numbers that, for example, 1% of the residuals for the games in our sample were -61.13 points or smaller.
We could form, for example, a 50% confidence interval around our estimated final score by using the 25% and 75% quantiles shown here. So, taking the example that I gave at the start of this blog, our 50% confidence interval for the final total score would be {119.55 + 40 x 1.47 - 19.08, 119.55 + 40 x 1.47 + 18.29}, which is {159.2, 196.6}. That contains the 180 point threshold so we've no reason to be feeling particularly confident yet.
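The interval arithmetic above can be sketched in a few lines. This uses the rounded coefficients and the 25% and 75% residual quantiles quoted in the text; the function name `empirical_interval` is hypothetical. With rounded inputs the lower bound comes out at about 159.3 rather than 159.2, a gap that presumably reflects rounding in the quoted coefficients.

```python
# Rounded End Q1 Model coefficients and residual quantiles quoted in the text
INTERCEPT, SLOPE = 119.55, 1.47
RESID_Q25, RESID_Q75 = -19.08, 18.29


def empirical_interval(q1_points, lo_resid=RESID_Q25, hi_resid=RESID_Q75):
    """Shift the fitted total by two residual quantiles to get an empirical interval."""
    fitted = INTERCEPT + SLOPE * q1_points
    return fitted + lo_resid, fitted + hi_resid


low, high = empirical_interval(40)
threshold = 180.5  # the Unders threshold from the opening example

print(f"50% interval: {low:.1f} to {high:.1f}")
print("Threshold inside interval:", low <= threshold <= high)
```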
If we wanted a rough estimate of the probability of the final score remaining under 180, we could note that 180 - (119.55 + 40 x 1.47) is 1.69, which is fractionally above the 50% quantile for the model residuals, which is -0.40. On that basis, there's a slightly better than 50% chance the final total will be below 180. Small comfort only for us then.
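One way to put a rough number on "slightly better than 50%" is to interpolate linearly between the residual quantiles quoted in this post. That is an assumption made for this sketch only: it treats the residual distribution as piecewise uniform between the known quantiles, and the function name `prob_residual_below` is made up.

```python
# Residual quantiles for the End Q1 Model as quoted in the text: (probability, residual)
QUANTILES = [(0.01, -61.13), (0.25, -19.08), (0.50, -0.40), (0.75, 18.29)]


def prob_residual_below(x):
    """Approximate P(residual <= x) by linear interpolation between known quantiles.

    Outside the known range the result is clamped, so it should be read as
    'at most 1%' below the lowest quantile and 'at least 75%' above the highest.
    """
    if x <= QUANTILES[0][1]:
        return QUANTILES[0][0]
    if x >= QUANTILES[-1][1]:
        return QUANTILES[-1][0]
    for (p0, r0), (p1, r1) in zip(QUANTILES, QUANTILES[1:]):
        if r0 <= x <= r1:
            return p0 + (p1 - p0) * (x - r0) / (r1 - r0)


gap = 1.69  # threshold minus fitted total, as computed in the text
p_under = prob_residual_below(gap)
print(f"Approximate probability of finishing under: {p_under:.1%}")
```

Under that assumption the estimate lands a few percentage points above 50%, consistent with the small comfort noted above.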
The two other models work in similar ways and you can see that, as the game progresses, we increase the proportion of explained variance and narrow our empirical confidence intervals. The width of our 50% confidence interval, for example, shrinks from the 37.4 points we had at the end of Quarter 1 to about 29.3 points at the end of Quarter 2, and 19.5 points at the end of Quarter 3.
We can also see the improvement in the accuracy of the fitted total scores when we plot them, for each of the three models, against the actual scores.
To be clear, these are very simple models, and we might be able to improve on their estimates by incorporating more input variables, for example and most obviously the pre-game market expectations of the total score. This is something I might seek to do in a future blog post if I can source sufficient pre-game Unders/Overs market data.
For now though, it's interesting to recognise that we can only expect to project the final total score:
- from the quarter time score to within a range about 6 goals wide 50% of the time
- from the half time score to within a range about 5 goals wide 50% of the time
- from the three-quarter time score to within a range about 3 goals wide 50% of the time
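The goal figures in this list follow from the 50% interval widths quoted earlier, at 6 points per goal. A quick sketch of the conversion (the dictionary name `INTERVAL_WIDTHS` is just illustrative):

```python
POINTS_PER_GOAL = 6

# Widths (in points) of the 50% intervals quoted earlier in the post
INTERVAL_WIDTHS = {
    "quarter time": 37.4,
    "half time": 29.3,
    "three-quarter time": 19.5,
}

# Convert each width from points to goals
widths_in_goals = {
    stage: width / POINTS_PER_GOAL for stage, width in INTERVAL_WIDTHS.items()
}

for stage, goals in widths_in_goals.items():
    print(f"{stage}: about {round(goals)} goals ({goals:.1f})")
```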