In-Running Models: Confidence Intervals for Probability Estimates
In a previous blog on the in-running models I generated point estimates for the Home team's victory probability at different stages in the game under a variety of different lead scenarios. In this blog I'll review the level of confidence we should have in some of those forecasts. More formally, I'll generate 95% confidence intervals for some of those point forecasts.
RE-EXPRESSING THE IN-RUNNING MODELS
Before I move on to generating confidence intervals I'm going to present the in-running models in a slightly modified - but equivalent - form:
- logit(Prob Home Team Wins at end of Q1) = 0.083 + 0.055 Home Team Lead at End of Q1 + 0.420 Pre-Game Log Odds for Home Team
- logit(Prob Home Team Wins at end of Q2) = 0.066 + 0.072 Home Team Lead at End of Q1 + 0.094 Change in Home Team Lead from End of Q1 to End of Q2 + 0.358 Pre-Game Log Odds for Home Team
- logit(Prob Home Team Wins at end of Q3) = 0.013 + 0.110 Home Team Lead at End of Q1 + 0.130 Change in Home Team Lead from End of Q1 to End of Q2 + 0.108 Change in Home Team Lead from End of Q2 to End of Q3 + 0.283 Pre-Game Log Odds for Home Team
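As a rough illustration of how these three equations turn into point estimates, here is a minimal Python sketch. It assumes that leads are measured in points, that the log odds are natural logs, and that the rounded coefficients quoted above are close enough to the fitted ones; the dictionary keys and the function names `inv_logit` and `home_win_prob` are my own labels, not anything from the fitted models.

```python
import math

# Rounded coefficients as quoted in the three models above.
# Key names are illustrative labels only (an assumption of this sketch).
Q1 = {"const": 0.083, "lead_q1": 0.055, "log_odds": 0.420}
Q2 = {"const": 0.066, "lead_q1": 0.072, "chg_q2": 0.094, "log_odds": 0.358}
Q3 = {"const": 0.013, "lead_q1": 0.110, "chg_q2": 0.130, "chg_q3": 0.108,
      "log_odds": 0.283}


def inv_logit(z):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def home_win_prob(model, log_odds, lead_q1, chg_q2=0.0, chg_q3=0.0):
    """Point estimate of the Home team's victory probability.

    lead_q1 is the Home team lead (in points) at Quarter time;
    chg_q2 and chg_q3 are the changes in that lead during Q2 and Q3.
    Inputs that a model does not use are simply ignored.
    """
    inputs = {"lead_q1": lead_q1, "chg_q2": chg_q2,
              "chg_q3": chg_q3, "log_odds": log_odds}
    z = model["const"] + sum(coef * inputs[name]
                             for name, coef in model.items()
                             if name != "const")
    return inv_logit(z)


# Tied at Quarter time, pre-game log odds of +0.90: roughly 0.61
print(round(home_win_prob(Q1, 0.90, 0), 3))
# Trailing by 12 at Quarter time and still by 12 at Half time: roughly 0.38
print(round(home_win_prob(Q2, 0.90, -12, 0), 3))
```

Because the coefficients are rounded to three decimal places, these point estimates should be read as approximate.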
All I've done in re-estimating the models in this way is express the coefficients on Home Team leads at the end of Q2 and Q3 in terms of differences from the lead at the end of Q1 rather than as absolute leads at the end of the relevant quarter. This, I think, aids in the interpretation of the results that follow.
It also provides a way of estimating the relative importance of the change in the Home team lead from one quarter to the next, rather than, as I presented in the earlier blog, the relative importance of the absolute lead as at the end of the relevant quarter. Put another way, in this new formulation we can assess the individual contribution of the Home team's outscoring of the Away team in any single quarter.
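The equivalence of the two formulations is just algebra: a term like b1 x Lead(Q1) + b2 x (Lead(Q2) - Lead(Q1)) can be regrouped as (b1 - b2) x Lead(Q1) + b2 x Lead(Q2). The sketch below, with function names of my own choosing, converts the change-based coefficients into the absolute-lead coefficients they imply and checks that both forms give the same linear predictor. The implied values come from the rounded coefficients quoted above, so they may differ slightly from the figures published in the earlier blog.

```python
def to_absolute_form(b_lead_q1, b_changes):
    """Convert coefficients on (Q1 lead, change in Q2, change in Q3, ...)
    into the implied coefficients on the absolute leads (Q1, Q2, Q3, ...)."""
    diffs = [b_lead_q1] + list(b_changes)
    # Each absolute-lead coefficient is its own change coefficient minus
    # the next one; the final lead keeps its change coefficient unchanged.
    return [diffs[i] - diffs[i + 1] for i in range(len(diffs) - 1)] + [diffs[-1]]


def predictor_from_changes(b_lead_q1, b_changes, leads):
    """Lead-related part of the logit using quarter-to-quarter changes."""
    changes = [leads[i + 1] - leads[i] for i in range(len(leads) - 1)]
    return b_lead_q1 * leads[0] + sum(b * c for b, c in zip(b_changes, changes))


def predictor_from_leads(b_abs, leads):
    """Lead-related part of the logit using absolute leads."""
    return sum(b * lead for b, lead in zip(b_abs, leads))


# Third model: Home team leads of -12, -4 and +10 at the three breaks
# (an arbitrary example scenario, not one taken from the data).
leads = [-12, -4, 10]
b_abs = to_absolute_form(0.110, [0.130, 0.108])
same = abs(predictor_from_changes(0.110, [0.130, 0.108], leads)
           - predictor_from_leads(b_abs, leads)) < 1e-9
print(same)
```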
The first model here is the same as in the earlier blog - since the change in the lead is exactly equal to the lead at the end of Q1 - and the variable importances remain 52% for the log odds variable and 47% for the Home Team Lead at the end of Q1.
For the second model we have the following variable importances: log odds (25%); Home team lead at end of Q1 (29%); Change in Home team Lead Between End of Q2 and End of Q1 (46%).
For the third model we have the following variable importances: log odds (14%); Home team lead at end of Q1 (23%); Change in Home team Lead Between End of Q2 and End of Q1 (36%); Change in Home team Lead Between End of Q3 and End of Q2 (27%).
It's interesting to note that, from the point of view of trying to predict the outcome at Three Quarter time, knowledge of what happened to the Home team's lead during the course of the 2nd term is more valuable than knowing what happened to it during the 3rd term.
Note that the correlations between the changes in leads from one quarter to the next, while positive, are quite low. The correlation between the Home team lead at Quarter time and the change in the lead between Quarter time and Half time is +0.15; between the change in the lead between Quarter time and Half time, and the change in the lead between Half time and Three Quarter time, it is +0.16; and between the change in the lead between Half time and Three Quarter time, and the change in the lead between Three Quarter time and Full time, it is +0.17. All of which simply suggests that the size of the lead change in one quarter tends to be similar to the size of the lead change in the preceding quarter, but that this tendency is weak.
Confidence Intervals for Predictions of the Home Team Victory Probability at the end of Q1
For this first chart I've provided the upper and lower bounds on the 95% confidence interval for Home team victory probabilities for Home teams with pre-game log odds of +0.90 (in purple) and -1.94 (in red) as we vary the Home team's Quarter time lead.
So, for example, a Home team that finds itself tied at Quarter time and that had a pre-game log odds ratio of +0.90 has, with a 95% confidence level, an estimated victory probability of between about 57% and 62%. A Home team in similar circumstances at the first change but that sported a pre-game log odds ratio of -1.94, making it about an $8.00 underdog, has instead at a 95% confidence level an estimated victory probability of between about 30% and 36%.
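For readers who want to translate between log odds and prices, here is a small sketch. It assumes the log odds are the natural log of p / (1 - p) and ignores the bookmaker's overround, so the "fair price" is simply 1 / p; the function names are mine.

```python
import math


def log_odds_to_prob(log_odds):
    """Implied probability, assuming log_odds = ln(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def fair_price(log_odds):
    """Decimal price with no overround, which is 1 / p."""
    return 1.0 / log_odds_to_prob(log_odds)


# Log odds of -1.94 imply a price a little under $8.00
print(round(fair_price(-1.94), 2))
# Log odds of +0.90 imply a price of roughly $1.41
print(round(fair_price(0.90), 2))
```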
Clearly, at this point in the game, the Bookmaker's pre-game assessment remains very relevant - even with a 30-point Quarter time lead, a Home team with pre-game log odds of -1.94 still has a lower bound on its victory probability of only about 63%.
We could, of course, produce similar confidence intervals for Home teams with other log odds ratios but those shown here provide practical extremes for what we'd produce.
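One common way to build intervals like these for a logistic model is to form a Wald interval on the logit scale, using the standard error of the linear predictor, and then transform the two ends back to probabilities. The sketch below shows that approach for the first model; I am not claiming it reproduces the charted intervals. The covariance matrix in particular is a made-up placeholder, so the bounds it produces will not match the chart.

```python
import math


def inv_logit(z):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def wald_interval(coefs, cov, x, z_crit=1.96):
    """Approximate 95% interval for a logistic model's predicted probability.

    coefs : fitted coefficients (intercept first)
    cov   : covariance matrix of those coefficients
    x     : predictor values (with a leading 1.0 for the intercept)
    Returns (lower, point estimate, upper) on the probability scale.
    """
    n = len(x)
    eta = sum(b * xi for b, xi in zip(coefs, x))
    # Variance of the linear predictor is x' * cov * x
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    half_width = z_crit * math.sqrt(var)
    return inv_logit(eta - half_width), inv_logit(eta), inv_logit(eta + half_width)


# Q1 model coefficients as quoted: intercept, Q1 lead, pre-game log odds
coefs = [0.083, 0.055, 0.420]

# HYPOTHETICAL covariance matrix: placeholder values chosen only so the
# example runs. They are NOT estimated from any data.
cov_hypothetical = [
    [0.0025, 0.0, 0.0],
    [0.0, 0.000009, 0.0],
    [0.0, 0.0, 0.0025],
]

# Home team tied at Quarter time with pre-game log odds of +0.90
lower, point, upper = wald_interval(coefs, cov_hypothetical, [1.0, 0.0, 0.90])
print(lower < point < upper)
```

Working on the logit scale and transforming back keeps both bounds between 0 and 1, which is why intervals of this kind are not symmetric around the point estimate when the probability is far from 50%.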
Confidence Intervals for Predictions of the Home Team Victory Probability at the end of Q2
Using the second model requires us to make an assumption about the Home team's lead at Quarter time, as well as its pre-game log odds ratio and the change in its lead between the end of Q1 and the end of Q2. For the first set of comparisons I'll assume that the Home team trailed by 2 goals at Quarter time. Again I'll look at the situation for Home teams with pre-game log odds of +0.90 and -1.94.
Note that the x-axis is now the change in the Home team lead, relative to the lead that it had at the end of Q1 (ie -12 points). So, a Home team that trailed at Quarter time by 2 goals and that still trailed by 2 goals at Half time (ie had a 0 points change in that lead) would have a 95% confidence interval for its victory probability of about 32% to 42% if its pre-game log odds were +0.90 and of about 13% to 24% if its pre-game log odds were -1.94.
The lower bound of these confidence intervals reaches 50% for a Home team with a log odds ratio of +0.90 at a lead change of about 8 points (leaving it trailing by 4 points overall). A Home team with a log odds ratio of -1.94 needs a lead change of about 20 points for the lower bound of its probability forecast interval to reach 50%, meaning that it needs to lead by about 8 points overall at Half time.
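For comparison, the lead change at which the point estimate (rather than the lower bound) reaches 50% can be found directly by setting the second model's logit to zero and solving for the change in the lead. This sketch uses the rounded coefficients quoted earlier; the function and parameter names are mine.

```python
def break_even_change(log_odds, lead_q1, const=0.066, b_lead_q1=0.072,
                      b_chg_q2=0.094, b_log_odds=0.358):
    """Q2 lead change at which the second model's point estimate is 50%.

    A probability of 50% corresponds to a logit of zero, so we solve
    const + b_lead_q1*lead_q1 + b_chg_q2*change + b_log_odds*log_odds = 0
    for change.
    """
    return -(const + b_lead_q1 * lead_q1 + b_log_odds * log_odds) / b_chg_q2


# Home team trailing by 2 goals (12 points) at Quarter time
print(round(break_even_change(0.90, -12), 1))   # roughly 5 points
print(round(break_even_change(-1.94, -12), 1))  # roughly 16 points
```

As you'd expect, the point estimate crosses 50% at a smaller lead change than the lower bound of the interval does: about 5 points versus about 8 for the favourite, and about 16 points versus about 20 for the underdog.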
Next we'll consider Home teams with pre-game log odds of +0.90 and look at the 95% confidence interval for their victory forecasts when they lead by 2 goals at Quarter time (purple) compared to when they trail by 2 goals at Quarter time (red).
Remember that the x-axis is now the change in the Home team lead, so the zero point corresponds to the Home team leading at Half time by two goals in the case of the purple lines, and the Home team trailing at Half time by two goals in the case of the red lines.
One feature to note about the confidence intervals is the consistency of their width - about 10 percentage points - across the majority of the range of the x-axis. The exceptions are for large negative lead changes for Home teams that were already trailing at Quarter time (the red lines) and for large positive lead changes for Home teams that were already leading at Quarter time (the purple lines).
The following chart provides the same information but for a Home team with pre-game log odds of -1.94.
We can see from this chart that, as you might expect, pre-game Home team underdogs are assessed as being about 15 to 25 percentage points less likely to win than pre-game Home team favourites in the same game situation. As well, the 95% confidence intervals for pre-game favourites and underdogs are of generally comparable width.
Confidence Intervals for Predictions of the Home Team Victory Probability at the end of Q3
Finally, we move to the third of the models and consider a range of scenarios for Home team log odds ratios and leads at the end of Q1, Q2 and Q3.
On the left we look at Home teams with pre-game log odds of +0.90 (purple) and -1.94 (red) that trailed at Quarter time and Half time by 2 goals; in the middle we look at teams with the same pre-game log odds ratios but that were tied at Quarter time and Half time; and on the right we look at teams with the same pre-game log odds ratios but that led by 2 goals at Quarter time and Half time.
To me, the striking feature of these charts - and of those we've already reviewed - is how narrow, relatively speaking, the confidence intervals generally are. For the most part they're plus or minus 5 to 7 percentage points. What would be fascinating to know is how often the odds being offered by the bookmakers that allow in-running wagering lie outside these relatively narrow bands. Maybe this is an area to investigate in season 2013 ...