Injecting Variability into Season Projections: How Much is Too Much?

I've been projecting final ladders during AFL seasons for at least five years now, where I take the current ladder and project the remainder of the season thousands of times to make inferences about which teams might finish where (here, for example, is a projection from last year). During that time, more than once I've wondered whether the projections have incorporated sufficient variability - whether the results have been overly optimistic for strong teams and unduly pessimistic for weak teams.

Read More

An In-Running Model for the Total Score of an AFL Game

A few weeks ago, I wrote a piece describing the construction of an in-running model for the final margin of an AFL game. Today, I'm going to use the same data set (viz, score progression data from the www.afltables.com website, covering every score in every AFL game from 2008 to 2016) to construct a different in-running model, this one to project the final total score.

Read More

Selected AFL Twitter Networks: Graph Theory and Footy

Only a few times in my professional career as a data scientist have I had the opportunity to use mathematical graph theory, but the technique has long fascinated me.

Briefly, the theory involves "nodes" (also called vertices), which are entities like books, teams or streets, and "edges", which signify relationships between the nodes - such as, in the books example, having the same author. Edges can denote present/absent relationships such as friendship, or they can carry a count or weight, such as the number of times a pair of teams have played. Where the relationship between nodes runs between them rather than from one to the other (eg friendship), the edges are said to be undirected; where it flows from one node to another, they're said to be directed (eg Team A defeated Team B).

Read More

The Case of the Missing Margins (Are 12 to 24-point Margins Too Rare?)

The analysis used in this blog was originally created as part of a Twitter conversation about the ability of good teams to "win the close ones" (a topic we have investigated before here on MoS - for example in this post and in this one). As a first step in investigating that question, I thought it would be useful to create a cross-tab of historical V/AFL results based on the final margin in each game and the level of pre-game favouritism.

Read More

Team Rating Revisited: A Rival for MoSSBODS

Last year, predictions based on the MoSSBODS Team Rating System proved themselves to be, in Aussie parlance, "fairly useful". MoSSBODS correctly predicted 73% of the winning teams, recorded a mean absolute error (MAE) of 30.2 points per game, and its opinions guided the Combined Portfolio to a modest profit for the year. If it had a major weakness, it was in its head-to-head probability assessments, which, whilst well-calibrated in the early part of the season, were at best unhelpful from about Round 5 onwards.

Read More

The 2017 AFL Draw: Difficulty and Distortion Dissected

I've seen it written that the best blog posts are self-contained. But as this is the third year in a row where I've used essentially the same methodology for analysing the AFL draw for the upcoming season, I'm not going to repeat the methodological details here. Instead, I'll politely refer you to this post from last year, and, probably more relevantly, this one from the year before if you're curious about that kind of thing. Call me lazy - but at least this year you're getting the blog post in October rather than in November or December.

Read More

Classifying Grand Finals (A Reprise)

(This piece originally appeared in the Guardian, and revisits the topic of defining a typology for Grand Finals, which I first looked at in 2009 where I came up with a similar solution, and again in 2014 where I used a fuzzy clustering approach.)

For fans, even casual ones, AFL Grand Finals are special, and each etches its own unique, defining legacy on the collective football memory. 

Read More

Team Ratings and Conversion Rates

A number of blog posts here in the Statistical Analysis portion of the MoS website have reviewed the rates at which teams have converted Scoring Shots into goals - a metric I refer to as the "Conversion Rate".

In this post from 2014 for example, which is probably the post most similar in intent to today's, I used Beta regression to model team conversion rates:

  1. as a function of venue, and the participating teams' pre-game bookmaker odds, venue experience, MARS Ratings, and recent conversion performance. 
  2. as a function of which teams were playing

Both models explained about 2.5 - 3% of the variability in team conversion rates, but the general absence of statistically significant coefficients in the first model meant that only tentative conclusions could be drawn from it. And, whilst some teams had statistically significant coefficients in the second model, its ongoing usefulness was dependent on an assumption that these team-by-team effects would persist across a reasonable portion of the future. We know, however, that teams go through phases of above- and below-average conversion rates, so that assumption seems dubious.

Other analyses have revealed that stronger teams generally convert at higher rates when playing weaker teams, so it's curious that the first model in that 2014 post did not have statistically significant coefficients on the MARS Ratings variable.

Maybe MoSSBODS, which provides separate offensive and defensive ratings, might help.

THE MODEL

For today's analysis we will again be employing a Beta regression (though this time with a logit link and not fitting phi as a function of the covariates), applying it to all games from Round 1 of 2000 to Round 16 of 2016.

We'll use as regressors:

  • A team's pre-game Offensive and Defensive MoSSBODS Ratings
  • Their opponent's pre-game Offensive and Defensive MoSSBODS Ratings
  • The game venue
  • The (local) time of day when the game started
  • The month in which the game was played
  • The attendance at the game

(Note that the attendance and time-of-day data has been sourced from the extraordinary www.afltables.com site.)
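For anyone who wants to experiment with a similar specification, here's a minimal sketch of how such a model might be fitted in Python using statsmodels' BetaModel (available from statsmodels 0.13 in statsmodels.othermod.betareg). The original analysis wasn't built this way, and the file and column names - conv_rate, own_off, start_time_band and so on - are purely illustrative stand-ins for the regressors listed above.

```python
import pandas as pd
from statsmodels.othermod.betareg import BetaModel

# Hypothetical data frame: one row per team per game, with the team's
# conversion rate (goals / scoring shots) and the regressors listed above.
df = pd.read_csv("team_game_conversion.csv")  # illustrative file name

# Beta regression requires responses strictly inside (0, 1), so nudge any
# games where a team converted none or all of its scoring shots.
eps = 1e-4
df["conv_rate"] = df["conv_rate"].clip(eps, 1 - eps)

# The mean is modelled with a logit link (the default); the precision
# parameter (phi) is left constant rather than fitted on the covariates.
model = BetaModel.from_formula(
    "conv_rate ~ own_off + own_def + opp_off + opp_def"
    " + C(venue) + C(start_time_band) + C(month) + attendance",
    data=df,
)
results = model.fit()
print(results.summary())
```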

Now, in recent conversations I've been having on Twitter and elsewhere people have been positing that:

  • better teams will, on average, create better scoring shot opportunities and so will convert at higher rates than weaker teams. In particular, teams with stronger attacks playing teams with weaker defences should show heightened rates of conversion.
  • dew and/or wet weather will generally depress scoring, partly because it will be harder to create better scoring opportunities in the first place, and also because any opportunity will be harder to convert than it would be from the same part of the ground were the weather more conducive to long and accurate kicking.

What's appealing about including MoSSBODS ratings as regressors is that they allow us to explicitly consider the first argument above. If that contention is true, we'd expect to see a positive and significant coefficient on a team's own Offensive rating and a negative and significant coefficient on a team's opponent's Defensive rating.

On the second argument, whilst I don't have direct weather data for every game and so cannot reflect the presence or absence of rain, I can proxy for the likelihood of dew in the regression by including the variables related to the time of day that the game started and the month in which it was played.

Looking at the remaining regressors, venue is included based on earlier analyses that suggested conversion rates varied significantly around the all-ground average for some venues, and attendance is included to test the hypothesis that teams may respond positively or negatively in their conversion behaviour in the presence of larger- or smaller-than-average crowds.

THE RESULTS

Details of the fitted model appear below.

The logit formulation makes coefficient interpretation slightly tricky. We need firstly to recognise that estimates are relative to a notional "reference game", which for the model as formulated is a game played at the MCG, starting before 4:30pm and played in April.

The intercept coefficient of the model tells us that such a game, played between two teams with MoSSBODS Offensive and Defensive ratings of 0 (ie 'average' teams) would be expected to produce Conversion rates of 53.1% for both teams. We calculate that as 1/(1+exp(-0.126)). 

(Strictly, we should include some value for Attendance in this calculation, but the coefficient is so small that it makes no practical difference in our estimate whether we do or don't.)
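For those who'd like to check that arithmetic, here's a minimal sketch of the inverse-logit calculation in Python (the helper function is mine, purely for illustration):

```python
import math

def inv_logit(x):
    """Convert a linear predictor on the logit scale to a probability."""
    return 1 / (1 + math.exp(-x))

# Reference game: intercept only (two 'average' teams, MCG, pre-4:30pm, April)
print(round(inv_logit(0.126), 3))  # 0.531, i.e. about 53.1%
```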

Next, let's consider the four coefficients reflecting MoSSBODS ratings variables. We find, as hypothesised, that the coefficient for a team's own Offensive rating is positive and significant, and that for their opponent's Defensive rating is negative and significant.

Their size means that, for example, a team with a +1 Scoring Shot (SS) Offensive rating and a 0 SS Defensive rating playing a team with a 0 SS Defensive and Offensive rating would be expected to convert at 53.3%, which is just 0.2% higher than the rate in the 'reference game'. This is calculated as 1/(1+exp(-(0.126+0.008))).

Strong Offensive teams will have ratings of +5 SS or even higher, in which case the estimated conversion rate would rise to a little over 54%.

Similarly, a team facing an opponent with a +1 Scoring Shot (SS) Defensive rating and a 0 SS Offensive rating, itself having 0 SS Defensive and Offensive ratings, would be expected to convert at 52.8%, which is about 0.3% lower than the rate for the 'reference game'.

The positive and statistically significant coefficient on a team's opponent's Offensive rating is a curious result. It suggests that teams convert at a higher rate themselves when facing an opposition with a stronger Offence as compared to one with a weaker Offence. That opponent would, of course, be expected to convert at a higher-than-average rate itself, all other things being equal, so perhaps it's the case that teams themselves strive to create better scoring shot opportunities when faced with an Offensively more capable team, looking to convert less promising near-goal opportunities into better ones before taking a shot at goal.

In any case, the coefficient is only 0.004, about half the size of the coefficient on a team's own Offensive rating, and about one-third the size of that on the team's opponent's Defensive Rating, so the magnitude of the effect is relatively small.
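To make the arithmetic behind those scenarios explicit, here's a small illustrative sketch. It uses the intercept and own-Offence coefficient quoted above, plus opponent-Defence and opponent-Offence coefficients of roughly -0.012 and 0.004 implied by the relative sizes just mentioned; treat these as approximations rather than the exact fitted values.

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Values quoted (or implied) in the text; the opponent-Defence coefficient
# is an approximation inferred from the relative sizes mentioned above.
INTERCEPT = 0.126   # 'reference game': MCG, pre-4:30pm start, April, average teams
OWN_OFF = 0.008     # coefficient on a team's own Offensive rating
OPP_DEF = -0.012    # coefficient on the opponent's Defensive rating (approx.)
OPP_OFF = 0.004     # coefficient on the opponent's Offensive rating

def conversion_rate(own_off=0, opp_def=0, opp_off=0):
    """Expected conversion rate, varying only the ratings-based inputs."""
    lp = INTERCEPT + OWN_OFF * own_off + OPP_DEF * opp_def + OPP_OFF * opp_off
    return inv_logit(lp)

print(f"Reference game:          {conversion_rate():.1%}")           # ~53.1%
print(f"Own Offence +1 SS:       {conversion_rate(own_off=1):.1%}")  # ~53.3%
print(f"Own Offence +5 SS:       {conversion_rate(own_off=5):.1%}")  # ~54.1%
print(f"Opponent Defence +1 SS:  {conversion_rate(opp_def=1):.1%}")  # ~52.8%
```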

Turning to the venue-based variables, we see that three grounds have statistically significant coefficients. In absolute terms, Cazaly's Stadium's coefficient is the largest, and negative, and we would expect a game played there between two 'average' teams, starting before 4:30pm in April, to result in conversion rates of around 46%.

Docklands has the largest positive coefficient and there we would expect a game played between the same two teams at the same time to yield conversion rates of around 56%.

The coefficients on the Time of Day variables very much support the hypothesis that games starting later tend to have lower conversion rates. For example, a game starting between 4:30pm and 7:30pm played between 'average' teams at the MCG would be expected to produce conversion rates of just over 52%. A later-starting game would be expected to produce a fractionally lower conversion rate.

Month, it transpires, is also strongly associated with variability in conversion rates, with games played in any of the months May to August expected to produce higher conversion rates than those played in April. A game between 'average' teams, at the MCG, starting before 4:30pm and taking place in any of those months would be expected to produce conversion rates of around 54%, which is almost 1 percentage point higher than would be expected for the same game in April. The Month variable then does not seem to be proxying for poorer weather.

Relatively few games in the sample were played in March (150) so, for the most part, April games were the first few games of the season. As such, the higher rates of conversion in other months might simply reflect an overall improvement in the quality and conversion of scoring shot opportunities once teams have settled into the new season.

Lastly, it turns out that attendance levels have virtually no effect on team conversion rates.

SUMMARY

It's important to interpret all of these results in the context of the model's pseudo R-squared, which is, again, around 2.5%. That means the vast majority of the variability in teams' conversion rates is unexplained by anything in the model (and, I would contend, potentially unexplainable pre-game). Any conversion rate forecasts from the model will therefore have very large error bounds. That's the nature of a measure as lumpy and variable as Conversion Rate, which can move by tens of percentage points in a single game on the basis of a few behinds becoming goals or vice versa.

That said, we have detected some fairly clear "signals" and can reasonably claim that conversion rates are:

  • Positively associated with a team's Offensive rating
  • Negatively associated with a team's opponent's Defensive rating
  • Positively associated with a team's opponent's Offensive rating
  • Higher (compared to the MCG) at Docklands, and lower at Cazaly's Stadium and Carrara
  • Lower for games starting at 4:30pm or later compared to games starting before then
  • Higher (relative to April) for games played between May and August
  • Unrelated to attendance

Taken across a large enough sample of games, it's clear that these effects do become manifest, and that they are large enough, despite the vast sea of randomness they are diluted in, to produce detectable differences.

Next year I might see if they're large enough to improve MoSSBODS score projections because, ultimately, what matters most is whether the associations we find prove to be predictively useful.