Goals and Behinds: How Correlated are They?

This week I’ve been investigating the use of the Skellam distribution in modelling AFL scores. That distribution describes the difference between two Poisson variables (classically independent ones, though it can also be derived for a correlated pair), which potentially makes it useful in an AFL context for modelling differences between team metrics.

Scores as Poissons

The simplest way to apply it would be to assume that both the home and the away team scores are drawn from Poisson distributions with some, presumably negative, correlation between the pair (since other analyses have shown that, when one team outperforms expectations, its opponent tends to underperform - a finding we’ll reaffirm later).

If we do that, however, with, say, an assumed correlation of around -0.25 between the home and away scores, we immediately hit a snag: the standard deviation of the resulting Skellam distribution for game margins (the difference between the teams’ scores) is only around 13 to 15 points, which is far too small given that empirical estimates sit in the mid to high 30s.

For example, consider a game where the favourite is expected to score 90 points, and the underdog 70 points. Then, in R, with the skellam package installed, we’d have:

library(skellam) # provides rskellam()

Margin = rskellam(100000, 90-(-0.25)*160, 70-(-0.25)*160)

mean(Margin)

20.108

sd(Margin)

15.52391
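As a quick sanity check on that spread, here is a minimal sketch using only base R. It assumes that a Skellam draw with parameters mu1 and mu2 is equivalent to the difference between two independent Poisson draws, so its mean is mu1 - mu2 and its variance is mu1 + mu2. The variable names are mine, for illustration only.

```r
# Analytic check of the simulated margin, using base R only.
# Assumption: Skellam(mu1, mu2) = Poisson(mu1) - Poisson(mu2), independent.
mu1 <- 90 - (-0.25) * 160   # 130
mu2 <- 70 - (-0.25) * 160   # 110

theory_mean <- mu1 - mu2        # Skellam mean is mu1 - mu2
theory_sd   <- sqrt(mu1 + mu2)  # Skellam variance is mu1 + mu2

# Simulate the same thing without the skellam package
set.seed(1)
sim_margin <- rpois(100000, mu1) - rpois(100000, mu2)

# Under this parameterisation the variance is 160 * (1 - 2 * rho), so the
# "correlation" that would be needed for an SD of 36 points is:
rho_needed <- (1 - 36^2 / 160) / 2

c(theory_mean = theory_mean, theory_sd = theory_sd, rho_needed = rho_needed)
c(sim_mean = mean(sim_margin), sim_sd = sd(sim_margin))
```

The theoretical SD is sqrt(240), or about 15.5 points, and the value of rho needed to reach 36 points is about -3.55, which is not a valid correlation. So no choice of correlation rescues this formulation.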

This formulation then seems problematic, but might there be other ways to use the Skellam?

Goals and Behinds as Poissons

In a piece entitled “Dynamic Bayesian forecasting of AFL match results using the Skellam distribution”, the authors instead model the margin for goals and the margin for behinds separately, using the Skellam for each. With that approach as an inspiration, and assuming correlations of about -0.25 between the two teams’ goals and between the two teams’ behinds, and conversion rates of 53% for the home team and 51% for the away team, we proceed as follows in R:

S1 = 90; S2 = 70 # Expected Scores

C1 = 0.53; C2 = 0.51 # Expected Conversion Rates

# Expected Goals and Behinds for Team 1 and 2

G1 = C1 * S1 / (5 * C1 + 1); B1 = (1 - C1) * S1 / (5 * C1 + 1)

G2 = C2 * S2 / (5 * C2 + 1); B2 = (1 - C2) * S2 / (5 * C2 + 1)

pG = -0.25; pB = -0.25 # Correlations for Goals and Behinds

# Separately generate Goal and Behind Differences

MarginG = rskellam(100000, G1-pG*(G1+G2), G2-pG*(G1+G2))

MarginB = rskellam(100000, B1-pB*(B1+B2), B2-pB*(B1+B2))

# Convert to a Margin

Margin = 6*MarginG + MarginB

mean(Margin)

20.08023

sd(Margin)

35.86766

That looks much more promising.
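The simulated figure of about 36 points can also be reproduced analytically. The sketch below assumes, as the simulation does, that the goal margin and the behind margin are drawn independently of one another, and that each Skellam has variance equal to the sum of its two parameters. It needs only base R.

```r
# Analytic mean and SD of Margin = 6 * MarginG + MarginB
S1 <- 90; S2 <- 70        # expected scores
C1 <- 0.53; C2 <- 0.51    # expected conversion rates

G1 <- C1 * S1 / (5 * C1 + 1); B1 <- (1 - C1) * S1 / (5 * C1 + 1)
G2 <- C2 * S2 / (5 * C2 + 1); B2 <- (1 - C2) * S2 / (5 * C2 + 1)

pG <- -0.25; pB <- -0.25

# Skellam variance = sum of its two parameters
var_goal_margin   <- (G1 - pG * (G1 + G2)) + (G2 - pG * (G1 + G2))
var_behind_margin <- (B1 - pB * (B1 + B2)) + (B2 - pB * (B1 + B2))

# Assumes the goal and behind margins are independent of each other
theory_mean <- 6 * (G1 - G2) + (B1 - B2)
theory_sd   <- sqrt(36 * var_goal_margin + var_behind_margin)

# Share of the margin variance contributed by goals
goal_share <- 36 * var_goal_margin / theory_sd^2

c(theory_mean = theory_mean, theory_sd = theory_sd, goal_share = goal_share)
```

The theoretical SD comes out at about 35.8 points, with almost all of the variance coming from the goal margin because of the multiplier of 6 (which enters the variance as 36).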

Calculating the Correlations

Implementing this approach led me next to wonder about the correlations between the goals and behinds scored in a match. In particular:

  • If a team scores more goals than we expected pre-game, does it tend to score more behinds, fewer behinds, or about the same number of behinds as we expected pre-game?

  • If one team scores more goals than we expected pre-game, does its opponent tend to score more goals, fewer goals, or about the same number of goals as we expected pre-game?

  • If one team scores more behinds than we expected pre-game, does its opponent tend to score more behinds, fewer behinds, or about the same number of behinds as we expected pre-game?

To investigate this issue we’ll use the pre-game closing handicap and totals markets, as recorded in the spreadsheet on the Aus Sports Betting site.

Combining these two markets (and making some adjustments when the final prices are not evens), we can calculate implied expected home and away team scores as follows:

  • Expected Home Score = (Expected Total - Handicap)/2

  • Expected Away Score = Expected Total - Expected Home Score
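The two formulas above can be sketched as a small helper. The function name and argument names are hypothetical, and the sign convention is an assumption: the handicap is taken to be quoted from the home team's perspective, so that a home favourite has a negative handicap.

```r
# Hypothetical helper: implied expected scores from the two markets.
# Assumption: home_handicap is negative when the home team is favoured.
implied_scores <- function(expected_total, home_handicap) {
  home <- (expected_total - home_handicap) / 2
  away <- expected_total - home
  c(home = home, away = away)
}

# A total of 160 with the home team giving 20 points start
example_scores <- implied_scores(160, -20)
example_scores
```

With a total of 160 and a handicap of -20, this returns 90 for the home team and 70 for the away team, the same pair of expected scores used in the earlier examples.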

If we, further, assume a fixed 53% conversion rate for home teams, and 51% for away teams, these expected scores can be converted to expected goals and expected behinds.
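That conversion follows from treating a score as 6 points per goal plus 1 per behind. If a team has N scoring shots and converts a fraction C of them, its score is N(5C + 1), which can be inverted to recover goals and behinds. A minimal sketch, with a hypothetical function name:

```r
# Hypothetical helper: split an expected score into goals and behinds
# for a given conversion rate (goals / scoring shots).
expected_goals_behinds <- function(score, conv) {
  shots <- score / (5 * conv + 1)   # score = shots * (6*conv + (1 - conv))
  c(goals = conv * shots, behinds = (1 - conv) * shots)
}

home <- expected_goals_behinds(90, 0.53)
away <- expected_goals_behinds(70, 0.51)

# Check: 6 * goals + behinds should recover the expected score
home_check <- unname(6 * home["goals"] + home["behinds"])
away_check <- unname(6 * away["goals"] + away["behinds"])

round(rbind(home, away), 2)
```

For the 90 v 70 example this gives roughly 13.1 goals and 11.6 behinds for the home team, and 10.1 goals and 9.7 behinds for the away team.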

We can then compare the actual and expected goals and behinds, and calculate the correlations between the various differences.

Doing that, we get, from a sample of 1,687 games across 2014 to 2022:

  • Cor(Excess Home Goals, Excess Away Goals) = -0.21

  • Cor(Excess Home Behinds, Excess Away Behinds) = -0.21

  • Cor(Excess Home Goals, Excess Away Behinds) = -0.16

  • Cor(Excess Home Behinds, Excess Away Goals) = -0.12

  • Cor(Excess Home Goals, Excess Home Behinds) = -0.02

  • Cor(Excess Away Goals, Excess Away Behinds) = -0.02
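For anyone wanting to repeat this kind of calculation, the mechanics can be sketched as below. The data here are entirely synthetic (independent Poisson draws around made-up expectations) and the column names are hypothetical, so the resulting correlations will all be near zero and will not match the figures above. The point is only to show the shape of the computation.

```r
set.seed(42)
n <- 500  # synthetic games, NOT the real 1,687-game sample

# Synthetic pre-game expectations (hypothetical column names)
games <- data.frame(
  exp_home_goals   = runif(n, 9, 15),
  exp_home_behinds = runif(n, 8, 13),
  exp_away_goals   = runif(n, 8, 14),
  exp_away_behinds = runif(n, 8, 13)
)

# Synthetic actuals: independent Poisson draws around the expectations
games$home_goals   <- rpois(n, games$exp_home_goals)
games$home_behinds <- rpois(n, games$exp_home_behinds)
games$away_goals   <- rpois(n, games$exp_away_goals)
games$away_behinds <- rpois(n, games$exp_away_behinds)

# Excess = actual minus expected
excess <- data.frame(
  home_goals   = games$home_goals   - games$exp_home_goals,
  home_behinds = games$home_behinds - games$exp_home_behinds,
  away_goals   = games$away_goals   - games$exp_away_goals,
  away_behinds = games$away_behinds - games$exp_away_behinds
)

# All six pairwise correlations sit in the off-diagonal of this matrix
cor_matrix <- round(cor(excess), 2)
cor_matrix
```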

These results can be interpreted as follows:

  • When the home team registers more (fewer) goals than expected, the away team tends to register fewer (more) goals and fewer (more) behinds than expected

  • When the home team registers more (fewer) behinds than expected, the away team tends to register fewer (more) behinds and fewer (more) goals than expected

  • When the home team or away team registers more or fewer goals than expected, it tells us almost nothing about the number of behinds they register, relative to expectations

Put another way, one team’s scoring tends to be negatively correlated with the other team’s (confirming other analyses) but, interestingly, a team’s goal production relative to expectations is essentially uncorrelated with its behind production relative to expectations. So, it’s neither the case that teams having a “good” day in terms of registering goals are also likely to increase their behinds production, nor the case that teams tend to have a fixed reservoir of scoring shots and merely change their on-the-day conversion rate by swapping goals for behinds, and vice versa.
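To make the “fixed reservoir” idea concrete, here is a small simulation contrasting two stylised scenarios. Both are my own illustrative assumptions rather than anything estimated from the data: in the first, every team has exactly 25 scoring shots; in the second, the number of scoring shots is itself Poisson with mean 25. In both, each shot is converted independently with probability 0.53.

```r
set.seed(7)
n    <- 100000
conv <- 0.53

# Scenario 1: a fixed reservoir of 25 scoring shots (hypothetical figure).
# Every extra goal is one fewer behind, so the correlation is -1.
goals_fixed   <- rbinom(n, size = 25, prob = conv)
behinds_fixed <- 25 - goals_fixed
cor_fixed     <- cor(goals_fixed, behinds_fixed)

# Scenario 2: Poisson number of shots, each converted independently.
# Splitting a Poisson count this way yields independent Poisson counts
# of goals and behinds, so the correlation should be close to 0.
shots        <- rpois(n, lambda = 25)
goals_pois   <- rbinom(n, size = shots, prob = conv)
behinds_pois <- shots - goals_pois
cor_pois     <- cor(goals_pois, behinds_pois)

c(fixed_reservoir = cor_fixed, poisson_shots = cor_pois)
```

The first scenario produces a correlation of -1 and the second a correlation near zero. This is only a sketch, but it suggests that near-zero correlations between a team's excess goals and excess behinds are at least consistent with treating goals and behinds as separate Poisson variables.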

Now these correlation coefficients have been calculated across the entirety of the sample. Maybe the relationship differs depending on the size of a team’s expected score. So, combining the results for home teams and away teams and forming groups containing games where the expected score is roughly similar (generally within a goal, except at the ends), we get the following correlations for excess goals and excess behinds:

  • Expected Goals 5.7 - 8.0 / Expected Behinds 5.4 - 7.7 (n = 132) : -0.03

  • Expected Goals 8.1 - 9.0 / Expected Behinds 7.1 - 8.7 (n = 184) : -0.01

  • Expected Goals 9.1 - 10.0 / Expected Behinds 8.0 - 9.7 (n = 313) : +0.02

  • Expected Goals 10.1 - 11.0 / Expected Behinds 8.9 - 10.6 (n = 521) : -0.04

  • Expected Goals 11.1 - 12.0 / Expected Behinds 9.8 - 11.6 (n = 537) : +0.01

  • Expected Goals 12.1 - 13.0 / Expected Behinds 10.7 - 12.5 (n = 554) : -0.004

  • Expected Goals 13.1 - 14.0 / Expected Behinds 11.6 - 13.5 (n = 490) : -0.03

  • Expected Goals 14.1 - 15.0 / Expected Behinds 12.5 - 14.4 (n = 304) : -0.02

  • Expected Goals 15.1 - 16.0 / Expected Behinds 13.4 - 15.4 (n = 185) : -0.05

  • Expected Goals 16.1 - 21.7 / Expected Behinds 14.2 - 19.2 (n = 154) : -0.02

There does not, then, appear to be any strong relationship between excess goals and excess behinds for any of those expected score ranges.
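The grouping used above can be sketched with cut() and split() in base R. Again the data are synthetic and the variable names are hypothetical, so the output illustrates the method only and does not reproduce the correlations listed above. The band edges are an assumption chosen to mirror the ten groups: up to 8 expected goals, one-goal bands from 8 to 16, and everything above 16.

```r
set.seed(3)
n <- 2000  # synthetic team-games, not the real sample

# Synthetic expectations, with behinds implied by a 53% conversion rate
exp_goals   <- runif(n, 6, 20)
exp_behinds <- exp_goals * 0.47 / 0.53

# Synthetic excesses: independent Poisson actuals minus expectations
excess_goals   <- rpois(n, exp_goals)   - exp_goals
excess_behinds <- rpois(n, exp_behinds) - exp_behinds

# Ten bands of expected goals: (-Inf, 8], (8, 9], ..., (15, 16], (16, Inf)
breaks <- c(-Inf, 8:16, Inf)
band   <- cut(exp_goals, breaks = breaks)

# Sample size and correlation within each band
by_band <- sapply(split(seq_len(n), band), function(idx) {
  c(n = length(idx), cor = cor(excess_goals[idx], excess_behinds[idx]))
})

round(t(by_band), 3)
```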

I think that’s an unintuitive result, and one that deserves some deeper thought. Would love to hear yours.