A First Attempt at Combining AFL Team and Player Data in a Predictive Model

With the 2018 AFL season completed, we’ve now a little time to think about how team and player data might be combined into a predictive model, and to estimate how much better such a hybrid model might perform than one based on team data alone.

THE DATA

To build this model we’ll be combining the predictions made by the MoSHBODS Team Rating System with the SuperCoach score data available via the fitzRoy R package and its scraping of the Footywire website.

The details of the equation that converts player statistics into a SuperCoach score are proprietary, but we can get some idea of the player actions it rewards and punishes, and in what magnitudes, by creating a linear regression using as inputs player statistics available in the fitzRoy package for each game, and as our target variable that player’s SuperCoach score in the same game.

That model is summarised at right and explains almost 92% of the variability of individual SuperCoach scores across the four seasons from 2015 to 2018, which is the period for which all of the player statistics included in the model are available.
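For anyone wanting to reproduce something similar, a minimal sketch in R appears below. The fetching function and the statistic column names are assumptions on my part, since they differ across fitzRoy versions (recent versions expose fetch_player_stats(), older ones used get_footywire_stats()), so check what your installed version actually returns before running it.

```r
# Sketch only: approximating the SuperCoach scoring equation with a linear
# regression on per-game player statistics for 2015 to 2018
library(fitzRoy)
library(dplyr)
library(purrr)

# One row per player per game, sourced from Footywire
player_games <- map_dfr(2015:2018,
                        ~ fetch_player_stats(season = .x, source = "footywire"))

# SC is the SuperCoach score; the predictors are a subset of the per-game
# counting statistics (kicks, handballs, marks, tackles, hit-outs, frees
# for/against, goals, behinds, contested/uncontested possessions, clearances,
# inside 50s), with column names as assumed here
sc_model <- lm(SC ~ K + HB + M + T + HO + FF + FA + G + B + CP + UP + CL + I50,
               data = player_games)

summary(sc_model)$r.squared   # the post reports roughly 0.92 for this period
```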

This clearly isn’t the exact equation that’s used to generate SuperCoach scores for a few reasons:

  1. Player statistics are highly correlated, so any model will struggle to correctly assign predictive power to each individual statistic. If, for example, Uncontested Possessions and Effective Disposals are highly correlated but Uncontested Possessions are actually weighted far more heavily in the true SuperCoach equation, it could be that Effective Disposals are shown as carrying more weight by our linear regression than they do in the actual SuperCoach equation.

  2. The creators of SuperCoach scores have revealed that not all statistics of the same type are weighted equally, as explained in the excerpt above, headed “How does scoring work?”, which comes from a 2018 Herald Sun article.

  3. They’ve also explained that scoring includes statistics we don’t have, such as gathers, shepherds and spoils.

  4. Our regression model doesn’t explain 100% of the variability, which a perfect model would.

Looking at player SuperCoach scores across the period 2010 to 2018 reveals that the average score per game per player is almost exactly 75 points and the median 73 points. The profile by team appears below.

As you can see, the profiles are very similar for each team, and the averages vary by only a little, from a minimum of 69.8 points per player for Gold Coast to a maximum of 78.3 for Hawthorn.
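Once the player-game data are in hand, that per-team summary takes only a few lines of R (again, the Team and SC column names are assumed, and the frame here would need to cover 2010 to 2018 rather than just the four seasons fetched above).

```r
# Average and median SuperCoach score per player per game, by team
player_games %>%
  group_by(Team) %>%
  summarise(mean_sc   = mean(SC, na.rm = TRUE),
            median_sc = median(SC, na.rm = TRUE)) %>%
  arrange(desc(mean_sc))
```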

METHODOLOGY

To perform the analysis, we’re going to take the available MoSHBODS and SuperCoach data for the nine seasons from 2010 to 2018, split it 50:50 into a training and test set, and build linear regressions on the training set to explain game margins - the difference between the home and the away team final scores. We’ll measure the final performance of our models on the test set.
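As a sketch, a simple random 50:50 split might look like the following, where games is a placeholder data frame holding one row per game with its actual margin and the MoSHBODS expected margin.

```r
# Random 50:50 split of the game-level data into training and test sets
set.seed(2018)                         # arbitrary seed, for reproducibility
train_idx <- sample(seq_len(nrow(games)), size = floor(nrow(games) / 2))
train <- games[train_idx, ]
test  <- games[-train_idx, ]
```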

Now we need to incorporate the player level SuperCoach data into our regression models in such a way that it summarises the quality of the named players for a given game. There are any number of ways we could proceed to do this, but today I’m going to estimate the quality of a player by his average SuperCoach score in games played during the past 12 months.

To deal with the variability inherent in estimates for players who have played relatively few games in that period, I’m going to create two versions of “regularised” estimates, where the regularisation serves to drag estimates closer to the average score of 75.

  1. We set a minimum number of games and, for players who’ve played fewer than the minimum, supplement their actual scores with notional scores, each at the average of 75 points, for their “missing games”. So, for example, if we set our minimum at five games and we have a player who played just three games in the last 12 months with actual scores of 72, 90 and 100, we would supplement his three actual games with two notional games in which he scored 75 to give him a regularised average of (72+90+100+75+75)/5 or 82.4 points per game.
    The regularisation will serve to drag the averages of players with fewer than the minimum number of games closer to 75. The fewer games they have actually played, the closer to 75 will their average be drawn.

  2. We set a minimum number of games and, for players who’ve played fewer than the minimum, take a simple average of their actual average score and 75. So, for example, if we set our minimum at five games and we have a player who played just three games with actual scores of 72, 90 and 100, we would take his actual average of 87.3, add 75, and divide the sum by 2 to give a regularised average of 81.2 points.
    This regularisation will also serve to drag the averages of players with fewer than the minimum number of games closer to 75, but the amount of regularisation is the same however many games short of the minimum the player falls.

Having calculated a regularised average for every player we will then calculate the mean of these averages for each team going into a contest. That average will be a proxy for the team’s strength.
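A sketch of both regularisation schemes, and of the team-level averaging, appears below. The function names regularised_avg and team_mean_sc are illustrative only, not part of any package.

```r
# Both regularisation schemes in one function
regularised_avg <- function(scores, min_games = 5, prior = 75,
                            method = c("pad", "blend")) {
  method <- match.arg(method)
  n <- length(scores)
  if (n >= min_games) return(mean(scores))
  if (method == "pad") {
    # Method 1: add notional games at the prior until we reach min_games
    mean(c(scores, rep(prior, min_games - n)))
  } else {
    # Method 2: simple average of the player's actual average and the prior
    (mean(scores) + prior) / 2
  }
}

# The worked example from above: scores of 72, 90 and 100, five-game minimum
regularised_avg(c(72, 90, 100), method = "pad")    # 82.4
regularised_avg(c(72, 90, 100), method = "blend")  # 81.2

# Team strength proxy: the mean of the regularised averages of the named
# players, each computed over that player's games in the preceding 12 months
team_mean_sc <- function(player_scores_last_12m, ...)
  mean(vapply(player_scores_last_12m, regularised_avg, numeric(1), ...))
```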

The regression models we’ll fit are:

  1. Game Margin = Constant + k x MoSHBODS Expected Score

  2. Game Margin = Constant + k x MoSHBODS Expected Score + l x Home Team Mean Regularised SuperCoach Score + m x Away Team Mean Regularised SuperCoach Score

We’ll fit a variety of models of type 2, varying the minimum number of games used in the regularisation process.
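In R this amounts to a couple of lm() calls plus a small helper for the mean absolute error metric reported below; the column names continue the placeholders from the earlier sketches.

```r
# Model 1: team data only; Model 2: team data plus the two teams' mean
# regularised SuperCoach scores
model_team_only <- lm(margin ~ moshbods_exp, data = train)
model_hybrid    <- lm(margin ~ moshbods_exp + home_mean_sc + away_mean_sc,
                      data = train)

# Mean absolute error: average absolute difference between fitted and actual margins
mae <- function(model, data) mean(abs(data$margin - predict(model, data)))

mae(model_team_only, train); mae(model_hybrid, train)   # in-sample fit
mae(model_team_only, test);  mae(model_hybrid, test)    # out-of-sample fit
```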

The results are shown below for the mean absolute error (MAE) metric, which measures by how many points, on average, the models’ fitted values for the game margin differ from the actual game margin.

So, for example, the mean absolute error (MAE) using MoSHBODS expected margins alone for the 2011 season is 31.33 points per game. By comparison, using the first regularisation method with a minimum number of games of 12 yields a model with a MAE of 29.98 for that same season.

The best model for that season is the one that uses the first regularisation method with a five-game minimum. Its MAE is 29.97 points per game.

Averaged across the eight available seasons (we lose 2010 because our SuperCoach averages look at games played in the previous 12 months and so don’t exist for the 2010 season itself), the best fitting model is the one that uses the second regularisation method and a five game minimum. It has an overall MAE of 28.71 points per game, which is at least 0.03 points per game better than any other model. More importantly, it’s 0.79 points per game better than the model that uses MoSHBODS only.

That superiority in fitting the training data set is to be expected however, because the models using regularised average SuperCoach scores have an extra two parameters or ‘degrees of freedom’. That guarantees these models will provide a superior in-sample fit to (or, more technically, a fit that is no worse than) the model that uses only MoSHBODS expectations.

The real test then is the relative fit of the models to the test data, which is summarised at left for the MoSHBODS-only model and for the model using the regularised pre-game average SuperCoach scores of the two teams.

We find that the model incorporating SuperCoach scores yields lower MAEs in five of the eight seasons and, overall across all eight seasons, has a 0.37 points per game lower MAE. That’s a non-trivial difference, especially when you consider that it’s a result estimated across about 800 games.

Clearly there is some benefit in using player SuperCoach scores along with team ratings data in creating predictive models for game margins.

PLAYER VALUES

We can fit one final linear regression in which we model game margin solely as a function of the regularised SuperCoach scores, which will allow us to form estimates of player value in terms of margin or score expectations.

The model we get, fitted to the training data and using the same version of the regularised SuperCoach score that we used for the previous model, is:

Fitted Game Margin = -7.10 + 5.61 x Home Team Mean SuperCoach Score - 5.39 x Away Team Mean SuperCoach Score.

This model has an overall MAE on the test data of 30.1 points per game, which is about 1.5 points per game worse than the MoSHBODS-only model. So, it’s an inferior fit, but not a terrible one.

Using the model, and recognising that there are 22 players in a squad for a game, we can say that each additional SuperCoach point for a single player is worth 5.61/22, or about a quarter of a point, to a home team, and a little less to an away team.

Assuming a replacement player with the average SuperCoach score of 75, we then get the following table of estimated player values, relative to a player of that ability, for some 2018 players at particular points in the season.

Tom Lynch, for example, was worth about an extra half a goal relative to an average player as he went into his Round 4 game.

Patrick Dangerfield, by contrast, was worth an extra 2.5 goals relative to an average player in the early part of the season. His average of 135.3 going into the Round 9 away game against Essendon was the highest of any player in 2018.

The lowest average SuperCoach score recorded by any player in 2018 who had played at least 12 games in the previous 12 months was 36.9, which was recorded by Nathan Brown going into the Round 8 game against Fremantle.
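As a rough check on those figures, the conversion from a regularised average to a margin contribution is simple arithmetic using the coefficients of the fitted equation above (here treating each player as being on the home side for simplicity).

```r
# Margin points per additional SuperCoach point, spread across a 22-player squad
per_point_home <- 5.61 / 22   # about 0.26
per_point_away <- 5.39 / 22   # about 0.25

# Value relative to a notional 75-point (average) player
player_value <- function(avg_sc, per_point = per_point_home) (avg_sc - 75) * per_point

player_value(135.3)   # Dangerfield pre-Round 9: about +15.4 points, roughly 2.5 goals
player_value(36.9)    # the lowest qualifying 2018 average: about -9.7 points
```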

NEXT STEPS

This post has been just a first attempt at incorporating individual player-based data into predictive models here on MoS. Other things we could try in future include:

  • Exponential smoothing of SuperCoach scores so that more-recent performances carry a heavier weight than performances from further back in time. This would allow us to include in each player’s average games from more than a year ago, albeit probably with quite low weights, provided we find these add predictive power

  • Inclusion of player experience - individual and shared (for more on this topic see this earlier post)

  • Creation of a bespoke player score equation, similar to the SuperCoach equation, but using different weights on individual player statistics

  • Differential scores by player depending on whether they were playing at home or away (eg we might create separate regularised SuperCoach scores for home versus away games)

There is still much to do. Let me know if you have any ideas for other things to explore.