Matter of Stats


Classifying Recent AFL Players by Position

The dynamic and free-flowing nature of AFL, along with the wide-ranging abilities of some of its players, can make it difficult to categorise any single player as strictly and always playing in a defined position.

Still, many players spend much of their time filling one of a smallish set of agreed roles, and we might expect that fact to reveal itself in the game statistics they accrue, and where they accrue them.

In today’s blog I’m going to investigate how well publicly available game statistics do define specific and distinct team roles, and how well the roles they define map to “official” player positions.

To do that I’ll be using, for player game statistics, the data as recorded on the Footywire site and made available through the fitzRoy R package, and, for player positions, those recorded on the AFL Player Ratings portion of the AFL website.

Some more details about the data follow.

DATA

We can extract information as far back as 2010 from the fitzRoy package, but for today we’ll confine ourselves to the period since 2015. That’s partly because we can only extract player positions from the AFL website for players active in 2018. It’s also because the game changes over time, so the same statistic drawn from a much earlier period might imply something different about how likely it is that the player who recorded it was playing in a particular position.

For the 2015 to 2017 seasons we can use data only for players who were still active in 2018, since only they have positional data. As a result, we’ll find ourselves focussing heavily on players from 2018.

Additionally, we’ll exclude any player who played fewer than 10 games across that four-season period, which leaves us with 603 players to analyse.

For those 603 players we will use (or calculate) the following game statistics:

  • Contested Possessions (excluding Clearances)

  • Scoring Shots (Goals plus Behinds)

  • Goal Assists

  • Marks Inside 50

  • Other Marks

  • Tackles Inside 50

  • Other Tackles

  • One Percenters

  • Hit Outs

  • Inside 50s

  • Rebound 50s

  • Clearances

  • Turnovers

  • Intercepts

  • Other Disposals (Total Disposals less Goals, Behinds, Goal Assists, Contested Possessions and Inside 50s)
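Several of these metrics are derived rather than read directly from the source. Below is a minimal sketch of the arithmetic. It assumes each derived count is a simple subtraction from the raw totals, for instance that Other Marks means Marks less Marks Inside 50. The field names are hypothetical and are not fitzRoy’s column names.

```python
from dataclasses import dataclass


@dataclass
class RawGameStats:
    """Hypothetical container for one player's raw statistics in one game."""
    disposals: int
    goals: int
    behinds: int
    goal_assists: int
    contested_possessions: int
    clearances: int
    inside_50s: int
    marks: int
    marks_inside_50: int
    tackles: int
    tackles_inside_50: int


def derived_metrics(s: RawGameStats) -> dict:
    """Compute the derived counts in the list above, assuming simple subtraction."""
    return {
        "contested_possessions_ex_clearances": s.contested_possessions - s.clearances,
        "scoring_shots": s.goals + s.behinds,
        "other_marks": s.marks - s.marks_inside_50,
        "other_tackles": s.tackles - s.tackles_inside_50,
        # Total Disposals less Goals, Behinds, Goal Assists,
        # Contested Possessions and Inside 50s, as defined in the text.
        "other_disposals": (s.disposals - s.goals - s.behinds - s.goal_assists
                            - s.contested_possessions - s.inside_50s),
    }
```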

The selection of statistics here is somewhat arbitrary, although the aim is to ensure that each selected metric represents a non-overlapping count of some action. It mirrors the selection used in a similar analysis on The Arc website in 2016, except that we do not have player height available to us.

We gather or calculate these metrics for every player in every game across the four seasons and then proceed as follows:

  • Gross them up to reflect full-game statistics. So, for example, if a player was on the ground for only 50% of a game, we double all of his statistics.

  • Standardise each metric by subtracting that season’s mean per-player, per-game value and dividing by that season’s standard deviation for the metric. This adjusts for changes in overall football styles across the period, as discussed earlier.

  • For each player, average his standardised per-game metrics across all the games he played in the 2015 to 2018 period.

It is only once we have calculated the standardised metrics that we exclude players who have played in fewer than 10 games.
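Below is a sketch of the three steps above, followed by the 10-game filter. It is a Python illustration, not the R code used for the analysis. The row layout and the `tog` field (fraction of the game spent on the ground) are hypothetical. I’ve also assumed the population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardised_player_profiles(rows, metrics, min_games=10):
    """
    rows: one dict per player-game with keys 'player', 'season', 'tog'
          (hypothetical time-on-ground fraction, 0 < tog <= 1) and one key per metric.
    Returns {player: {metric: average standardised value}} for players
    with at least min_games games.
    """
    # 1. Gross up to full-game equivalents: 50% time on ground doubles every stat.
    grossed = []
    for r in rows:
        g = dict(r)
        for m in metrics:
            g[m] = r[m] / r["tog"]
        grossed.append(g)

    # 2. Per-season mean and (population) standard deviation of each metric,
    #    computed over every player-game in that season.
    by_season = defaultdict(list)
    for g in grossed:
        by_season[g["season"]].append(g)
    season_stats = {
        season: {m: (mean(x[m] for x in games), pstdev([x[m] for x in games]))
                 for m in metrics}
        for season, games in by_season.items()
    }

    # 3. Standardise each game's metrics, then collect them by player.
    per_player = defaultdict(list)
    for g in grossed:
        z = {}
        for m in metrics:
            mu, sd = season_stats[g["season"]][m]
            z[m] = (g[m] - mu) / sd if sd > 0 else 0.0
        per_player[g["player"]].append(z)

    # 4. Average per player, dropping those with too few games only at this point.
    return {
        p: {m: mean(z[m] for z in games) for m in metrics}
        for p, games in per_player.items()
        if len(games) >= min_games
    }
```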

Attaching a position to each player is a straightforward, if boring, procedure. The major difficulty stems from the use of different name variants in the Footywire and AFL data: Nat in one, Nathan in the other; Ollie in one, Oliver in the other; even Patrick in one and Paddy in the other. If the AFL did nothing else, it would be great if they could allocate every player a unique ID.
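One way to handle the name-variant problem is to map informal first names to a canonical form before joining the two sources. The sketch below is hypothetical. Its alias table holds only the three examples just mentioned, and a real version would need to be extended and checked by hand.

```python
import unicodedata

# Hypothetical alias table containing only the examples from the text.
# A real table would need manual checking: "nat" could, in principle,
# stand for more than one full name.
FIRST_NAME_ALIASES = {
    "nat": "nathan",
    "ollie": "oliver",
    "paddy": "patrick",
}


def normalise_name(full_name: str) -> str:
    """Reduce a name to a lower-case key with a canonical first name."""
    # Strip accents, then normalise case and surrounding whitespace.
    text = unicodedata.normalize("NFKD", full_name)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower().strip()
    first, _, rest = text.partition(" ")
    first = FIRST_NAME_ALIASES.get(first, first)
    return f"{first} {rest}".strip()


def match_positions(footywire_names, afl_positions):
    """Attach an AFL position to each Footywire name; unmatched names get None."""
    lookup = {normalise_name(n): pos for n, pos in afl_positions.items()}
    return {n: lookup.get(normalise_name(n)) for n in footywire_names}
```

Names that still come back as `None` are the ones that need a manual look.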

THE PROCESS

To array the players in space, based on their standardised metrics, we’ll create what’s called a self-organising map (which, I’ll confess, I’d been leery of in the past until I tried it for this piece). More specifically, we’ll use the kohonen package in R (using a 15 x 15 hexagonal grid and a gaussian neighbourhood, for the technically curious).

This technique maps players to a two-dimensional space in which proximity connotes similarity and distance connotes difference.
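For the technically curious, here is a minimal pure-Python sketch of an online self-organising map on a hexagonal grid with a Gaussian neighbourhood. It is not the kohonen package’s implementation. Several details are assumptions made for illustration: the linearly decaying learning rate, the neighbourhood radius that shrinks linearly to one cell, and initialisation from randomly chosen players. It would also be far slower than kohonen on 603 players.

```python
import math
import random


def hex_grid(xdim, ydim):
    """Coordinates of an xdim x ydim hexagonal grid; odd rows shift right by half a cell."""
    return [(x + 0.5 * (y % 2), y * math.sqrt(3) / 2)
            for y in range(ydim) for x in range(xdim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(codes, v):
    """Index of the cell whose codebook vector is closest to v."""
    return min(range(len(codes)), key=lambda i: sq_dist(codes[i], v))


def train_som(data, xdim=15, ydim=15, epochs=100, alpha=(0.05, 0.01), seed=1):
    """Minimal online SOM with a Gaussian neighbourhood on a hexagonal grid.

    data: list of equal-length lists, one standardised profile per player.
    Returns (grid coordinates, codebook vectors), one entry per cell.
    """
    rng = random.Random(seed)
    grid = hex_grid(xdim, ydim)
    # Initialise every cell's codebook vector as a copy of a randomly chosen player.
    codes = [list(rng.choice(data)) for _ in grid]
    # Neighbourhood radius shrinks linearly from half the map width to one cell.
    radius0 = max(xdim, ydim) / 2
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        lr = alpha[0] + (alpha[1] - alpha[0]) * frac  # linearly decaying learning rate
        radius = radius0 + (1.0 - radius0) * frac
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            v = data[i]
            bmu = best_matching_unit(codes, v)
            for k, g in enumerate(grid):
                # Gaussian weight based on distance *on the grid* to the winning cell.
                h = math.exp(-sq_dist(g, grid[bmu]) / (2 * radius ** 2))
                codes[k] = [c + lr * h * (x - c) for c, x in zip(codes[k], v)]
    return grid, codes
```

After training, each player’s cell is simply `best_matching_unit(codes, profile)`. Players sharing a cell have profiles closest to the same codebook vector.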

The output for our 603 players appears below.

Note that the numbers in the ellipses have no intrinsic meaning, but will facilitate discussion later.

Each point in the map is a single player, and the colour of that point is determined by that player’s position as recorded in the AFL Player Ratings. Recall that the location of players (points) on the map is determined purely by their standardised game metrics. So the extent to which points of the same colour group together reflects how well that position can be detected and delineated using the game metrics we’ve used.

As it turns out, positions are for the most part fairly well delineated. This is especially true of the Ruck position, whose yellow members are mostly huddled in the bottom right of the map.

A few of those Ruckmen have ventured out, however, and mingled with some Key Forwards in cells 75, 90 and 105, which seems worth investigating. The standardised metrics for these Ruckmen and their cell-mates, compared to those of the average Key Forwards and Ruckmen, appear below, and reveal why it seems reasonable to group them on the basis of the available data.

Apeness, Crossley and Boyd generate fewer Hit Outs, more Marks Inside 50 and more Goal Assists than an average Ruckman, for example. They are by no means archetypal Key Forwards, but they are on the outer fringes of that classification, as their position on the map suggests.

POSITION PROFILES

We’ve just seen the average statistics for Rucks and Key Forwards. Let’s take a look at the profile for all of the positions across all 15 metrics.

These violin plots give you an idea of both the most common values recorded for a given metric and position and their spread. We see, for example, that all but Key Forwards and Rucks have low standardised Hit Out averages. We also see considerable variability in the mostly non-zero Hit Out averages across Key Forwards and Rucks.

Broadly speaking, the various positions can be characterised based on the metrics that players in those positions tend to accumulate in larger or smaller numbers than do players in other positions. We do this below.
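The position profiles are, at heart, per-position averages of the players’ standardised metrics. Below is a sketch of that calculation, plus a simple rule for flagging the metrics a position accumulates in notably larger or smaller numbers. The ±0.5 standard deviation threshold is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import mean


def position_profiles(profiles, positions):
    """
    profiles: {player: {metric: average standardised value}}
    positions: {player: position label}
    Returns {position: {metric: mean standardised value across its players}}.
    """
    grouped = defaultdict(list)
    for player, prof in profiles.items():
        if player in positions:
            grouped[positions[player]].append(prof)
    return {
        pos: {m: mean(p[m] for p in profs) for m in profs[0]}
        for pos, profs in grouped.items()
    }


def characterise(profile, threshold=0.5):
    """Return (metrics clearly above average, metrics clearly below average).

    The threshold, in standard deviations, is an illustrative choice.
    """
    high = sorted(m for m, v in profile.items() if v >= threshold)
    low = sorted(m for m, v in profile.items() if v <= -threshold)
    return high, low
```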

DISTANCES

The notion of distance is critical to most clustering algorithms, self-organising maps included. What I particularly like about the map is how it uses the grid and the cells to convey both distance (players in far-apart cells are very different) and similarity (players in the same cell are quite similar).

The fact that we have

  • broad areas that contain mostly players of the same position

  • players of the same position still in sometimes distant cells, and

  • players of quite different positions sometimes in the same cells

nicely reflects, I think, our intuition that, whilst there are archetypal (say) Key Forwards, there are also players who clearly fill different roles at different times.

When considering the distances implied by proximity or otherwise in the self-organising map, it’s important to recognise that neighbouring cells are not all equidistant. The map stretches and squashes multidimensional space to different degrees in different parts of the map, in an effort to ensure that the entirety of the space is mapped to the 15 x 15 grid.

We can measure and display this squashing and stretching by creating a chart that shows the average distance between each cell and its neighbouring cells.

The main things to look for in this chart are the areas of particularly large average distances, which are darker, and we see these especially around cells 12, 13 and 28, which mark the boundaries of heartland Ruck territory. What this tells us is that these cells are among the best delineated on the map. We can also see relatively high levels of delineation at the map’s corners where we find archetypical Key Forwards, Midfielders, and Small/Medium Defenders.
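A chart of average neighbour distances like this one is often called a U-matrix. Below is a minimal sketch of the calculation. It assumes a trained codebook (one vector per cell) laid out on an offset hexagonal grid. The layout and cell indexing are assumptions, so the indices it produces need not match the cell numbering on the map.

```python
import math


def hex_grid(xdim, ydim):
    """Coordinates of an xdim x ydim hexagonal grid; odd rows shift right by half a cell."""
    return [(x + 0.5 * (y % 2), y * math.sqrt(3) / 2)
            for y in range(ydim) for x in range(xdim)]


def neighbour_distances(codes, xdim, ydim):
    """For each cell, the mean feature-space distance to its immediate hex neighbours.

    codes: one codebook vector per cell, in the same order as hex_grid().
    Assumes the map has more than one cell, so every cell has a neighbour.
    """
    grid = hex_grid(xdim, ydim)
    out = []
    for i, gi in enumerate(grid):
        # Immediate neighbours sit exactly one unit away on the grid.
        nbrs = [j for j, gj in enumerate(grid) if j != i and math.dist(gi, gj) < 1.01]
        out.append(sum(math.dist(codes[i], codes[j]) for j in nbrs) / len(nbrs))
    return out
```

Large values mark cells whose codebook vectors sit far from their neighbours’, which is what the darker cells in the chart indicate.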

INTERESTING CELLS

Earlier, we looked at cells 75, 90 and 105, which included an interesting mix of Ruckmen and Key Forwards. There are a few other non-homogeneous cells in the map that we’ll finish by looking at.

Cell 100

This cell includes three players, one classified as a Midfielder (Connor Menadue), one a Small/Medium Defender (Ed Richards), and the other a Small/Medium Forward (Sean Lemmens).

Based on their averaged standardised metrics, you can see why the map might have booked them into the same cell.

Across all the metrics they’re separated by no more than 0.44 standard deviations (which is the gap for Intercepts). The average difference is only 0.22 standard deviations.
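The text doesn’t define either figure, so the sketch below assumes the following. A metric’s “gap” is its range across the cell’s members (largest minus smallest value), and the average difference is the mean of those per-metric ranges.

```python
def cell_gaps(members):
    """
    members: {player: {metric: average standardised value}} for players in one cell.
    Returns (per-metric gaps, largest gap, mean gap), where a gap is assumed
    to be the range (max minus min) of a metric across the members.
    """
    profiles = list(members.values())
    metrics = profiles[0].keys()
    gaps = {m: max(p[m] for p in profiles) - min(p[m] for p in profiles)
            for m in metrics}
    return gaps, max(gaps.values()), sum(gaps.values()) / len(gaps)
```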

Cell 165

This cell is replete with Key Forwards but also includes Tim Membrey, who’s been classified as a Small/Medium Forward.

Once again, based on averaged standardised metrics, you can see why the map might have allowed him into this club.

SOME CONCLUDING THOUGHTS

I think this analysis suggests that self-organising maps have something to offer in player classification and clustering, alongside other clustering techniques such as Partitioning Around Medoids, k-means, and so on. What I particularly like about them is the underlying concept of the grid. It breaks up the vast multidimensional space defined by all of the input metrics into 225 (or as many or as few as you like) distinct regions.

Another important feature of the algorithm is that it allows for prediction. So we could, for example, use it to classify players from earlier (or future) seasons for which we have no positional data from the AFL. We could also use it to classify individual games for players to get a sense of the different roles they might have played in each contest, bearing in mind that this would be an inherently noisy process.
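Prediction with a trained map amounts to finding the cell whose codebook vector is closest to a new profile. Below is a minimal sketch assuming Euclidean distance. It also uses a hypothetical `cell_positions` lookup that labels each cell, for instance by majority vote among the 2018 players mapped there. That labelling rule is my illustration, not part of the analysis above. Returning the distance as well gives a crude signal of how well a profile, such as a single noisy game, fits its assigned cell.

```python
import math


def predict_cell(codes, profile):
    """
    Assign a standardised profile (a player's average or a single game) to the
    cell with the nearest codebook vector. Returns (cell index, distance);
    a large distance flags a poor fit.
    """
    dists = [math.dist(c, profile) for c in codes]
    best = min(range(len(codes)), key=dists.__getitem__)
    return best, dists[best]


def predict_position(codes, cell_positions, profile):
    """Label a profile using a hypothetical {cell index: position} lookup.

    Cells without a label (e.g. no training players mapped there) give None.
    """
    cell, _ = predict_cell(codes, profile)
    return cell_positions.get(cell)
```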

One final thought is that it would be interesting to explore the relationship between players’ cell membership in the map and their accumulation of AFL Player Ratings points and Brownlow votes. I wonder, for example, whether players who are atypical Midfielders (in the sense of being ‘out of place’ in the self-organising map) might accrue Ratings points or Brownlow votes at a different rate. That could happen either because they are stand-outs in their role or because they do things that the standard game statistics don’t capture.

(If you’re curious about the occupants of any particular cell let me know. I have the full mapping from player to cell.)