Letting the Computer Do (Most of) the Work

Around this time of year it's traditional to work through the remaining matches for each team and attempt to codify what each needs to do in order to secure a particular finish - minor premiership, top 4, top 8 or Spoon.

This year, rather than work through all the combinations manually, I've decided to be lazy - purely for instructional purposes, I should add - and enlist the help of rule induction, a mathematical technique for deducing from a dataset statements in the form If A and B then C that describe key variables in that data.

So, for example, if you were to apply the technique to help describe the use of heating and cooling appliances by a household over the course of a few years you might collect information several times each day about who was home, what the outside temperature was, what day of the week and time of day it was, and whether or not a heating or a cooling appliance was turned on.

Using a rule induction algorithm, you'd be able to come up with statements such as this one: 

  • If Number of People Home is greater than 0 AND Outside Temperature is less than 15 degrees AND Time of Day is between 5:30pm and 11:30pm AND Day of Week is not Saturday or Sunday then Heating = ON (Probability 92%)

For this blog I provided a rule induction algorithm (the JRip Weka algorithm running in R, if you're curious) with the outputs from 10,000 of the simulations I used in my earlier blog, which included for each simulation:

  • The results of each of the remaining 16 games
  • The final ladder positions of each team if these were the actual results of each game

To simplify matters a little, and recognising that the main interest is not in exact ladder position finishes, I summarised each team's finishing position as either "1st","2nd to 4th","5th to 8th","9th to 15th", or "16th".

The goal was that the rule induction algorithm would output rules of the form:

  • If X beats Y AND X beats Z AND ... then X finishes 5th to 8th

Rule induction worked remarkably well. Here are a few real examples of the rules that the algorithm offered up for Collingwood's fate:

  • Rule 1: (Collingwood..v..Adelaide <= 0) and (Hawthorn..v..Collingwood >= 0) and (Carlton..v..Geelong <= 0) => Collingwood = 2nd to 4th (168.0/2.0)
  • Rule 2: => Collingwood =1st (9832.0/0.0)

Rule 1 can be interpreted as follows: 

  • If Collingwood loses to or draws with Adelaide (ie the margin in that game, couched in terms of Collingwood is less than or equal to zero) AND Collingwood loses to or draws with Hawthorn AND Geelong beats or draws with Carlton then Collingwood finish 2nd to 4th.

What's implicit here is that Geelong also beats West Coast but since, in the simulations, this always occurred when the other conditions in the rule were met, the algorithm didn't realise that this was an additional required condition.

As well, Collingwood can't be allowed to draw both its games otherwise Geelong can't overhaul them. Again, this situation didn't occur in the simulations I provided the algorithm, and not even the smartest algorithm can intuit instances that it's never seen.

I could probably have fixed both of these shortcomings by providing the algorithm with more than 10,000 simulations, though I'd pay a price in terms of computation time. Note though the (168.0 / 2.0) annotation at the end of this rule. That tells you that the rule could be applied to 168 of the simulations, but that it was wrong for 2 of them. Maybe the two simulations for which the rule applied but was incorrect included a Geelong loss to the Eagles or two draws for Collingwood.

Rule creation algorithms include what's called a "stopping rule" to prevent them from creating a unique rule for every simulation result, which might make the rules highly accurate but also makes them completely impractical.

Rule 2 is the "otherwise" rule and is interpreted as the predicted outcome if none of the earlier rules' full set of conditions are met. For Collingwood, "otherwise" is that they finish 1st.

The rules provided for other teams were generally quite similar, although they became more complex for teams when percentages were required to determine crucial ladder positions. Here, for example, are a few of the rules where the algorithm is attempting to model Hawthorn getting bumped into 9th by Melbourne: 

  • (Hawthorn..v..Fremantle <= -7) and (Port.Adelaide..v..Melbourne = -14) and (Melbourne..v..Kangaroos >= 20) and (Hawthorn..v..Collingwood = -39) and (Melbourne..v..Kangaroos Hawthorn = 9th to 15th (54.0/2.0)
     
  • (Hawthorn..v..Fremantle <= -4) and (Port.Adelaide..v..Melbourne = -7) and (Melbourne..v..Kangaroos = 11) and (Hawthorn..v..Collingwood = -59) and (Port.Adelaide..v..Melbourne = -32) = Hawthorn = 9th to 15th (41.0/3.0)

Granted that's a mite convoluted, but nothing that a human can't recognise fairly quickly, which nicely illustrates my experience with this type of algorithm: their outputs almost always contain some useful insights but the extraction of this insight requires human interpretation.

What follows then are the rules that man and machine have crafted for each team (note that I've chosen to ignore the possibility of draws to reduce complexity)

Collingwood 

  • Finish 2nd to 4th if Collingwood lose to Adelaide and Hawthorn AND Geelong beat Carlton and West Coast
  • Otherwise finish 1st

Geelong

  • Finish 1st if Collingwood lose to Adelaide and Hawthorn AND Geelong beat Carlton and West Coast
  • Otherwise finish 2nd to 4th

 St Kilda

  • Finish 2nd to 4th

 Western Bulldogs

  • Finish 5th to 8th if Dogs lose to Essendon and to Sydney AND Fremantle beat Hawthorn and Carlton
  • Otherwise finish 2nd to 4th

 Fremantle

  • Finish 2nd to 4th if Dogs lose to Essendon and to Sydney AND Fremantle beat Hawthorn and Carlton
  • Otherwise finish 5th to 8th 

Carlton and Sydney

  • Finish 5th to 8th

Hawthorn 

  • Finish 9th to 15th if Hawthorn lose to Fremantle and Collingwood AND Roos beat West Coast and Melbourne
  • Also Finish 9th to 15th if Hawthorn lose to Fremantle and Collingwood AND Melbourne beat Port and Roos sufficient to raise Melbourne's percentage above Hawthorn's
  • Otherwise finish 5th to 8th

Kangaroos

  • Finish 5th to 8th if Hawthorn lose to Fremantle and Collingwood AND Roos beat West Coast and Melbourne
  • Otherwise finish 9th to 15th 

Melbourne 

  • Finish 5th to 8th if Hawthorn lose to Fremantle and Collingwood AND Melbourne beat Port and Roos sufficient to raise Melbourne's percentage above Hawthorn's
  • Otherwise finish 9th to 15th 

Adelaide, Port Adelaide and Essendon

  • Finish 9th to 15th 

Brisbane Lions 

  • Finish 16th if Lions lose to Essendon and Sydney AND West Coast beat Geelong and Roos sufficient to lift West Coast's percentage above the Lions' AND Richmond beat St Kilda or Port (or both)
  • Otherwise finish 9th to 15th 

Richmond 

  • Finish 16th if West Coast beat Geelong and Roos AND Richmond lose to St Kilda and Port Otherwise finish 9th to 15th 

West Coast 

  • Finish 9th to 15th if West Coast beat Geelong and Roos AND Richmond lose to St Kilda and Port Finish 9th to 15th if West Coast beat Geelong and Roos AND Lions lose to Essendon and Sydney sufficient to lift West Coast's percentage above the Lions'
  • Otherwise finish 16th

As a final comment I'll note that the rules don't allow for the possibility of Sydney or Carlton slipping into 4th. Although this is mathematically possible, it's so unlikely that it didn't occur in the simulations provided to the algorithm. (Actually, it didn't occur in any of the 100,000 simulations from which the 10,000 were chosen either.)

A quick bit of probability shows why.

Consider what's needed for Sydney to finish fourth.
1. The Dogs lose to Essendon and Sydney
2. Sydney also beat the Lions
3. Fremantle don't win both their games

Furthermore, combined, Sydney and the Dogs' results have to close the percentage gap between the two teams, which currently stands at over 25 percentage points.

But the 15% and 60% figures just relate to the probability of the required result, not the probability that the wins and losses will be big enough to lift Sydney's percentage above the Dogs'. If Sydney were to trounce the Lions by 100 points and Essendon were to do likewise to the Dogs, then Sydney would still need to beat the Dogs by about 91 points to achieve such a lift.

So let's revise the probability of 1 down to 0.01% (which is probably generous) and the probability of 2 down to 5% (which is also generous). Then the overall probability is 0.01% x 5% x 80%, or about 1 in 250,000. Not gonna happen.

(For similar reasons there are also no rules for Fremantle dropping a game but still grabbing 4th from the Dogs on the basis of a superior percentage.)

Playing the Percentages

 

It seems very likely that this season, some ladder positions will be decided on percentage, so I thought it might be helpful to give you an heuristic for estimating the effect of a game result on a team's percentage.

A little maths produces the following exact result for the change in a team's percentage:

(1) New Percentage = Old Percentage + (%S - %C)/(1 + %C) * Old Percentage

where

%S = the points scored by the team in the game in question as a percentage of the points it has scored all season, excluding this game, and

%C = the points conceded by the team in the game in question as a percentage of the points it has conceded all season, excluding this game.

(In passing, I'll note that this equation makes it obvious that the only way for a team to increase its percentage on the basis of a single result is for %S to be greater than %C or, equivalently, for %S/%C to be greater than 1. Put another way, the team's percentage in the most current game needs to exceed its pre-game percentage.

This equation also puts a practical cap on the extent to which a team's percentage can alter based on the result of any one game at this stage of the season. For a team with a high percentage the term (%S - %C) will rarely exceed 5%, so a team with, for example, an existing percentage of 140 will find it hard to move that percentage by more than about 7 percentage points. Alternatively, a team with an existing percentage of just 70, which might at the extremes produce a (%S - %C) of 7%, will find it hard to move its percentage by more than about 5 percentage points in any one game.)

As an example of the use of equation (1) consider Sydney, who have scored 1,701 points this season and conceded 1,638, giving them a 103.8 percentage. If we assume, since this is Round 20, that they'll rack up a score this week that's about 5% of what they've previously scored all season and that they'll concede about 4%, then the formula tells us that their percentage will change by (5% - 4%)/(104%) * 103.8 = 1 percentage point.

Now 5% x 1,701 is about 85, and 4% x 1,638 is about 66, so we've implicitly assumed an 85-66 victory by the Swans in the previous paragraph. Recalculating Sydney's percentage the long way we get (1,701+85)/(1,638+66), which gives a 104.8 percentage and is, indeed, a 1 percentage point increase.

So we know that the formula works, which is nice, but not especially helpful.

To make equation (1) more helpful, we need firstly note that at this stage of the season the points that a team concedes in a game are unlikely to be a large proportion of the points they've already conceded so far in the entire season. So the (1+C%) in equation (1) is going to be very close to 1. That allows us to rewrite the equation as:

(2) Change in Percentage = (%S - %C) * Old Percentage

Now this equation makes it a little easier to play some what-if games.

For example we can ask what it would take for Sydney, who are currently equal with Carlton on competition points, to lift their percentage above Carlton's this weekend. Sydney's percentage stands now at 103.8 and Carlton's at 107.0, so Sydney needs a 3.2 percentage point lift.

Using a rearranged version of Equation (2) we know that achieving a lift of 3.2 percentage points from a current percentage of 103.8 requires that (%S - %C) be greater than 3.2/103.8, or about 3%. Now, if we assume that Sydney will concede points roughly equal to its season-long average then %C will be 1/19 or a bit over 5%.

So, to get the necessary lift in percentage, Sydney will need %S to be a bit over 5% + 3%, or 8%. To turn that into an actual score we take 8% x 1,701 (the number of points Sydney has scored in the season so far), which gives us a score of about 136. That's how many points Sydney will need to score to lift its percentage to around 107, assuming that its opponent this week (Fremantle) scores 5% x 1,638, which is approximately 82 points.

Within reasonable limits you can generalise this and say that Sydney needs to beat Fremantle by 54 points or more to lift its percentage to 107, regardless of the number of points Freo score. In reality, as Fremantle's score increase - and so %C rises - the margin of victory required by Sydney also rises, but only by a few points. A 60-point margin of victory will be enough to lift Sydney's percentage over Carlton's even in the unlikely event that the score in the Sydney v Freo game is as high as 170-110.

Okay, let's do one more what-if, this one a bit more complex.

What would it take for Melbourne to grab 8th spot this weekend? Well the Roos and Hawthorn would need to lose and the combined effect of Hawthorn's loss and Melbourne's win would need to drag Melbourne's percentage above Hawthorn's. Conveniently for us, Hawthorn and Melbourne meet this weekend. Even more conveniently, their respective points for and points against are all quite close: Hawthorn's scored 1,692 points and conceded 1,635; Melbourne's scored 1,599 and conceded 1,647.

The beauty of this fact is that, for both teams, in equation (2) Old Percentage is approximately 1 and, for any score, Hawthorn's %S will be approximately Melbourne's %C and vice versa. This means that any increase in percentage achieved by either team will be mirrored by an equivalent decrease in the percentage of the other.

All Melbourne needs do then to lift its percentage above Hawthorn's is to lift its percentage by one half the current difference. Melbourne's percentage stands at 97.1 and Hawthorn's at 103.5, so the difference is 6.4 and the target for Melbourne is an increase of 3.2 percentage points.

Melbourne then needs (%S-%C) to be a bit bigger than 3%. Since the divisors for both %S and %C are about the same we can re-express this by saying that Melbourne's margin of victory needs to be around 3% of the points it's conceded so far this season, which is 3% of 1,647 or around 50 points. Let's add on a few points to account for the fact that we need the margin to be a little over 3% and call the required margin 53 points.

So how good is our approximation? Well if Melbourne wins 123-70, Hawthorn's new percentage would be (1,692+70)/(1,635+123) = 1.002, and Melbourne's would be (1,599+123)/(1,647+70) = 1.003. Score 1 for the approximation. If, instead, it were a high-scoring game and Melbourne won 163-110, then Hawthorn's new percentage would be (1,692+110)/(1,635+163) = 1.002, and Melbourne's would be (1,599+163)/(1,647+100) = 1.003. So that works too.

In summary, a victory by the Dees over the Hawks by around 9-goals or more would, assuming the Roos lose to West Coast, propel Melbourne into the eight - not a confluence of events I'd be willing to wager large sums on, but a mathematical possibility nonetheless.