The Monash Tipping Competitions: Setting Targets

Monash University has been running AFL tipping competitions for over 20 years and this year is offering three variants, all of which are open to the public.

What differentiates these three competitions is:

  1. the form in which a forecaster is required to express his or her predictions 
  2. the manner in which the quality of those predictions is assessed in light of the final outcome

You can read more details about each of the competitions on the Monash site, but here I'll provide a broad overview.

THE NORMAL COMPETITION

In this competition a forecaster is asked to predict the margin of victory and is rewarded with 10 points if the correct team is selected as the winner, and bonus points depending on how close the predicted margin was to the actual margin, calculated as follows:

  • An exact prediction yields 6 bonus points
  • A prediction within 6 points of the final margin yields 5 bonus points
  • A prediction within 12 points of the final margin yields 4 bonus points
  • A prediction within 18 points of the final margin yields 3 bonus points
  • A prediction within 24 points of the final margin yields 2 bonus points
  • A prediction within 30 points of the final margin yields 1 bonus point

Bonus points are available to all forecasters, regardless of whether or not they selected the correct team to win. So, for example, a prediction that the Swans will win by 4 points will earn 4 bonus points even if the Swans instead lose by 6 points, since the prediction is then in error, in absolute terms, by 10 points.
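
To make the scoring concrete, here's a minimal sketch in Python of how a single game might be scored under these rules. The function name is mine, margins are taken from the home team's perspective, and the treatment of draws and of the exact band boundaries is my assumption rather than anything confirmed by the Monash rules.

```python
def normal_comp_score(predicted_margin, actual_margin):
    """Sketch of the Normal Competition scoring described above.

    Margins are from the home team's perspective (positive = home win);
    band boundaries and the treatment of draws are assumptions.
    """
    score = 0
    # 10 points for selecting the winning team (draws ignored in this sketch)
    if predicted_margin * actual_margin > 0:
        score += 10
    # Bonus points based on the absolute error of the margin prediction
    error = abs(predicted_margin - actual_margin)
    if error == 0:
        score += 6
    elif error <= 6:
        score += 5
    elif error <= 12:
        score += 4
    elif error <= 18:
        score += 3
    elif error <= 24:
        score += 2
    elif error <= 30:
        score += 1
    return score

# The example from the text: Swans predicted to win by 4 but losing by 6
print(normal_comp_score(4, -6))   # 4: no winner points, and an error of 10 earns 4 bonus points
```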

The goal for a forecaster in this competition is to, firstly, pick the correct team and, secondly, predict a margin close to the actual final margin.

THE GAUSSIAN COMPETITION

In this competition, as in the Normal Competition, a forecaster predicts a margin of victory, but must also provide a "standard deviation", which indicates how confident he or she feels about the margin prediction. The less confident the forecaster feels, the larger the standard deviation he or she should provide.

Scoring in the Gaussian Competition is more complex than in the Normal Competition and is based on the probability density associated with the Normal Distribution that has mean equal to the forecasted margin and standard deviation equal to the nominated standard deviation, evaluated at the actual final margin. The nearer the forecasted margin is to the actual margin and the smaller is the nominated standard deviation, the larger will be the probability density in the vicinity of the actual margin, and so the larger will be the forecaster's score.

The goal for a forecaster in this competition is, therefore, to be close to the correct margin and, if close, to have nominated a small standard deviation. Nominating a very small standard deviation, however, which signifies a very high level of confidence in one's forecast, is heavily penalised if that forecast turns out to be a long way from the actual margin.
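
The following sketch, using scipy, illustrates that trade-off: it simply evaluates the density of a Normal with mean equal to the forecast margin and standard deviation equal to the nominated standard deviation, at a couple of hypothetical final margins. The Monash score is based on this density but I'm not reproducing its exact scaling here; the numbers are only meant to show the direction of the effect, and the function name is mine.

```python
from scipy.stats import norm

# Density of a Normal(forecast margin, nominated sd) evaluated at the actual margin.
# The Monash score is based on this quantity; its exact scaling isn't reproduced here.
def density_at_result(forecast_margin, nominated_sd, actual_margin):
    return norm.pdf(actual_margin, loc=forecast_margin, scale=nominated_sd)

# A near-miss rewards the more confident (smaller sd) forecast...
print(density_at_result(20, 30, actual_margin=22))   # ~0.0133
print(density_at_result(20, 44, actual_margin=22))   # ~0.0091
# ...but a bad miss punishes that confidence heavily
print(density_at_result(20, 30, actual_margin=-40))  # ~0.0018
print(density_at_result(20, 44, actual_margin=-40))  # ~0.0036
```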

THE PROBABILISTIC COMPETITION

In this competition a forecaster assigns a probability of victory to the competing teams (much as the Probability Predictors do over on the Wagers and Tips journal) and is rewarded using a Log Probability Score (LPS) - the same LPS as is used for MoS' Probability Predictors.

It's not difficult to prove mathematically that a forecaster's best strategy under an LPS scoring rule - which, formally, is a "Proper Scoring Rule" and so has this property - is to attach probability assessments of X% to teams that win exactly X% of the time (ie to be "well calibrated").
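
For concreteness, here's a minimal sketch of the LPS as it's spelled out later in this post: 1 plus the base-2 log of the probability that was assigned to the team that actually won. The function name, and the decision to ignore draws, are mine.

```python
import math

# Log Probability Score (base 2): 1 plus the log of the probability
# that was assigned to the team that actually won (draws ignored).
def log_probability_score(p_selected, selected_team_won):
    p_winner = p_selected if selected_team_won else 1 - p_selected
    return 1 + math.log2(p_winner)

print(log_probability_score(0.75, True))    # ~0.585 when a 75% assessment comes off
print(log_probability_score(0.75, False))   # -1.0 when the 75% favourite loses
```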

SIMULATING THE PERFECT FORECASTER

So the question is: if we're participating in these competitions (which this year I am, under the MatterOfStats alias), what should we take to be "good" scores in each of them?

For the purposes of investigating this issue I'm going to simulate games under the following assumptions:

  • The final outcome of a game follows a Normal distribution with some pre-determined, pre-game mean and fixed standard deviation (an assumption for which I've ample theoretical and empirical evidence)
  • The forecaster is "perfect" in that he or she knows the true mean for every game (and the probability of victory that this necessarily implies), and that he or she forecasts that mean in the Normal and Gaussian competitions, and that probability in the Probabilistic competition

I'm curious to understand the impact of a forecaster's varying his or her standard deviation nominations, so I'm going to vary these across the simulations. I also want to understand how expected scoring in each of the competitions might vary as the expected margin changes - that is, as the home team moves from being a raging underdog to a raging favourite - so I'm also going to vary the expected margin of victory across the simulations. Implicitly that will mean that the true probability of victory alters too.

Each scenario in the simulation therefore comprises:

  • A fixed true expected margin, ranging from -50 to +50 (excluding 0) across different simulation runs
  • A fixed true standard deviation of the final margin, which I set to 37 points initially
  • A fixed forecast margin, equal to the true expected margin
  • A fixed forecast standard deviation, which takes on one of the values 30, 33, 37, 41 or 44 across different simulation runs

For each scenario I run 100,000 simulations. So, for example, my first run of 100,000 simulations has the true expected margin set to -50, the true standard deviation set to 37, the forecast margin set to -50, and the nominated standard deviation set to 30. The next run of 100,000 simulations has the true and forecast margins set to -49 and all other parameters as in the previous run. In total, the 500 simulation runs progressively pair each of the 100 margins with each of the 5 nominated standard deviations.
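
A sketch of one such run, written in Python with numpy, appears below. By way of example it estimates the perfect forecaster's expected Normal Competition score for a handful of expected margins; the scoring bands and the treatment of draws are my assumptions, as earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_SD = 37           # true standard deviation of final margins (as assumed here)
N_SIMS = 100_000       # simulations per scenario

def normal_comp_expected_score(expected_margin):
    """Monte Carlo estimate of a perfect forecaster's expected Normal Competition
    score for one scenario (assumed scoring bands; draws score no winner points)."""
    actual = rng.normal(expected_margin, TRUE_SD, N_SIMS)
    error = np.abs(actual - expected_margin)
    winner_pts = 10 * (np.sign(actual) == np.sign(expected_margin))
    bonus = np.select(
        [error == 0, error <= 6, error <= 12, error <= 18, error <= 24, error <= 30],
        [6, 5, 4, 3, 2, 1],
        default=0,
    )
    return (winner_pts + bonus).mean()

# The full study crosses 100 margins (-50..+50, excluding 0) with 5 nominated sds;
# here, a few expected Normal Competition scores by way of illustration
# (roughly 7 near equal favouritism, rising to about 11 for a 50-point favourite)
for m in (-50, -25, -1, 1, 25, 50):
    print(m, round(normal_comp_expected_score(m), 2))
```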

The results of these 500 simulation runs appear below as three charts, one for each competition. In each, the value of the expected margin runs from -50 on the left to +50 on the right. Note that I've excluded an Expected Margin of exactly 0 because I wasn't certain how the Normal competition would work in that instance since it's impossible for a forecaster to have selected the "winning" team in such a contest. In that case, is the 10 points for correctly selecting the winning team off the table? (If so, the expected score drops to about 3 points).

The first thing immediately apparent from these results is that, under my assumptions, a perfect forecaster's expected score in the Gaussian Competition is (sampling variability aside) a constant, regardless of the underlying expected margin, while in the Normal and Probabilistic Competitions the expected score increases as we move away from equal favouritism for the two teams (ie as the Expected Margin moves nearer -50 or +50 and the victory probability of the home team approaches 0 or 1).

In the Probabilistic Competition (see lowest chart) a forecaster in a given game can, ignoring draws, score only one of two values:

  • 1 + log₂(p) if the selected team (ie the team for which p > 0.5) wins
  • 1 + log₂(1 - p) if the selected team loses

Since the forecaster is assumed to be "perfect" and therefore know the correct value of p, the expected score in this Competition is therefore: 

  • p × (1 + log₂(p)) + (1 - p) × (1 + log₂(1 - p))

The chart above maps out that function as p moves from near 0 to near 1 and shows, for example, that the expected score is 0 for a game involving equal favourites and about 0.2 for games with a 25-point favourite.
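
That expectation is easy to evaluate directly, as the short sketch below does; the figures it prints line up with the chart's values of 0 for equal favourites and roughly 0.2 for a 25-point favourite (a 75% chance under a standard deviation of 37).

```python
import math

# Expected per-game LPS for a perfect forecaster whose selected team has
# true victory probability p (draws ignored), as per the expression above
def expected_lps(p):
    return p * (1 + math.log2(p)) + (1 - p) * (1 + math.log2(1 - p))

print(round(expected_lps(0.5), 3))    # 0.0 for equal favourites
print(round(expected_lps(0.75), 3))   # ~0.189 for roughly a 25-point favourite (sd 37)
```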

In the Normal Competition (see middle chart) a forecaster can score any of 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15 or 16 points in a single game, and the expected score ranges from about 7 for games with near-equal favourites to about 11 where there's a 50-point favourite. The increase in expected score in the Normal Competition as the gap in ability between the teams widens arises because, with a fixed standard deviation, random variability becomes less likely to produce a result extreme enough to deliver an underdog win.

Take, for example, a game with a 50-point favourite. To win this game the underdog would need to produce a result at least 1.35 standard deviations (of 37 points) from the mean. In comparison, for a game with a 20-point favourite the underdog would need a result just over 0.5 standard deviations from the mean to produce the upset result. The latter is far more likely - about 20 percentage points more likely using the relevant Normal distribution. As such, a perfect forecaster is significantly more likely to snag the 10 points on offer for selecting the winning team when he or she selects a 50-point favourite than when he or she takes the 20-point favourite on offer.
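
Those upset probabilities are straightforward to check with scipy, as in the sketch below; an upset is simply the event that the final margin lands on the underdog's side of zero.

```python
from scipy.stats import norm

SD = 37  # assumed true standard deviation of final margins

# Probability of an upset = probability the final margin crosses zero
p_upset_50 = norm.cdf(0, loc=50, scale=SD)   # ~0.09 (a 1.35 sd deviation needed)
p_upset_20 = norm.cdf(0, loc=20, scale=SD)   # ~0.29 (only ~0.54 sds needed)
print(round(p_upset_50, 3), round(p_upset_20, 3), round(p_upset_20 - p_upset_50, 3))
```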

Another notable feature of the charts is that the increase in the expected score as the level of favouritism increases (as measured by the expected margin in points) is more linear in the -25 to +25 point range for the Normal Competition than it is for the Probabilistic Competition.

The results for the other Competition, the Gaussian, are particularly interesting. Firstly, as noted earlier, it appears that a forecaster's expected score is a constant, and that it is maximised if he or she chooses a standard deviation equal to the true value (here 37 points). Also, choosing standard deviations that are too small is penalised by the Gaussian Competition's scoring far more than choosing those that are too large.

To see this, compare the lines in the chart for standard deviations of 33 and 41 points, both of which are in error by 4 points in absolute terms, and compare also the lines for 30 and 44 points, which are both 7 points in error on the same basis. In both cases, the forecaster nominating the larger standard deviation can expect to score better than the forecaster nominating the smaller one.
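
One way to see where that asymmetry comes from, without reproducing the exact Monash formula, is to look at the expected log-density of the nominated Normal when the true margins are Normal with standard deviation 37. That's only an illustrative stand-in for the actual score, but it peaks at the true standard deviation and falls away faster on the low side, as the sketch below shows.

```python
import math

TRUE_SD = 37.0

# Expected log-density of a Normal(forecast, nominated sd) evaluated at the actual
# margin, when actual margins are Normal(forecast, TRUE_SD). For X ~ N(mu, sigma):
# E[ln pdf(X; mu, s)] = -ln(s * sqrt(2*pi)) - sigma^2 / (2 * s^2)
def expected_log_density(nominated_sd, true_sd=TRUE_SD):
    return -math.log(nominated_sd * math.sqrt(2 * math.pi)) - true_sd**2 / (2 * nominated_sd**2)

for s in (30, 33, 37, 41, 44):
    print(s, round(expected_log_density(s), 4))
# Peaks at the true value (37); 41 beats 33, and 44 beats 30
```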

Now I should point out that this result depends on my assumption of homoskedasticity, that the spread of margins about their expected value is the same regardless of the size of that margin. I don't feel that I've yet proven this assumption beyond reasonable doubt here on MoS (see, for example, this post from 2013 and this post from 2014), but I'd say that the current weight of evidence is that it's at least approximately true. Enough to prove a civil if not a criminal case.

The result also depends on the value of the standard deviation that I assume, which I've variously estimated in earlier blogs as being empirically around 38 and theoretically most often in the 33 to 40 point range. In the charts above I chose a value of 37; for those below I used 35 instead.

The expected scores in the Normal and Probabilistic Competitions change only a little when we reduce the value of the true standard deviation. In the Probabilistic Competition the expected scores change because the victory probability associated with a given true margin increases with a smaller true standard deviation. A team that's expected to win by 25 points, for example, is a 75.0% favourite when the true standard deviation is 37, but a 76.2% favourite when the true standard deviation is 35.
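
Those probabilities fall straight out of the Normal assumption, as the two-line check below shows.

```python
from scipy.stats import norm

# Victory probability implied by a 25-point expected margin under each true sd
print(round(norm.cdf(25 / 37), 3))   # ~0.750
print(round(norm.cdf(25 / 35), 3))   # ~0.762
```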

Normal Competition scores increase for the same underlying reason: favourites of a given level, measured in points, win more often when the true standard deviation is smaller, which means that the perfect forecaster bags the 10 points for selecting the winner more often; and final scores tend to be less spread out and so more closely clustered around the true value, which means that the perfect forecaster tends to bag more bonus points too. The increases in expected scores with a reduced standard deviation are very small, however.

In the Gaussian Competition we still find, as we'd expect, that nominating the true standard deviation (now 35) produces the highest expected scores. That score is now about 1.825, up from about 1.74 when the true standard deviation was 37 points. Here too we find that, for a given absolute difference between the true and nominated standard deviation, nominating standard deviations that are too low is penalised more than nominating those that are too high.

CONCLUSION

In this blog we've found that, for a "perfect" forecaster who knows the true expected final margin and probability of victory for the favourite:

  • His or her expected score in the Probabilistic and Normal Competitions increases as the favourite's expected margin increases
  • His or her expected score in the Gaussian Competition is a constant, the value of which depends on the true standard deviation and how far his or her nominated standard deviation is, in absolute terms, from the true standard deviation
  • Nominating a standard deviation that is below the true standard deviation lowers a forecaster's expected score in the Gaussian Competition by more than nominating one that is above the true standard deviation by the same amount

These results are based on the assumption that actual final margins follow a Normal Distribution with a mean equal to the expected game margin and with a fixed standard deviation. The results also assume that the standard deviation of game margins about their expected value is a constant across all possible game margins.

If we use last year's Bookmaker pre-game prices as a guide to the true team probabilities for each game and convert these probabilities to expected margins using a standard deviation of 37, we find that the average per-game scores of a perfect forecaster nominating a standard deviation of 37 in every game would have been:

  • Normal Competition: 9.13 points per game
  • Probabilistic Competition: 0.214 bits per game
  • Gaussian Competition: 1.74 points per game
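
As a final illustrative sketch, the conversion from a bookmaker-implied probability to an expected margin under these assumptions is just the Normal quantile function scaled by the standard deviation; the function name below is mine.

```python
from scipy.stats import norm

SD = 37  # standard deviation assumed for the probability-to-margin conversion

# Convert an implied home victory probability into an expected margin,
# assuming final margins are Normal(expected margin, SD)
def implied_margin(p_home, sd=SD):
    return sd * norm.ppf(p_home)

print(round(implied_margin(0.75), 1))   # ~25 points, consistent with the figures above
```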