The Increased Importance of Predicting Away Team Scores

In an earlier blog we found that the score of the Home team carried more information about the final game margin than did the score of the Away team. One way of interpreting this fact is that, given the choice between improving your prediction of the Home team score or your prediction of the Away team score, you should opt for the former if your goal is to predict the final game margin. While that's true, it turns out that it's less true now than it once was.
Read More

Finding Non-Linear Relationships Between AFL Variables: The MINER Package

It's easy enough to determine whether or not one continuous variable has a linear relationship with another, and how strong that relationship is, by calculating the Pearson product-moment correlation coefficient for the two variables. A value near +1 for this coefficient indicates a strong, positive linear relationship between the variables in question, so that high values of one tend to coincide with high values of the other, and vice versa for low values; a value near -1 indicates a strong, negative linear relationship; and a value of 0 indicates a lack of any linear relationship at all. But what if we want to assess more generally whether there's a relationship between two variables, linear or otherwise, and we don't know the exact form that this relationship takes? That's the purpose for which the Maximal Information Coefficient (MIC) was created; it has recently been made available in an R package called MINER.
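By way of a quick illustration (in R, with made-up data), the Pearson coefficient happily detects a noisy linear relationship but sees almost nothing in a perfectly good quadratic one - which is exactly the gap a measure like the MIC is meant to fill:

```r
# Illustrative only: Pearson correlation picks up the linear relationship
# but almost entirely misses the (deterministic but non-linear) quadratic one.
set.seed(1)
x <- runif(1000, -1, 1)

y_linear    <- 2 * x + rnorm(1000, sd = 0.2)    # noisy linear relationship
y_quadratic <- x^2  + rnorm(1000, sd = 0.02)    # strong but non-linear relationship

cor(x, y_linear)     # close to +1
cor(x, y_quadratic)  # close to 0, despite the obvious relationship
```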
Read More

A Well-Calibrated Model

It's nice to come up with a new twist on an old idea. This year, in reviewing the relative advantages and disadvantages conferred on each team by the draw, I want to do it a little differently. Specifically, I want to estimate these effects by measuring the proportion of games that I expect each team will win given their actual draw compared to the proportion I'd expect them to win if they played every team twice (yes, that hoary old chestnut in a different guise - that isn't the 'new' bit).
Read More

An Empirical Review of the Favourite In-Running Model

In the previous blog we reviewed a series of binary logits that modelled a favourite's probability of victory given its pre-game bookmaker-assessed head-to-head probability and its lead at the end of a particular quarter. There I provided just a single indication of the quality of those models: the accuracy with which they correctly predicted the final result of the game. That's a crude and very broad measure. In this blog we'll take a closer look at the empirical model fits to investigate their performance in games with different leads and probabilities.
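As a rough sketch of the kind of model involved - one reasonable specification rather than the exact one used in that blog, with a data frame and column names of my own invention - a binary logit of this type could be fitted in R as follows:

```r
# Made-up example data purely so the sketch runs end-to-end
set.seed(1)
n <- 500
games <- data.frame(fav_prob      = runif(n, 0.5, 0.95),  # pre-game implied probability
                    fav_lead_qtr1 = rnorm(n, 5, 12))       # favourite's lead at quarter-time
games$fav_won <- rbinom(n, 1, plogis(qlogis(games$fav_prob) + 0.05 * games$fav_lead_qtr1))

# Binary logit: favourite's victory chances given its pre-game probability
# (entered as log-odds) and its lead at the end of the first quarter
fit <- glm(fav_won ~ log(fav_prob / (1 - fav_prob)) + fav_lead_qtr1,
           family = binomial(link = "logit"),
           data = games)

# In-running probability for a favourite rated a 70% pre-game chance
# that leads by 10 points at the first change
predict(fit,
        newdata = data.frame(fav_prob = 0.70, fav_lead_qtr1 = 10),
        type = "response")
```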
Read More

Hanging Onto a Favourite: Assessing a Favourite's In-Running Chances of Victory

Over the weekend I was paying particular attention to the in-running odds being offered on various games and remain convinced that punters overestimate the probability of the favourite ultimately winning, especially when the favourite trails.
Read More

A Little Behind in the Scoring

We've not had a proposition bet for a while, so here's a new one for you. We're going to pick a large number of games at random and, based on the half-time score in each, I'll pledge to bet on the team that's scored the greater number of behinds whether they be the raging favourite or the deserving underdog. If both teams have scored the same number of behinds at the main break or if the game ends in a draw, the bet is a push and neither of us need reach into our pockets. Otherwise, I collect if the team that had scored the greater number of behinds at the half goes on to win, and you win if the team that had scored the lesser number of behinds at the half goes on to win. Simple.
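If it helps to see the settlement rules spelled out, here's a small R sketch that settles the bet over a set of games; the data frame and its columns are made up purely for illustration:

```r
# Made-up games: half-time behinds for each team and the final margin
# (from team A's viewpoint; 0 denotes a drawn game)
set.seed(1)
games <- data.frame(ht_behinds_a   = rpois(200, 4),
                    ht_behinds_b   = rpois(200, 4),
                    final_margin_a = round(rnorm(200, 0, 30)))

# Settle the proposition bet for a single game
settle_bet <- function(ht_behinds_a, ht_behinds_b, final_margin_a) {
  if (ht_behinds_a == ht_behinds_b || final_margin_a == 0) return("push")
  more_behinds_a <- ht_behinds_a > ht_behinds_b   # did team A have more behinds at half-time?
  a_won          <- final_margin_a > 0            # did team A win?
  if (more_behinds_a == a_won) "I collect" else "you collect"
}

table(mapply(settle_bet,
             games$ht_behinds_a, games$ht_behinds_b, games$final_margin_a))
```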
Read More

Tipping Without Market Price Information

In a previous blog I looked at the notion of momentum and found that Richmond, St Kilda, Melbourne and Geelong all seemed to be "momentum" teams in that their likelihood of winning a game seemed to be disproportionately affected by whether they'd won or lost their previous match.
Read More

Assessing ProPred's, WinPred's and the Bookie's Probability Forecasts

Almost 12 months ago, in this blog, I introduced the topic of probability scoring as a basis on which to assess the forecasting performance of a probabilistic tipster. Unfortunately, I used it for the remainder of last season as a means of assessing the ill-fated HELP algorithm, which didn't so much need a probability score to measure its awfulness as it did a stenchometer. As a consequence I think I'd mentally tainted the measure, but it deserves another run with another algorithm.
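For the record, here are two standard probability scores in R - shown purely as illustrations of the general idea, not necessarily the exact score used on this blog:

```r
# Two common probability scores for binary forecasts.
# p: forecast probability assigned to the home team; outcome: 1 if the home team won, 0 if not.
brier_score <- function(p, outcome) mean((p - outcome)^2)                     # lower is better
log_score   <- function(p, outcome) mean(ifelse(outcome == 1,
                                                log2(p), log2(1 - p)))        # higher is better

p       <- c(0.75, 0.60, 0.20)
outcome <- c(1, 0, 0)
brier_score(p, outcome)
log_score(p, outcome)
```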
Read More

Why You Should Have Genes in Your Ensemble

Over on the MAFL Wagers & Tips blog I've been introducing the updated versions of the Heuristics, in this post and in this post. I've shown there that these heuristics are, individually, at least moderately adept at predicting historical AFL outcomes. All told, there are eleven heuristics, comfortably enough to form an ensemble, so in the spirit of the previous entry in MAFL Statistical Analyses, the question must be asked: can I find a subset of the heuristics which, collectively, using a majority voting scheme, tips better than any one of them alone?
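A minimal R sketch of the majority-voting idea - the heuristic tips and results here are simulated stand-ins, not the actual Heuristics:

```r
# Made-up data: 11 heuristics, each agreeing with the actual result a bit better than chance
set.seed(1)
n_games  <- 200
home_won <- rbinom(n_games, 1, 0.58)
tips <- sapply(1:11, function(i) ifelse(runif(n_games) < 0.65, home_won, 1 - home_won))

# Accuracy of a majority vote across a chosen subset of heuristics
majority_accuracy <- function(cols, tips, home_won) {
  votes    <- rowMeans(tips[, cols, drop = FALSE])
  ensemble <- ifelse(votes > 0.5, 1, 0)     # odd-sized subsets, so no tied votes
  mean(ensemble == home_won)
}

# Try every odd-sized subset of the 11 heuristics and keep the best
subsets <- unlist(lapply(c(3, 5, 7, 9, 11),
                         function(k) combn(11, k, simplify = FALSE)),
                  recursive = FALSE)
accuracies <- sapply(subsets, majority_accuracy, tips = tips, home_won = home_won)
subsets[[which.max(accuracies)]]   # best-performing ensemble on this (simulated) history
```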
Read More

Home Ground Advantage: Fans and Familiarity

In AFL, playing at home is a distinct advantage, albeit perhaps a little less of an advantage than it once was. So, around this time of year, I usually spend a few days agonising over the allocation of home team status for each game in the upcoming season.
Read More

Picking Winners - A Deeper Dive

Last blog I identified a banker's dozen of algorithms that I thought were worthy of further consideration for Fund honours next season.

Experience has taught me that, behind the attractive veneer of some models with impressive historical ROIs often lurk troubling pathologies. One form of that pathology is exhibited by models with returns that come mostly from a handful of bets, one or two of them especially fortuitous. Another manifests as a 'bet large, bet often' approach that would subject any human on the business end of such wagering to the punting equivalent of a ride on The Big Dipper that's just as likely to end with you 100 metres above the ground as 200 metres below it. The question to be answered in this blog then is: do any of the 11 algorithms I've identified this time show any such characteristics?
Read More

Can We Do Better Than The Binary Logit?

To say that there's a 'bit in this blog' is like declaring the Hundred Years' War 'a bit of a skirmish'.

I'll start by broadly explaining what I've done. In a previous blog I constructed 12 models, each attempting to predict the winner of an AFL game. The 12 models varied in two ways, firstly in terms of how the winning team was described ...
Read More

Why It Matters Which Team Wins

In conversation - and in interrogation, come to think of it - the key to getting a good answer is often in the framing of the question.

So too in statistical modelling, where one common method for asking a slightly different question of the data is to take the variables you have and transform them.

Consider for example the following results for four binary logits, each built to provide an answer to the question 'Under what circumstances does the team with the higher MARS Rating tend to win?'.
Read More

Visualising AFL Grand Final History

I'm getting in early with the Grand Final postings.

The diagram below summarises the results of all 111 Grand Finals in history, excluding the drawn Grand Finals of 1948 and 1977, and encodes information in the following ways:

  • Each circle represents a team. Teams can appear once or twice (or not at all): as a red circle if they've lost a Grand Final, and as a green circle if they've won one.
  • Circle size is proportional to frequency. So, for example, a big red circle, such as Collingwood's, denotes a team that has lost a lot of Grand Finals.
  • Arrows join Grand Finalists, emanating from the winning team and terminating at the losing team. The wider the arrow, the more common the result.

No information is encoded in the fact that some lines are solid and some are dashed. I've just done that in an attempt to improve legibility. (You can get a PDF of this diagram here, which should be a little easier to read.)
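If you'd like to tinker with a diagram of this sort yourself, here's a rough R sketch using the igraph package and a tiny, made-up set of results. It's a simplified rendering of the encoding described above, not the code that produced the actual diagram below.

```r
library(igraph)

# Tiny, made-up set of Grand Final results: winner, loser, number of occurrences
results <- data.frame(winner = c("Carlton",     "Collingwood", "Melbourne"),
                      loser  = c("Collingwood", "Richmond",    "Essendon"),
                      count  = c(3, 2, 1))

# Each team gets a separate "winner" (green) and "loser" (red) node,
# so every arrow runs from a green circle to a red circle
edges <- data.frame(from  = paste(results$winner, "(W)"),
                    to    = paste(results$loser,  "(L)"),
                    count = results$count)

g <- graph_from_data_frame(edges, directed = TRUE)

# How often each node appears, to drive circle size
node_freq <- c(tapply(edges$count, edges$from, sum),
               tapply(edges$count, edges$to,   sum))

plot(g,
     edge.width   = E(g)$count * 2,                       # arrow width ~ frequency of result
     vertex.size  = 10 + 4 * node_freq[V(g)$name],        # circle size ~ frequency
     vertex.color = ifelse(grepl("\\(W\\)", V(g)$name), "green", "red"))
```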

[Figure: Grand Final results diagram, covering all 111 Grand Finals]

I've chosen not to amalgamate the records of Fitzroy and the Lions, Sydney and South Melbourne, or Footscray and the Dogs (though this last decision, I'll admit, is harder to detect). I have though amalgamated the records of North Melbourne and the Roos since, to my mind, the difference there is one of name only.

The diagram rewards scrutiny. I'll just leave you with a few things that stood out for me:

  • Seventeen different teams have been Grand Final winners; sixteen have been Grand Final losers
  • Wins have been slightly more equitably shared around than losses: eight teams have pea-sized or larger green circles (Carlton, Collingwood, Essendon, Hawthorn, Melbourne, Richmond, Geelong and Fitzroy), six have red circles of similar magnitude (Collingwood, South Melbourne, Richmond, Carlton, Geelong and Essendon).
  • I recognise that my vegetable-based metric is inherently imprecise and dependent on where you buy your produce and whether it's fresh or frozen, but I feel that my point still stands.
  • You can almost feel the pain radiating from those red circles for the Pies, Dons and Blues. Pies fans don't even have the salve of a green circle of anything approaching compensatory magnitude.
  • Many results are once-only results, with the notable exceptions being Richmond's dominance over the Blues, the Pies' over Richmond, and the Blues over the Pies (who knew - football Grand Final results are intransitive?), as well as Melbourne's over the Dons and the Pies.

As I write this, the Saints v Dogs game has yet to be played, so we don't know who'll face Collingwood in the Grand Final.

If it turns out to be a Pies v Dogs Grand Final then we'll have nothing to go on, since these two teams have not previously met in a Grand Final, not even if we allow Footscray to stand in for the Dogs.

A Pies v Saints Grand Final is only slightly less unprecedented. They've met once before in a Grand Final when the Saints were victorious by one point in 1966.

All You Ever Wanted to Know About Favourite-Longshot Bias ...

Previously, on at least a few occasions, I've looked at the topic of the Favourite-Longshot Bias and whether or not it exists in the TAB Sportsbet wagering markets for AFL.

A Favourite-Longshot Bias (FLB) is said to exist when favourites win at a rate in excess of their price-implied probability and longshots win at a rate less than their price-implied probability. So if, for example, teams priced at $10 - ignoring the vig for now - win at a rate of just 1 time in 15, this would be evidence for a bias against longshots. In addition, if teams priced at $1.10 won, say, 99% of the time, this would be evidence for a bias towards favourites.
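As a quick reminder of the arithmetic behind those examples: a price's implied probability is just its reciprocal (ignoring the vig).

```r
# Price-implied probability, ignoring the bookmaker's margin (vig)
implied_prob <- function(price) 1 / price

implied_prob(10.00)   # 0.10  - a $10 longshot 'should' win 1 game in 10
1 / 15                # ~0.067 - winning only 1 time in 15 falls short of that
implied_prob(1.10)    # ~0.91 - so a 99% win rate exceeds the implied probability
```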

When I've considered this topic in the past I've generally produced tables such as the following, which are highly suggestive of the existence of such an FLB.

[Table: empirical win rates by price-implied probability range, all games 2006 to the present]

Each row of this table, which is based on all games from 2006 to the present, corresponds to the results for teams with price-implied probabilities in a given range. The first row, for example, is for all those teams whose price-implied probability was less than 10%. This equates, roughly, to teams priced at $9.50 or more. The average implied probability for these teams has been 9%, yet they've won at a rate of only 4%, less than one-half of their 'expected' rate of victory.

As you move down the table you need to arrive at the second-last row before you come to one where the win rate exceeds the expected rate (i.e. the average implied probability). That's fairly compelling evidence for an FLB.

This empirical analysis is interesting as far as it goes, but we need a more rigorous statistical approach if we're to take it much further. And heck, one of the things I do for a living is build statistical models, so you'd think that by now I might have thrown such a model at the topic ...

A bit of poking around on the net uncovered this paper which proposes an eminently suitable modelling approach, using what are called conditional logit models.

In this formulation we seek to explain a team's winning rate purely as a function of (the natural log of) its price-implied probability. There's only one parameter to fit in such a model and its value tells us whether or not there's evidence for an FLB: if it's greater than 1 then there is evidence for an FLB, and the larger it is the more pronounced is the bias.
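For concreteness, here's my own R sketch of one way such a single-parameter conditional logit could be fitted by maximum likelihood - my rendering of the general approach rather than the exact specification in the paper, with a made-up data set standing in for the real prices and results:

```r
# One-parameter conditional logit: a team's modelled win probability is
#   p_i^beta / (p_i^beta + p_j^beta)
# where p_i and p_j are the two teams' price-implied probabilities.
# beta = 1 means prices are well calibrated; beta > 1 is evidence for an FLB.

# Made-up data purely so the sketch runs: implied probabilities and results
set.seed(1)
n <- 1000
home_prob <- runif(n, 0.2, 0.8)
games <- data.frame(home_prob = home_prob,
                    away_prob = 1 - home_prob)
games$home_won <- rbinom(n, 1, games$home_prob)

neg_log_lik <- function(beta, games) {
  p_home <- games$home_prob^beta /
            (games$home_prob^beta + games$away_prob^beta)
  -sum(ifelse(games$home_won == 1, log(p_home), log(1 - p_home)))
}

fit <- optimize(neg_log_lik, interval = c(0.5, 2), games = games)
fit$minimum   # fitted beta; a value above 1 is evidence for an FLB
```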

When we fit this model to the data for the period 2006 to 2010 the fitted value of the parameter is 1.06, which provides evidence for a moderate level of FLB. The following table gives you some idea of the size and nature of the bias.

[Table: modelled win rates and returns by price-implied probability, from the conditional logit fit]

The first row applies to those teams whose price-implied probability of victory is 10%. A fair-value price for such teams would be $10 but, with a 6% vig applied, these teams would carry a market price of around $9.40. The modelled win rate for these teams is just 9%, which is slightly less than their implied probability. So, even if you were able to bet on these teams at their fair-value price of $10, you'd lose money in the long run. Because, instead, you can only bet on them at $9.40 or thereabouts, in reality you lose even more - about 16c in the dollar, as the last column shows.
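The arithmetic behind that last figure is simple enough: the expected return per dollar wagered is the win rate times the price, minus one.

```r
# Expected return per $1 level-stake bet = win rate x price - 1
0.09 * 10.00 - 1   # at the fair-value price of $10: -0.10, a 10c loss per dollar
0.09 * 9.40  - 1   # at the vig-affected ~$9.40: about -0.15, in the vicinity of the
                   # 16c-in-the-dollar loss quoted above (rounding aside)
```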

We need to move all the way down to the row for teams with 60% implied probabilities before we reach a row where the modelled win rate exceeds the implied probability. The excess is not, regrettably, enough to overcome the vig, which is why the rightmost entry for this row is also negative - as, indeed, it is for every other row underneath the 60% row.

Conclusion: there has been an FLB on the TAB Sportsbet market for AFL across the period 2006-2010, but it hasn't been generally exploitable (at least not with level-stake wagering).

The modelling approach I've adopted also allows us to consider subsets of the data to see if there's any evidence for an FLB in those subsets.

I've looked firstly at the evidence for FLB considering just one season at a time, then considering only particular rounds across the five seasons.

[Table: fitted FLB parameters by season (2006 to 2010) and by round grouping]

So, there is evidence for an FLB for every season except 2007. For that season there's evidence of a reverse FLB, which means that longshots won more often than they were expected to and favourites won less often. In fact, in that season, the modelled success rate of teams with implied probabilities of 20% or less was sufficiently high to overcome the vig and make wagering on them a profitable strategy.

That year aside, 2010 has been the year with the smallest FLB. One way to interpret this is as evidence for an increasing level of sophistication in the TAB Sportsbet wagering market, from punters or the bookie, or both. Let's hope not.

Turning next to a consideration of portions of the season, we can see that there's tended to be a very mild reverse FLB through rounds 1 to 6, a mild to strong FLB across rounds 7 to 16, a mild reverse FLB for the last 6 rounds of the season and a huge FLB in the finals. There's a reminder in that for all punters: longshots rarely win finals.

Lastly, I considered a few more subsets, and found:

  • No evidence of an FLB in games that are interstate clashes (fitted parameter = 0.994)
  • Mild evidence of an FLB in games that are not interstate clashes (fitted parameter = 1.03)
  • Mild to moderate evidence of an FLB in games where there is a home team (fitted parameter = 1.07)
  • Mild to moderate evidence of a reverse FLB in games where there is no home team (fitted parameter = 0.945)

FLB: done.

In-Running Wagering: What's the Best Strategy?

With services such as Betfair now offering in-running wagering opportunities, the ability to accurately assess a team's chances of victory at any given point in a game is of considerable commercial value. Imagine, for example, that your team, who are at home, lead by 18 points at the first change. Would a wager on them at $1.40 be advised?
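(For reference, the break-even probability at any price is just its reciprocal.)

```r
# Break-even probability for a wager at $1.40 (ignoring any commission):
1 / 1.40   # ~0.714 - the bet only pays in the long run if the team's true
           # in-running chance of victory exceeds about 71%
```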
Read More