September 05, 2015

Can We Use Head-to-Head Market Movements in the Line Market?

September 05, 2015/ Tony Corke

Yesterday's post led to an interesting Twitter thread last evening, which included a suggestion to reanalyse the data to determine whether price movements in the Pinnacle head-to-head market might have predictive value in other markets for the same game, specifically in the line market.

May 09, 2015

Same Head-to-Head Prices, Different Lines

May 09, 2015/ Tony Corke

I've raised an eyebrow or two more than once when I've seen the TAB bookmaker post two markets with the same head-to-head prices but different line market handicaps priced at even-money.

November 30, 2014

On Choosing Strong Classifiers for Predicting Line Betting Results

November 30, 2014/ Tony Corke

The themes in this blog have been bouncing around in my thoughts - in virtual and in unpublished blog form - for quite a while now. My formal qualifications are as an Econometrician, but many of the models that I find myself using in MoS come from the more recent (though still surprisingly old) Machine Learning (ML) discipline, which I'd characterise as being more concerned with the predictive ability of a model than with its theoretical pedigree. (Breiman wrote a wonderful piece on this topic, entitled Statistical Modeling: The Two Cultures, back in 2001.)

August 29, 2014

SuperMargin Implications? Yes, They Are Atrocious.

August 29, 2014/ Tony Corke

In a recent blog I developed an empirical model of AFL scoring in which I assumed that the Scoring Shots generated by Home and Away teams could be modelled by a bivariate Negative Binomial and that the conversion of these shots into Goals could be modelled by Beta Binomials.

February 21, 2013

Clustering Your Way To Line Betting Success : Building a Predictive Model

February 21, 2013/ Tony Corke

In the previous blog I used a clustering algorithm - Partitioning Around Medoids (PAM) as it happens - to group games that were similar in terms of pre-game TAB Bookmaker odds, the teams' MARS Ratings, and whether or not the game was an Interstate clash. There it turned out that, even though I'd clustered using only pre-game data, the resulting clusters were highly differentiated with respect to the line betting success rates of the Home teams in each cluster.

February 17, 2013

Clustering Your Way To Line Betting Success

February 17, 2013/ Tony Corke

For today's blog I'll be creating a game clustering that uses as input only the information that we might reasonably know pre-game - for example, the pre-game team MARS Ratings, Bookmaker prices (or some metric derived from them), and information about the game venue.

June 09, 2012

Estimating Fair Head-to-Head Prices : Part II

June 09, 2012/ Tony Corke

In the previous blog on this topic I described a way to estimate the vig embedded in the head-to-head prices of both teams.

April 28, 2012

Predicting the TAB Sportsbet Margin

April 28, 2012/ Tony Corke

We've shown previously that it's possible to predict the TAB Sportsbet Bookmaker's head-to-head prices to a high level of accuracy using only MARS Ratings, the Interstate Status of a game and information about the very recent form of the Home team.

July 28, 2011

Projecting the Favourite's Final Margin

July 28, 2011/ Tony Corke

In a couple of earlier blogs I created binary logit models to predict the probability that the favourite would win given a specified lead at a quarter break and the bookmaker's assessed pre-game probability for the favourite. These models allow you to determine what a fair in-running price would be for the favourite. You might instead want to know what the favourite's projected victory margin is given the same input data, so in this blog I'll be providing some simple linear regressions that provide this information.

June 16, 2011

Line Fund Profitability and Probability Scores

June 16, 2011/ Tony Corke

Over on the Simulations blog, as part of a more general investigation into the dynamics of the contest between punter and bookmaker in head-to-head wagering, I've looked at the relationship between the probability score attained by the Head-to-Head Fund in each season and its profitability. What I found, among other things, was that the Fund's profitability was related not to the absolute probability score of the Fund algorithm, but to its probability score relative to the bookmaker's.

June 10, 2011

Framing a Line Market: How Hard Can It Be?

June 10, 2011/ Tony Corke

This week, for the first time that I can remember, every line market has moved from its initial pricing on Wednesday at noon. More generally, I've noticed a greater sensitivity in the line market than in the head-to-head market and wondered why that might be the case.

February 22, 2011

The Challenges of Line Betting

February 22, 2011/ Tony Corke

Another brief post tonight, this one on the probability of making a profit on line betting depending on the accuracy of the underlying model used to select the team on which to wager, and on the price obtained for each line bet.

February 13, 2011

Home Team Wagering: Rumours of Its Death Have Been Greatly Exaggerated

February 13, 2011/ Tony Corke

I should probably have noticed this sooner, but last year was quite a profitable year for blindly wagering on Home Teams. A gambler who level-staked the AFL Designated Home Team in every game in the head-to-head and in the line market would have recorded an 8.4% ROI on his or her head-to-head wagers and a 4.1% ROI on his or her line wagers.

January 21, 2011

Ensemble Models for Predicting Binary Events

January 21, 2011/ Tony Corke

I've been following the development of prediction markets with considerable interest over the past few years. These are markets in which the opinions of many engaged experts are combined, the notion being that their combined opinion will be a better predictor of a future outcome than the opinion of any one of them. It's a notion that has proved right on many occasions.

September 26, 2010

The Bias in Line Betting Revisited

September 26, 2010/ Tony Corke

Some blogs almost write themselves. This hasn't been one of them.

It all started when I read a journal article - to which I'd now link if I could find the darned thing again - that suggested a bias in NFL (or maybe it was College Football) spread betting markets arising from bookmakers' tendency to over-correct when a team won on line betting. The authors found that after a team won on line betting one week it was less likely to win on line betting next week because it was forced to overcome too large a handicap.

Naturally, I wondered if this was also true of AFL spread betting.

What Makes a Team's Start Vary from Week-to-Week?

In the process of investigating that question, I wound up wondering about the process of setting handicaps in the first place and what causes a team's handicap to change from one week to the next.

Logically, I reckoned, the start that a team receives could be described by this equation:

Start received by Team A (playing Team B) = (Quality of Team B - Quality of Team A) - Home Status for Team A

In words, the start that a team gets is a function of its quality relative to its opponent's (measured in points) and whether or not it's playing at home. The stronger a team's opponent, the larger the start will be, and there'll be a deduction if the team's playing at home. This formulation of start assumes that game venue only ever has a positive effect on one team and not an additional, negative effect on the other. It excludes the possibility that a side might be P points worse whenever it plays away from home.

With that as the equation for the start that a team receives, the change in that start from one week to the next can be written as:

Change in Start received by Team A = Difference in Quality of Teams played in successive weeks (this week's opponent less last week's) - Change in Quality of Team A - Change in Home Status for Team A

To use this equation we need to come up with proxies for as many of its terms as we can. Firstly then, what might a bookie use to reassess the quality of a particular team? An obvious choice is the performance of that team in the previous week relative to the bookie's expectations - which is exactly what the handicap-adjusted margin (HAM) for the previous week measures.

Next, we could define the change in home status as follows:

Change in home status = +1 if a team is playing at home this week and played away or at a neutral venue in the previous week
Change in home status = -1 if a team is playing away or at a neutral venue this week and played at home in the previous week
Change in home status = 0 otherwise

This formulation implies that there's no difference between playing away and playing at a neutral venue. Varying this assumption is something that I might explore in a future blog.
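The three-case definition translates directly into code. What follows is a minimal Python sketch; the venue labels "home", "away" and "neutral" and the function name are hypothetical, and away and neutral venues are deliberately treated alike, as the definition implies.

```python
def change_in_home_status(this_week: str, last_week: str) -> int:
    """+1, -1 or 0 per the three-case definition.

    Venue labels are hypothetical: "home", "away" or "neutral".
    Anything other than "home" counts as not-at-home, so away and
    neutral venues are treated identically.
    """
    at_home_now = this_week == "home"
    at_home_before = last_week == "home"
    if at_home_now and not at_home_before:
        return 1
    if at_home_before and not at_home_now:
        return -1
    return 0


print(change_in_home_status("home", "away"))     # 1
print(change_in_home_status("neutral", "home"))  # -1
print(change_in_home_status("away", "neutral"))  # 0
```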

From Theory to Practicality: Fitting a Model

(well, actually there's a bit more theory too ...)

Having identified a way to quantify the change in a team's quality and the change in its home status we can now run a linear regression in which, for simplicity, I've further assumed that home ground advantage is the same for every team.

We get the following result using all home-and-away data for seasons 2006 to 2010:

For a team (designated to be) playing at home in the current week:

Change in start = -2.453 - 0.072 x HAM in Previous Week - 8.241 x Change in Home Status

For a team (designated to be) playing away in the current week:

Change in start = 3.035 - 0.155 x HAM in Previous Week - 8.241 x Change in Home Status

These equations explain about 15.7% of the variability in the change in start, and all of the coefficients (except the intercept) are statistically significant at the 1% level or better.
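The post doesn't spell out how the regression was specified. One structure consistent with the two equations is a single least-squares fit with separate intercepts and HAM slopes for home and away teams, plus one shared Change in Home Status slope. The sketch below illustrates that assumed specification in pure Python. The helper names are hypothetical, and the data are synthetic and noise-free, generated from the quoted coefficients, so the fit simply recovers them. Real data would give only the 15.7% fit reported above, and a statistics package would normally do this job.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(is_home, ham_prev, d_home):
    """Home/away-specific intercept and HAM slope, one shared home-status slope."""
    h, a = (1.0, 0.0) if is_home else (0.0, 1.0)
    return [h, a, h * ham_prev, a * ham_prev, float(d_home)]


# [home intercept, away intercept, home HAM slope, away HAM slope, home-status slope]
TRUE = [-2.453, 3.035, -0.072, -0.155, -8.241]

random.seed(1)
rows, y = [], []
for _ in range(500):
    is_home = random.random() < 0.5
    ham_prev = random.gauss(0, 30)
    # A home team's status can only stay the same or rise; an away team's can only stay or fall.
    d_home = random.choice([0, 1]) if is_home else random.choice([0, -1])
    row = design_row(is_home, ham_prev, d_home)
    rows.append(row)
    y.append(sum(b * x for b, x in zip(TRUE, row)))

beta = ols(rows, y)
print([round(b, 3) for b in beta])
```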

(You might notice that I've not included any variable to capture the change in opponent quality. Doubtless that variable would explain a significant proportion of the otherwise unexplained variability in change in start but it suffers from the incurable defect of being unmeasurable for previous and for future games. That renders it not especially useful for model fitting or for forecasting purposes.

Whilst that's a shame from the point of view of better modelling the change in teams' start from week-to-week, the good news is that leaving this variable out almost certainly doesn't distort the coefficients for the variables that we have included. Technically, the potential problem we have in leaving out a measure of the change in opponent quality is what's called an omitted variable bias, but such bias disappears if the variables we have included are uncorrelated with the one we've omitted. I think we can make a reasonable case that the difference in the quality of successive opponents is unlikely to be correlated with a team's HAM in the previous week, and is also unlikely to be correlated with the change in a team's home status.)
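The omitted variable argument can be checked with a small simulation. This sketch is illustrative only: x stands in for an included regressor (such as previous-week HAM), z stands in for the omitted one (such as the change in opponent quality), and the coefficients 2 and 3 are made up.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(42)
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]            # included regressor
z_indep = [random.gauss(0, 1) for _ in range(n)]      # omitted, uncorrelated with x
z_corr = [0.5 * xi + random.gauss(0, 1) for xi in x]  # omitted, correlated with x

# True model in both cases: y = 2*x + 3*z + noise, but we regress y on x alone.
y_indep = [2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z_indep)]
y_corr = [2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z_corr)]

print(round(slope(x, y_indep), 2))  # close to 2: omitting z causes no bias
print(round(slope(x, y_corr), 2))   # close to 2 + 3 * 0.5 = 3.5: biased
```

Only when the omitted variable moves with the included one does its effect leak into the included coefficient.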

Using these equations and historical home status and HAM data, we can calculate that the average (designated) home team receives 8 fewer points start than it did in the previous week, and the average (designated) away team receives 8 points more.

All of which Means Exactly What Now?

Okay, so what do these equations tell us?

Firstly, let's consider teams playing at home in the current week. The nature of the AFL draw is such that it's more likely than not that a team playing at home in one week played away in the previous week, in which case the Change in Home Status for that team will be +1 and its equation can be rewritten as

Change in Start = -10.694 - 0.072 x HAM in Previous Week

So, the start for a home team will tend to drop by about 10.7 points relative to the start it received in the previous week (because it's at home this week), plus about another 1 point for every 14 points higher its HAM was in the previous week (1 / 0.072 is about 13.9). Remember: the more positive the HAM, the larger the margin by which the spread was covered.

Next, let's look at teams playing away in the current week. For them it's more likely than not that they played at home in the previous week, in which case the Change in Home Status will be -1 and their equation can be rewritten as

Change in Start = 11.276 - 0.155 x HAM in Previous Week

Their start, therefore, will tend to increase by about 11.3 points relative to the previous week (because they're away this week), less 1 point for every 6.5 points higher their HAM was in the previous week.
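The arithmetic behind the simplified equations can be checked directly. So can the "one point for every so many points of HAM" readings, each of which is the reciprocal of a HAM coefficient. The dictionary layout and function name below are my own.

```python
# Coefficients quoted in the post for Change in Start.
START = {
    "home": {"intercept": -2.453, "ham": -0.072},
    "away": {"intercept": 3.035, "ham": -0.155},
}
HOME_STATUS_COEF = -8.241  # shared by the home and away equations


def change_in_start(side, ham_prev, d_home):
    """Fitted change in start for a team on `side` ("home" or "away") this week."""
    c = START[side]
    return c["intercept"] + c["ham"] * ham_prev + HOME_STATUS_COEF * d_home


# Typical draw pattern: home this week after away last week (+1), and vice versa (-1).
print(round(change_in_start("home", 0, +1), 3))  # -10.694
print(round(change_in_start("away", 0, -1), 3))  # 11.276

# Points of previous-week HAM that shift the start by one extra point.
for side in ("home", "away"):
    print(side, round(1 / abs(START[side]["ham"]), 1))  # home 13.9, away 6.5
```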

Away teams, therefore, are penalised more heavily for larger HAMs than are home teams.

This I offer as one source of potential bias, similar to the bias that was found in the original article I read.

Proving the Bias

As a simple way of quantifying any bias I've fitted what's called a binary logit to estimate the following model:

Probability of Winning on Line Betting = f(Result on Line Betting in Previous Week, Start Received, Home Team Status)

This model will detect any bias in line betting results that's due to an over-reaction to the previous week's line betting results, a tendency for teams receiving particular sized starts to win or lose too often, or to a team's home team status.

The result is as follows:

logit(Probability of Winning on Line Betting) = -0.0269 + 0.054 x Previous Line Result + 0.001 x Start Received + 0.124 x Home Team Status

The only coefficient that's statistically significant in that equation is the one on Home Team Status and it's significant at the 5% level. This coefficient is positive, which implies that home teams win on line betting more often than they should.

Using this equation we can quantify how much more often. An away team, we find, has about a 46% probability of winning on line betting, a team playing at a neutral venue has about a 49% probability, and a team playing at home has about a 52% probability.
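These probabilities follow from applying the inverse logit to the fitted equation. The post doesn't state how Home Team Status is coded. The sketch assumes -1 for away, 0 for neutral and +1 for home, with Previous Line Result and Start Received both set to zero. Under those assumptions it reproduces the 46%, 49% and 52% figures.

```python
import math

# Coefficients from the fitted binary logit quoted above.
INTERCEPT = -0.0269
B_PREV, B_START, B_HOME = 0.054, 0.001, 0.124


def p_line_win(prev_result=0.0, start=0.0, home_status=0):
    """Inverse logit of the linear predictor (home_status coding assumed -1/0/+1)."""
    z = INTERCEPT + B_PREV * prev_result + B_START * start + B_HOME * home_status
    return 1 / (1 + math.exp(-z))


for label, status in (("away", -1), ("neutral", 0), ("home", 1)):
    print(label, round(100 * p_line_win(home_status=status), 1))
```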

That is undoubtedly a bias, but I have two pieces of bad news about it. Firstly, it's not large enough to overcome the vig on line betting at $1.90 and secondly, it disappeared in 2010.
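At a price of $1.90 the break-even win rate is 1 / 1.90, or about 52.6%, so even a 52% success rate loses money on level stakes. A quick check:

```python
def breakeven_probability(price):
    """Win rate at which a level-stakes bet at decimal `price` neither wins nor loses."""
    return 1 / price


def expected_return(p_win, price):
    """Expected profit per unit staked: win (price - 1) with prob p_win, else lose 1."""
    return p_win * (price - 1) - (1 - p_win)


print(round(breakeven_probability(1.90), 4))  # 0.5263
print(round(expected_return(0.52, 1.90), 3))  # -0.012
```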

Do Margins Behave Like Starts?

We now know something about how the points start given by the TAB Sportsbet bookie responds to a team's change in estimated quality and to a change in a team's home status. Do the actual game margins respond similarly?

One way to find this out is to use exactly the same equation as we used above, replacing Change in Start with Change in Margin and defining the change in a team's margin as its margin of victory this week less its margin of victory last week (where victory margins are negative for losses).

If we do that and run the new regression model, we get the following:

For a team (designated to be) playing at home in the current week:

Change in Margin = 4.058 - 0.865 x HAM in Previous Week + 8.801 x Change in Home Status

For a team (designated to be) playing away in the current week:

Change in Margin = -4.571 - 0.865 x HAM in Previous Week + 8.801 x Change in Home Status

These equations explain an impressive 38.7% of the variability in the change in margin. We can simplify them, as we did for the regression equations for Change in Start, by using the fact that the draw tends to alternate teams' home and away status from one week to the next.

So, for home teams:

Change in Margin = 12.859 - 0.865 x HAM in Previous Week

While, for away teams:

Change in Margin = -13.372 - 0.865 x HAM in Previous Week

At first blush it seems a little surprising that a team's HAM in the previous week is negatively correlated with its change in margin. Why should that be the case?

It's a phenomenon that we've discussed before: regression to the mean. What these equations are saying is that teams that perform better than expected in one week - where expectation is measured relative to the line betting handicap - are likely to win by slightly less than they did in the previous week or lose by slightly more.
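A minimal simulation shows why regression to the mean produces a negative coefficient on previous-week HAM. It rests on three hypothetical assumptions:

- the bookie's start exactly offsets each team's expected margin;
- actual margins are expected margins plus independent game-day noise;
- HAM equals margin plus start received.

The spreads of 20 and 35 points are made up. Under these idealised assumptions the slope is -1. The fitted -0.865 is a little shallower.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


random.seed(7)
ham_prev, change_in_margin = [], []
for _ in range(20_000):
    # Expected margins in the two weeks (hypothetical spread of match-ups).
    mu_prev, mu_now = random.gauss(0, 20), random.gauss(0, 20)
    # Actual margin = expected margin + independent game-day noise.
    m_prev = mu_prev + random.gauss(0, 35)
    m_now = mu_now + random.gauss(0, 35)
    start_prev = -mu_prev                 # assumed unbiased bookie
    ham_prev.append(m_prev + start_prev)  # assumed HAM = margin + start received
    change_in_margin.append(m_now - m_prev)

print(round(slope(ham_prev, change_in_margin), 2))  # close to -1
```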

What's particularly interesting is that home teams and away teams show mean regression to the same degree. The TAB Sportsbet bookie, however, hasn't always behaved as if this was the case.

Another Approach to the Source of the Bias

Bringing the Change in Start and Change in Margin equations together provides another window into the home team bias.

The simplified equations for Change in Start were:

Home Teams: Change in Start = -10.694 - 0.072 x HAM in Previous Week

Away Teams: Change in Start = 11.276 - 0.155 x HAM in Previous Week

So, for teams whose previous HAM was around zero (which is what the average HAM should be), the typical change in start will be around 11 points - a reduction for home teams, and an increase for away teams.

The simplified equations for Change in Margin were:

Home Teams: Change in Margin = 12.859 - 0.865 x HAM in Previous Week

Away Teams: Change in Margin = -13.372 - 0.865 x HAM in Previous Week

So, for teams whose previous HAM was around zero, the typical change in margin will be around 13 points - an increase for home teams, and a decrease for away teams.

Overall, the 11-point v 13-point disparity favours home teams, since they enjoy the larger margin increase relative to the smaller decrease in start, and it disfavours away teams, since they suffer a larger margin decrease relative to the smaller increase in start.
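The 11-point v 13-point comparison amounts to asking how a team's HAM is expected to move from one week to the next. The sketch assumes HAM is margin plus start received, so that movement is the change in margin plus the change in start. It evaluates both simplified equations at a previous-week HAM of zero.

```python
# Simplified equations from the post, evaluated at a previous-week HAM of zero.
change_in_start = {"home": -10.694, "away": 11.276}
change_in_margin = {"home": 12.859, "away": -13.372}

# Assuming HAM = margin + start received, the expected week-on-week change in HAM
# is the change in margin plus the change in start.
net = {side: change_in_margin[side] + change_in_start[side] for side in ("home", "away")}
for side, value in net.items():
    print(side, round(value, 3))  # home 2.165, away -2.096
```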

To Conclude

Historically, home teams win on line betting more often than away teams. That means home teams tend to receive too much start and away teams too little.

I've offered two possible reasons for this:

Away teams suffer larger reductions in their start for a given previous week's HAM
For teams with near-zero previous week HAMs, starts only adjust by about 11 points when a team's home status changes but margins change by about 13 points. This favours home teams because the increase in their expected margin exceeds the expected decrease in their start, and works against away teams for the opposite reason.

If you've made it this far, my sincere thanks. I reckon your brain's earned a spell; mine certainly has.

July 30, 2010

Line Betting : A Codicil

July 30, 2010/ Tony Corke

While contemplating the result from an earlier blog, which was that home teams had higher handicap-adjusted margins and won at a rate significantly higher than 50% on line betting - virtually regardless of the start they were giving or receiving - I wondered if the source of this anomaly might be that the bookie gives home teams a slightly better deal in setting line margins.

July 29, 2010

A Line Betting Enigma

July 29, 2010/ Tony Corke

The TAB Sportsbet bookmaker is, as you know, a man to be revered and feared in equal measure. Historically, his head-to-head prices have been so exquisitely well-calibrated that I instinctively compare any model I construct with the forecasts he produces. Finding that a model historically outperforms him sends me scuttling off to determine what error I've made in constructing it, or what piece of information I've used that, in truth, was only available with the benefit of hindsight.

July 15, 2010

Trialling The Super Smart Model

July 15, 2010/ Tony Corke

The best way to trial a potential Fund algorithm, I'm beginning to appreciate, is to publish each week the forecasts that it makes. This forces me to work through the mechanics of how it would be used in practice and, importantly, to set down what restrictions should be applied to its wagering - for example should it, like most of the current Funds, only bet on Home Teams, and in which round of the season should it start wagering.

July 09, 2010

The Relationship Between Head-to-Head Price and Points Start

July 09, 2010/ Tony Corke

I've found yet another MAFL-related use for the Eureqa tool, this time to determine the precise relationship between a team's head-to-head price and the start it's giving or receiving on line betting. A simple plot of the history of a team's head-to-head price (or the probability that can be inferred from it) versus its start on line betting makes it obvious that there's a relationship between the two and that it's a non-linear one, but in the past I've been constrained by my own (lack of) ingenuity and persistence in generating sufficient possibilities to find its exact nature.

February 18, 2010

Another Day, Another Model

February 18, 2010/ Tony Corke

In the previous blog I developed models for predicting victory margins and found that the selection of a 'best' model depended on the criterion used to measure performance.

In this blog I'll review the models that we developed and then describe how I created another model, this one designed to predict line betting winners.

The Low Average Margin Predictor

The model that produced the lowest mean absolute prediction error (MAPE) was constructed by combining the predictions of two other models, both of the type I called floating window models. One looked only at the victory margins and bookie's home team prices for the most recent 22 rounds, and the other looked at the same data but for the most recent 35 rounds.
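The floating window models are described only by their inputs. The minimal sketch below assumes each one is an ordinary least squares regression of the home team's victory margin on its head-to-head price, fitted to only the last N rounds. The data layout and function names are hypothetical. The toy history is built so that margin = 40 - 20 x price exactly, so both windows return the same prediction.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def floating_window_prediction(history, window, home_price):
    """Predict a home team's margin using only the most recent `window` rounds.

    history: list of rounds, oldest first; each round is a list of
    (home_price, home_margin) pairs. (Hypothetical layout.)
    """
    recent = [game for rnd in history[-window:] for game in rnd]
    intercept, slope = fit_line([p for p, _ in recent], [m for _, m in recent])
    return intercept + slope * home_price


# Toy history: 40 rounds of 8 games in which margin = 40 - 20 * price exactly.
history = [[(1.2 + 0.25 * g, 40 - 20 * (1.2 + 0.25 * g)) for g in range(8)] for _ in range(40)]
for window in (22, 35):
    print(window, round(floating_window_prediction(history, window, 2.5), 3))
```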

On their own neither of these two models produces especially small MAPEs, but optimally combined they produce an overall model with a 28.999 MAPE across seasons 2008 and 2009 (I know that the three decimal places is far more precision than is warranted, but any rounding's going to nudge it up to 29, which just doesn't have the same ability to impress. I consider it my nod to the retailing industry, which persists in believing that price proximity is not perceived linearly and so, for example, that a computer priced at $999 will be thought meaningfully cheaper than one priced at $1,000).

Those optimal weightings were created in the overall model by calculating the linear combination of the underlying models that would have performed best over the most recent 26 weeks of the competition, and then using those weights for the current week's predictions. These weights will change from week to week as one model or the other tends to perform better at predicting victory margins; that is what gives this model its predictive chops.
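One way to produce the week-by-week weighting is a grid search over the weight on one model. This sketch rests on several assumptions. The weights are non-negative and sum to one, whereas the post says only "linear combination". The criterion is mean absolute error. The inputs are the two models' game-level predictions and actual margins for the trailing 26 weeks. The function name is hypothetical.

```python
def best_weight(preds_a, preds_b, actuals, steps=100):
    """Grid-search w in [0, 1] minimising the mean absolute error of
    w * preds_a + (1 - w) * preds_b against actuals.
    (Convex weights are an assumption, not the post's stated method.)"""
    best_w, best_mae = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        mae = sum(abs(w * a + (1 - w) * b - y)
                  for a, b, y in zip(preds_a, preds_b, actuals)) / len(actuals)
        if mae < best_mae:
            best_w, best_mae = w, mae
    return best_w, best_mae


# Toy check: neither model alone is right, but an equal blend is exact.
w, mae = best_weight([10, 30], [30, 10], [20, 20])
print(w, mae)
```

The chosen weight would then be applied to this week's two predictions, and the search repeated the following week on the updated trailing window.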

This low MAPE model henceforth I shall call the Low Average Margin Predictor (or LAMP, for brevity).

The Half Amazing Margin Predictor

Another model we considered produced margin predictions with a very low median absolute prediction error. It was similar to the LAMP but used four rather than two underlying models: the 19-, 36-, 39- and 52-round floating window models.

It boasted a 22.54 point median absolute prediction error over seasons 2008 and 2009, and its predictions have been within 4 goals of the actual victory margin in a tick over 52% of games. What destroys its mean absolute prediction error is its tendency to produce victory margin predictions that are about as close to the actual result as calcium carbonate is to coagulated milk curd. About once every two-and-a-half rounds one of its predictions will prove to be 12 goals or more distant from the actual game result.
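The gap between HAMP's median and mean absolute errors is what a handful of large misses does. The ten absolute errors below are hypothetical:

```python
from statistics import mean, median

# Hypothetical absolute errors (points) for ten games: mostly close, two blow-outs.
abs_errors = [5, 9, 12, 15, 18, 21, 24, 30, 75, 90]
print(median(abs_errors))  # 19.5
print(mean(abs_errors))    # 29.9
```

The two large misses barely move the median but drag the mean up by about ten points.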

Still, its median absolute prediction error is truly remarkable, which in essence means that its predictions are amazing about half the time, so I shall name it the Half Amazing Margin Predictor (or HAMP, for brevity).

In their own highly specialised ways, LAMP and HAMP are impressive but, like left-handed chess players, their particular specialities don't appear to provide them with any exploitable advantage. To be fair, TAB Sportsbet does field markets on victory margins and it might eventually prove that LAMP or HAMP can be used to make money on these markets, but I don't have the historical data to test this now. I do, however, have line market data that enables me to assess LAMP's and HAMP's ability to make money on this market, and they exhibit no such ability. Being good at predicting margins is different from being good at predicting handicap-adjusted margins.

Nonetheless, I'll be publishing LAMP's and HAMP's margin predictions this season.

HELP, I Need Another Model

Well, if we want a model that predicts line market winners, we really should build a dedicated model for this, and that's what I'll describe next.

The type of model that we'll build is called a binary logit. These can be used to fit a model to any phenomenon that is binary - that is, two-valued - in nature. You could, for example, fit one to model the characteristics of people who do or don't respond to a marketing campaign. In that case, the binary variable is campaign response. You could also, as I'll do here, fit a binary logit to model the relationship between home team price and whether or not the home team wins on line betting.

Fitting and interpreting such models is a bit more complicated than fitting and interpreting models fitted using the ordinary least squares method, which we covered in the previous blog. For this reason I'll not go into the details of the modelling here. Conceptually, though, all we're doing is fitting an equation that relates the Home team's head-to-head price to its probability of winning on line betting.
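The minimal sketch below gives a feel for the fitting: a single-predictor binary logit fitted by Newton-Raphson in pure Python. The predictor is the home team's price. The "true" coefficients are made up and the data are synthetic, purely for illustration. A statistics package would normally be used instead.

```python
import math
import random


def fit_logit(xs, ys, iterations=15):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = p * (1 - p)
            ga += y - p          # gradient of the log-likelihood
            gb += (y - p) * x
            haa += w             # entries of the negative Hessian
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det  # Newton step: inverse(negative Hessian) @ gradient
        b += (haa * gb - hab * ga) / det
    return a, b


random.seed(3)
TRUE_A, TRUE_B = 0.4, -0.2  # made-up coefficients for the synthetic data
prices, covered = [], []
for _ in range(20_000):
    price = random.uniform(1.1, 4.0)
    p = 1 / (1 + math.exp(-(TRUE_A + TRUE_B * price)))
    prices.append(price)
    covered.append(1 if random.random() < p else 0)  # 1 = home team won on line betting

a_hat, b_hat = fit_logit(prices, covered)
print(round(a_hat, 2), round(b_hat, 2))
```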

For this modelling exercise I have again created 47 floating window models of the sort I've just described: one model that uses price and line betting result data from only the last 6 rounds, another that uses the same data for the last 7 rounds, and so on up to one that uses data from the last 52 rounds.

Then, as I did in creating HAMP and LAMP, I looked for the combination of floating window models that best predicts winning line bet teams.

The overall model I found to perform best combines 24 of the 47 floating window models - I'll spare you the Lotto-like list of those models' numbers here. In 2008 this model predicted the line betting winner 57% of the time and in 2009 it predicted 64% of such winners. Combined, that gives it a 61% average across the two seasons. I'll call this model the Highly Evolved Line Predictor (or HELP), the 'highly evolved' part of the name in recognition of the fact that it was selected because of its fitness in predicting line betting winners in the environment that prevailed across the 2008 and 2009 seasons.
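The post doesn't say how HELP combines its 24 constituent models. Purely for illustration, the sketch assumes the chosen windows' probabilities are averaged with equal weight and the home team is tipped when the average exceeds 0.5. The function names, data layout and probabilities are hypothetical.

```python
# The 47 floating windows: 6, 7, ..., 52 rounds.
WINDOWS = list(range(6, 53))


def help_style_tip(probabilities, chosen):
    """Tip "home" if the equally weighted average of the chosen windows'
    probabilities that the home team wins on line betting exceeds 0.5.

    probabilities: dict of window length -> predicted probability for one game.
    chosen: the window lengths retained in the combined model.
    """
    avg = sum(probabilities[w] for w in chosen) / len(chosen)
    return "home" if avg > 0.5 else "away"


def tip_accuracy(tips, results):
    """Share of games in which the tipped side won on line betting."""
    return sum(t == r for t, r in zip(tips, results)) / len(tips)


print(len(WINDOWS))  # 47
probs = {w: (0.55 if w % 2 else 0.47) for w in WINDOWS}  # hypothetical model outputs
print(help_style_tip(probs, WINDOWS[:24]))
```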

Whether HELP will thrive in the new environment of the 2010 season will be interesting to watch, as indeed will be the performance of LAMP and HAMP.

In my previous post I drew the distinction between fitting a model and using it to predict the future and explained that a model can be a good fit to existing data but turn out to be a poor predictor. In that context I mentioned the common statistical practice of fitting a model to one set of data and then measuring its predictive ability on a different set.

HAMP, LAMP and HELP are somewhat odd models in this respect. Certainly, when I've used them to predict they're predicting for games that weren't used in the creation of any of their underlying floating window models. So that's a tick.

They are, however, fitted models in that I generated a large number of potential LAMPs, HAMPs and HELPs, each using a different set of the available floating window models, and then selected those models which best predicted the results of the 2008 and 2009 seasons. Accordingly, it could well be that the superior performance of each of these models can be put down to chance, in which case we'll find that their performances in 2010 will drop to far less impressive levels.

We won't know whether or not we're witnessing such a decline until some way into the coming season, but in the meantime we can ponder the basis on which we might justify asserting that the models are not mere chimeras.

Recall that each of the floating window models uses as its only predictive variable the price of the Home team. The convoluted process of combining different floating window models with time-varying weights for each means that, in essence, the predictions of HAMP, LAMP and HELP are all just sophisticated transformations of one number: the Home team price for the relevant game.

So, for HAMP, LAMP and HELP to be considered anything other than statistical flukes it needs to be the case that:

the TAB Sportsbet bookie's Home team prices are reliable indicators of Home teams' victory margins and line betting success
the association between Home team prices and victory margins, and between Home team prices and line betting outcomes varies in a consistent manner over time
HAMP, LAMP and HELP are constructed in such a way as to effectively model these time-varying relationships

On balance I'd have to say that these conditions are unlikely to be met. Absent the experience gained from running these models live during a fresh season then, there's no way I'd be risking money on any of these models.

Many of the algorithms that support MAFL Funds have been developed in much the same way as I've described in this and the previous blog, though each of them is based on more than a single predictive variable and most of them have been shown to be profitable in testing using previous seasons' data and in real-world wagering.

Regardless, a few seasons of profitability doesn't rule out the possibility that any or all of the MAFL Fund algorithms have just been extremely lucky.

That's why I'm not retired ...
