Ensemble Models for Predicting Binary Events

I've been following the development of prediction markets with considerable interest over the past few years. These are markets in which the opinions of many engaged experts are combined, the notion being that their combined opinion will be a better predictor of a future outcome than the opinion of any one of them. It's a notion that has proved right on many occasions.

One relatively simple circumstance in which it's possible to provide a mathematical proof of the 'wisdom of crowds' is where we have a number of judges, each making a binary prediction - for example, that team A wins or loses - and we combine those judges' individual votes to come up with a final prediction. The Condorcet Jury Theorem tells us that if each judge has a probability of being correct that is greater than 1/2 and if every judge's vote is statistically independent of all other judges' votes, then the more judges we have, the greater is the probability that their combined (majority) vote will be correct. In fact, as the number of judges increases, the probability of their combined prediction being correct tends monotonically to 1.
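To make the theorem concrete, here's a minimal sketch that computes the probability of a correct majority vote directly from the binomial distribution. It assumes an odd number of judges (so there are no tied votes), a common accuracy for every judge, and full independence; the function name majority_accuracy is my own label for illustration.

```python
from math import comb

def majority_accuracy(n_judges, p):
    """Probability that a majority of independent judges is correct.

    Assumes every judge is right with the same probability p and that
    n_judges is odd, so the vote can never be tied.
    """
    if n_judges % 2 == 0:
        raise ValueError("use an odd number of judges to avoid tied votes")
    needed = n_judges // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_judges, k) * p**k * (1 - p) ** (n_judges - k)
        for k in range(needed, n_judges + 1)
    )

# Accuracy of the majority vote as the panel grows, each judge at 52.5%
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(n, 0.525), 4))
```

The printed values rise with the number of judges, which is the theorem at work, but note how slowly they rise when each judge is only slightly better than a coin toss.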

Now let's get practical and think about how we might apply this result to the challenge of predicting line winners. 

Finding a large - preferably infinite - number of footy 'judges', each of whom is guaranteed to tip line winners at a rate in excess of 50%, is problematic. In practice, the real question is: how many judges do we need such that, when combined, their predictions are correct more than about 52.6% of the time (that is, 1/1.90), which is the success rate that we need to overcome the vig on the $1.90 market price in line betting?
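For anyone wanting to check the break-even arithmetic, here's a small sketch. It assumes a flat one-unit stake on every game at a fixed price; the function names are mine, purely for illustration.

```python
def break_even_rate(price):
    """Success rate at which a level-stakes bettor neither wins nor loses."""
    return 1 / price

def expected_profit_per_unit(accuracy, price):
    """Expected profit per unit staked: win (price - 1) or lose the stake."""
    return accuracy * (price - 1) - (1 - accuracy)

print(round(break_even_rate(1.90), 4))
print(round(expected_profit_per_unit(0.54, 1.90), 4))
```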

To answer this, it turns out, we need to know two things about the judges: 

  1. How accurate are they each individually (I'll assume they all have the same accuracy rate)?
  2. How often do any two of them both tip correctly in the same game? (I'll also assume that this is the same figure for all pairs of judges)

Armed with those assumptions, it's not hard to simulate the combined performance of the ensemble of judges, which is what I've done to produce the following chart.

Each panel in the chart corresponds to a particular number of judges and a particular accuracy rate for each judge. So, for example, the top left panel, which is labelled (3, 0.510), is the chart for 3 judges, each of whom tips at a 51% accuracy rate. Within each panel, the rate at which any pair of judges co-predict winners is on the horizontal axis (labelled 'Both') and the overall accuracy of the ensemble is on the vertical axis (labelled 'Rate'). Every point on each chart is based on 200,000 simulated predictions.
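For readers who'd like to try a simulation of this kind themselves, here's a minimal sketch. It is not necessarily the method used for the chart: it assumes one particular way of generating judges with a given individual accuracy and a given pairwise co-prediction rate, namely drawing a game-specific "difficulty" from a Beta distribution and then letting each judge be right independently with that probability. That construction only handles positively correlated judges (a 'Both' rate strictly between the square of the accuracy and the accuracy itself), and because the 'Both' rate alone doesn't pin down how three or more judges behave jointly, its numbers need not match the chart's. The function and parameter names are my own.

```python
import numpy as np

def simulate_ensemble(n_judges, p, both, n_games=200_000, seed=0):
    """Estimate majority-vote accuracy for equally accurate, correlated judges.

    ASSUMPTION: correlation is induced by a Beta-distributed per-game success
    probability theta with E[theta] = p and E[theta^2] = both. Given theta,
    judges are independent. Requires p*p < both < p.
    """
    var = both - p * p  # variance that theta must have
    if var <= 0 or both >= p:
        raise ValueError("this sketch needs p*p < both < p")
    rng = np.random.default_rng(seed)
    s = p * (1 - p) / var - 1  # sum of the two Beta shape parameters
    theta = rng.beta(p * s, (1 - p) * s, size=n_games)
    # One row per game, one column per judge: True where the judge is right
    correct = rng.random((n_games, n_judges)) < theta[:, None]
    # The ensemble is right when more than half of the judges are right
    return (correct.sum(axis=1) > n_judges / 2).mean()

# One hypothetical panel: 11 judges at 52.5%, sweeping the 'Both' rate
for both in (0.28, 0.30, 0.35, 0.40):
    print(both, round(simulate_ensemble(11, 0.525, both), 4))
```

Use an odd number of judges so that the majority vote is never tied.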

In general, an ensemble is more accurate if:

  • it contains more judges
  • each judge is more accurate
  • any pair of judges is less likely to co-predict a winner (statistically, for a fixed individual accuracy, a lower co-prediction rate means that the predictions of any pair of judges are less correlated, and the less correlated they are, the better is the ensemble. It's the power of asset diversification in prediction form.)
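The link between the co-prediction rate and correlation in that last point can be made explicit. Here's a small sketch, assuming two judges with the same accuracy p who are both correct at rate 'both'; it returns the ordinary (Pearson) correlation between their right/wrong indicators. The function name is mine.

```python
def correctness_correlation(p, both):
    """Correlation between two judges' 0/1 'tipped correctly' indicators.

    Each indicator has mean p and variance p * (1 - p); the covariance is
    P(both correct) minus p squared.
    """
    return (both - p * p) / (p * (1 - p))

# Two judges at 52.5% accuracy: independent judges would both be right
# 0.525 ** 2, or about 27.6%, of the time, giving zero correlation.
for both in (0.525 ** 2, 0.30, 0.35):
    print(round(both, 4), round(correctness_correlation(0.525, both), 3))
```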

It is possible to build a profitable ensemble of line-result predicting models with only 3 underlying models, but they'd each need to be correct 52.5% of the time or more, and the rate at which they co-predicted winners would need to be less than 35%.

If we could find as many as 11 models with the requisite characteristics, we could profit with an accuracy rate for each model of 'just' 52% and a winner co-prediction rate of less than 35%.

How might one go about creating such an ensemble of models? One approach would be to come up with as many variables as you can think of that could be predictive of line betting results and then build a number of models each using only a subset of those variables. The subsetting should reduce the winner co-prediction rate across models, though ensuring that each resultant model had an acceptably high accuracy rate is likely to be a challenge. 
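As a rough illustration of the variable-subsetting idea, here's a sketch that fits several very simple models, each on a random subset of the available variables, and combines them by majority vote. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than real line-betting results, the "model" is a plain least-squares linear classifier, and the function names are mine.

```python
import numpy as np

def fit_subset_models(X, y, n_models, subset_size, seed=0):
    """Fit one least-squares linear classifier per random variable subset."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        cols = rng.choice(X.shape[1], size=subset_size, replace=False)
        A = np.column_stack([np.ones(len(X)), X[:, cols]])
        # Regress a +1/-1 coding of the outcome on the chosen variables
        coef, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
        models.append((cols, coef))
    return models

def predict_votes(models, X):
    """Boolean array of shape (n_models, n_games): each model's tip."""
    votes = []
    for cols, coef in models:
        A = np.column_stack([np.ones(len(X)), X[:, cols]])
        votes.append(A @ coef > 0)
    return np.array(votes)

def majority_vote(votes):
    """Ensemble tip: True where more than half of the models say True."""
    return votes.sum(axis=0) > votes.shape[0] / 2

# SYNTHETIC data, for illustration only: 12 weakly predictive variables
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 12))
signal = 0.1 * (X @ rng.normal(size=12))
y = (signal + rng.normal(size=4000) > 0).astype(int)
X_train, y_train, X_test, y_test = X[:2000], y[:2000], X[2000:], y[2000:]

models = fit_subset_models(X_train, y_train, n_models=5, subset_size=4)
votes = predict_votes(models, X_test)
truth = y_test.astype(bool)
print("individual accuracy:", (votes == truth).mean(axis=1).round(3))
print("ensemble accuracy:  ", round((majority_vote(votes) == truth).mean(), 3))
```

With an odd number of models the vote can't be tied. Comparing how often pairs of rows in votes are both correct would give the co-prediction rates discussed above.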

Nonetheless, it's an interesting approach ... maybe next year.