Using Expected Wins to Rank Scottish Premiership Teams

A random Forrest (Image: SNS Group)

Written by Seth Dobson @fitbametrics


Expected wins is a widely used concept in sports analytics thanks to the popularity of Bill James’ Pythagorean Expectation (PE). PE is a formula used to estimate the number of games a baseball team “should” have won, based on the total number of runs scored and conceded by that team over a season.

The basic idea behind PE is to use runs to estimate team quality instead of wins because runs occur more frequently than wins. This means that run totals, and thus expected wins, are less affected by chance than actual wins because of larger sample size.
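As a rough illustration of the formula, here is a minimal Python sketch of James' original version, which squares the runs scored and conceded. The function name, the `exponent` parameter, and the season totals are assumptions made for this example; later refinements of PE tune the exponent rather than fixing it at 2.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Estimated fraction of games a team 'should' have won."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team: 800 runs scored and 700 conceded over 162 games.
win_fraction = pythagorean_expectation(800, 700)
expected_wins = win_fraction * 162
print(round(win_fraction, 3), round(expected_wins, 1))
```

A team that scores exactly as many runs as it concedes comes out at 0.5, which is the sanity check you would expect.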

PE has been refined over the years and adapted to other sports, including American football, basketball, and football (soccer). Howard Hamilton and Martin Eastwood have done the most extensive work on PE in football. Similar to James’ original formulation, the football version of PE uses total goals scored and conceded over a season to estimate expected wins for each team.

With all due respect to Pythagoras and Bill James, I’d like to take a different approach to estimating expected wins in the Scottish Premiership.


In this section, I will briefly describe how I estimate a team’s Expected Win Percentage (xWP). You can learn more about the methodology by following me on Twitter @fitbametrics.

I used a machine learning technique called Random Forests to estimate win probabilities at the match level for home and away teams based on 1,140 fixtures from the past five seasons of the Scottish Premiership (data from The following six input variables were used.

  1. Total Shots (Home)

  2. Total Shots (Away)

  3. Total Shots on Target (Home)

  4. Total Shots on Target (Away)

  5. Home Team

  6. Away Team

I built two separate models to generate home and away win probabilities. These probabilities were summed to get the total expected wins for each team per season, and then divided by the number of matches to get xWP.
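The aggregation step can be sketched in a few lines of Python. This covers only the summing and dividing described above, not the Random Forest models themselves. The team names and probabilities are made up, and `expected_win_pct` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical match-level output: (season, team, win probability).
# In the article these come from the home and away models; the
# numbers below are invented purely for illustration.
match_probs = [
    ("2017/18", "Team A", 0.70),
    ("2017/18", "Team A", 0.45),
    ("2017/18", "Team B", 0.30),
    ("2017/18", "Team B", 0.20),
]


def expected_win_pct(rows):
    """Sum win probabilities per team-season, then divide by matches played."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for season, team, prob in rows:
        totals[(season, team)] += prob
        counts[(season, team)] += 1
    return {key: totals[key] / counts[key] for key in totals}


xwp = expected_win_pct(match_probs)
for (season, team), value in sorted(xwp.items()):
    print(season, team, round(value, 3))
```

Each team's home and away matches go into the same list, so a full season would contribute 38 rows per team.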

Figure 1 shows the distribution of xWP values over the last 5 seasons. As you can see, there are two humps in the distribution. The small hump on the right side of the distribution is all Celtic. The large hump on the left side of the distribution is pretty much everybody else.




Before committing to a new metric to assess team quality, the following three criteria should be met.

  1. The metric should be correlated with winning within a season.

  2. The metric should be correlated with itself from one season to the next.

  3. The metric should be correlated with future results.

The first criterion relates to the relevance of the new metric; if it’s not correlated with winning, why should we care (I’m looking at you, pass completion %)? The second criterion relates to the metric’s reliability. To be considered a reliable measure of team quality, a metric should be relatively stable over time. This is called repeatability. The final criterion is the ultimate test of a metric’s usefulness: its predictive power. Is the metric a good predictor of future wins?

To test the validity of xWP with regard to the aforementioned criteria, I calculated the coefficient of determination or r2 (“r-squared”). Values of r2 range from 0 to 1, with 1 representing a perfect correlation between two variables. In my experience, 0.70 or greater is generally considered a very good r2. However, one of the main assumptions of using r2 is that the relationship between the two variables is linear. So in addition to calculating r2, I also plotted the data to examine the assumption of linearity.
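For readers who want to reproduce this kind of check, here is a minimal sketch of r2 for two variables, computed as the square of the Pearson correlation (the two are equal for a simple linear fit with one predictor). The function name and the sample numbers are assumptions for illustration, not values from my data.

```python
from math import sqrt


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when a variable is constant")
    r = cov / sqrt(var_x * var_y)
    return r * r


# Made-up values standing in for a metric and win percentage.
metric = [1, 2, 3, 4]
win_pct = [1, 3, 2, 4]
print(round(r_squared(metric, win_pct), 2))
```

Note that a high value here says nothing about whether the relationship is actually linear, which is why the plots still matter.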

To further assess the usefulness of xWP, I compared the new metric to Goals Ratio (GR), Total Shots Ratio (TSR), and Shots on Target Ratio (SoTR) with regard to the same validity criteria. These more traditional metrics are not widely used as team ratings anymore. But they are often treated as benchmarks for comparisons with newer, more advanced metrics, such as Expected Goals (xG).
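The three benchmark metrics are not defined in this article. The sketch below assumes the common "for divided by for plus against" form of each ratio; the helper name and the season totals are hypothetical.

```python
def ratio(count_for: float, count_against: float) -> float:
    """Share of a total that belongs to the team: for / (for + against)."""
    total = count_for + count_against
    if total == 0:
        raise ValueError("ratio is undefined when both counts are zero")
    return count_for / total


# Hypothetical season totals for one team.
season = {
    "goals_for": 60, "goals_against": 40,
    "shots_for": 500, "shots_against": 300,
    "sot_for": 180, "sot_against": 120,
}

gr = ratio(season["goals_for"], season["goals_against"])    # Goals Ratio
tsr = ratio(season["shots_for"], season["shots_against"])   # Total Shots Ratio
sotr = ratio(season["sot_for"], season["sot_against"])      # Shots on Target Ratio
print(gr, tsr, sotr)
```

Under this definition a value above 0.5 means the team out-shot or out-scored its opponents over the season.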

Figure 2 shows the correlation between xWP and actual win percentage (WP). The r2 value is impressive compared to the r2 values for TSR and SoTR. However, GR has a higher correlation with WP than xWP does.



Figure 3 shows the correlation of xWP from one season to the next. Again, the r2 value is very good, much better than between-season correlations for TSR, SoTR, or GR.



It is also important to note that xWP has a much higher correlation with itself than actual WP does (r2 = 0.62).

Lastly, Figure 4 shows the correlation between xWP in the current season and WP in the subsequent season. Once more, the r2 value is very good, and all of the traditional metrics have lower r2 values than xWP when used as predictors of future WP.
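Both the repeatability and the predictive tests depend on pairing each team's value in one season with its value in the next. Here is one way that pairing might look. The team names and numbers are invented, and skipping teams that appear in only one of the two seasons (promoted or relegated sides) is an assumption of this sketch, not a description of how I handled them.

```python
def season_pairs(current: dict, following: dict) -> list:
    """Pair each team's value in one season with its value in the next.

    Teams missing from either season are skipped (an assumption here).
    """
    teams = sorted(set(current) & set(following))
    return [(current[team], following[team]) for team in teams]


# Hypothetical values: xWP in one season, actual WP in the next.
xwp_this_season = {"Team A": 0.62, "Team B": 0.35, "Team C": 0.28}
wp_next_season = {"Team A": 0.66, "Team B": 0.31, "Team D": 0.25}

pairs = season_pairs(xwp_this_season, wp_next_season)
print(pairs)
```

Stacking these pairs across all consecutive seasons gives the two columns that go into the r2 calculation.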




Now that we’ve established that xWP is relevant to winning, and a reliable predictor of future performance, let’s take a look at how the 2017/18 Scottish Premiership teams stack up.

For those who believe “the table never lies”, Figure 5 might come as a bit of a surprise. But that’s exactly the point of football analytics, to challenge the conventional wisdom. Just remember that xWP is a better predictor of future WP than WP itself. This means that my xWP rankings are more reliable than the table-based rankings, at least from the standpoint of measuring team quality.



I would like to draw your attention to the error bars in Figure 5, which are +/- one standard deviation around the xWP value for each team. The error bars represent uncertainty in the metric and provide a reality check on the rankings.
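To make the idea concrete, here is a hedged sketch of how such error bars could be built and compared. It assumes the standard deviation is taken across a team's match-level win probabilities, which is only one possible reading since the calculation is not spelled out here. All numbers and function names are hypothetical.

```python
from statistics import mean, stdev


def xwp_interval(match_probs):
    """Return (low, high): mean win probability +/- one sample standard deviation."""
    centre = mean(match_probs)
    spread = stdev(match_probs)
    return centre - spread, centre + spread


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up match-level win probabilities for three teams.
team_a = [0.80, 0.75, 0.85, 0.78]
team_b = [0.30, 0.35, 0.25, 0.32]
team_c = [0.30, 0.50, 0.20, 0.40]

print(intervals_overlap(xwp_interval(team_a), xwp_interval(team_b)))
print(intervals_overlap(xwp_interval(team_b), xwp_interval(team_c)))
```

When two teams' bars overlap, the ranking between them should be read as uncertain rather than as a clear difference in quality.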

All metrics in football analytics are estimated with some uncertainty, especially xG. But this uncertainty is almost never reported or even explicitly acknowledged.

Given the substantial amount of uncertainty in the xWP estimates depicted in Figure 5, here is what we can say with reasonable confidence about team quality in the Scottish Premiership this season.

  1. Celtic were significantly better than every other team in the Premiership.

  2. Aberdeen were better than every other team except Celtic and possibly Rangers.

  3. Rangers were better than Hearts and every other team in the bottom five.

  4. Motherwell, Killie, Hibs, and St Johnstone were of similar quality, and better than Thistle.

  5. Ross County were not much worse than any other team in the bottom four.

In my next article for Modern Fitba, I will present research on the difference between WP and xWP, or Residual Win Percentage (rWP), and show how that can help us to predict which teams are likely to regress or bounce back next season.

Thanks for reading!