Quantifying the Bias of Arsenal’s Referees vs. The Rest of the League’s Clubs
By Zach Slaton and DogFace
This post can also be found at http://numbersgameblog.blogspot.com/
Author’s Note: Special thanks to DogFace for his co-authorship on this post. His voluminous data set, unending patience, insight and contribution, and constant editorial feedback throughout the creation of this article were invaluable. He’s a wonderful blogging partner with whom any Gunner or statistician would be lucky to work.
In my first post in this series on Numbers Game blog I used DogFace’s match data to explain how Phil Dowd is the least desirable referee for Arsenal as he not only shows the most biased officiating in terms of fouls, yellow cards, and red cards, but he also shows the largest effect on Arsenal’s likelihood of winning a match. That analysis focused on the effects of all of Arsenal’s referees, but did not quantify how those referees officiated other teams’ matches. This post contains such an analysis, and the results are very interesting.
To aid in such an analysis, a binary logistic regression (BLR) model was created for each team’s likelihood of winning a match based upon a number of factors. Each BLR model includes terms that capture the effects of venue (home/away) and differentials of shots, shots-on-goal, corners, fouls, and fantasy points for yellow and red cards. Not every term was significant for each team: terms that had a p-value greater than 0.10 were eliminated from the team’s BLR, and the team’s coefficient for that term was set to zero.
I know that may upset some stats geeks who would prefer a cutoff of p <= 0.05, but the reduced sample size requires a looser (higher) p-value threshold or very few teams would have any significant terms. The resultant BLRs allow a construction of each team’s odds of winning each match, and a study was constructed of how each official impacts those odds based upon his officiating versus the expected average officiating across the club’s total number of matches.
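As a rough illustration of the approach described above, here is a minimal sketch of how a pruned logistic model turns match differentials into a win probability. The coefficient values, p-values, and field names are all made up for the example, and "odds" is treated here as a win probability, which is an assumption about how the post uses the term. It is not the fitted model from the analysis.

```python
import math

SIGNIFICANCE_THRESHOLD = 0.10  # cutoff stated in the post


def prune_coefficients(coefs, p_values, threshold=SIGNIFICANCE_THRESHOLD):
    """Set to zero every coefficient whose p-value exceeds the threshold."""
    return {
        name: (value if p_values[name] <= threshold else 0.0)
        for name, value in coefs.items()
    }


def win_probability(intercept, coefs, match):
    """Logistic model: P(win) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefs[name] * match[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and p-values for one team (illustration only).
coefs = {"home": 0.8, "shots_diff": 0.05, "fouls_diff": -0.04, "card_points_diff": -0.02}
p_values = {"home": 0.01, "shots_diff": 0.03, "fouls_diff": 0.08, "card_points_diff": 0.40}

pruned = prune_coefficients(coefs, p_values)

# Hypothetical match: at home, +4 shots, -3 fouls, +2 card fantasy points.
match = {"home": 1, "shots_diff": 4, "fouls_diff": -3, "card_points_diff": 2}
print(round(win_probability(-0.5, pruned, match), 3))
```

In this made-up case the card term is dropped because its p-value is above 0.10, so it no longer moves the team’s win probability at all.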
Background on the Data Used in the Analysis
Before diving in to any analysis, a few statements on the types of data used are in order.
The intuitive way to model bias in a referee would be from the fouls-per-booking and bookings-per-match figures. DogFace, however, has often found these to be counterintuitive: if a bias exists, it can be expressed within the game in different ways, and these often depend on the styles of play of the teams involved.
The noise in these figures is further increased by the fact that the official statistics only reflect what the referee deemed to be a foul rather than the reality of a foul (according to the laws) and/or the consistency of the referee’s interpretation thereof. “Sins of omission”, where a referee should have called a foul but for some reason or another did not, are tough to quantify in an analysis that uses such statistics.
To combat this effect, Untold Arsenal utilises a professional referee (Walter Broeckx) to analyse matches and it is clear that what is recorded in the official statistics is often way off the mark. DogFace’s own data sets confirm this effect. The rise in popularity of statistical analysis in football has led him to notice a trend for parity in the fouls per booking figures that suggests that the referee is aware of his statistics in game – it could be said that this ‘trial by media’ has created the bias we see in these figures and that a referee himself has a conflict of interest in every call he makes or indeed does not make.
The ideal situation would be an independent body that records and analyses referee performance and provides an open source database for us to comb through. This data would include the most important data of all, i.e. the standard ‘human’ error. Alas, we do not have such research based on an independent body, and instead you will have to make do with the work of humble bloggers like Walter Broeckx, DogFace and me.
Quantifying How Arsenal’s Referees Officiate the Rest of the League
Just like the last post, each team’s average foul and fantasy point total was calculated. These variables represent the match attributes directly controlled by the referee. Each team’s actual odds of winning each match were compared to the odds that would be realized if the referee had given the team’s average foul or fantasy point differentials (different values were used based upon the team’s averages for home and away matches). This allows a calculation of an odds differential for each match.
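The odds differential described above can be sketched as follows. This is a hedged illustration rather than the actual calculation: the coefficients, venue averages, and field names are hypothetical, and the differential is assumed to be the actual win probability minus the probability obtained when the referee-controlled terms are reset to the team’s average for that venue.

```python
import math


def win_probability(intercept, coefs, match):
    """Logistic win probability for one match."""
    z = intercept + sum(coefs[name] * match[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def odds_differential(intercept, coefs, match, venue_averages):
    """Actual win probability minus the probability under average officiating.

    The baseline copies the match but overwrites the referee-controlled
    terms with the team's averages for the same venue (home or away).
    """
    venue = "home" if match["home"] == 1 else "away"
    baseline = dict(match)
    baseline.update(venue_averages[venue])
    actual = win_probability(intercept, coefs, match)
    expected = win_probability(intercept, coefs, baseline)
    return actual - expected


# Hypothetical values for illustration only.
coefs = {"home": 0.8, "fouls_diff": -0.04, "card_points_diff": -0.02}
venue_averages = {
    "home": {"fouls_diff": -1.0, "card_points_diff": -0.5},
    "away": {"fouls_diff": 1.0, "card_points_diff": 0.5},
}
match = {"home": 1, "fouls_diff": 3.0, "card_points_diff": 2.0}

diff = odds_differential(-0.5, coefs, match, venue_averages)
print(f"{diff:+.3%}")
```

A negative result means the team’s chance of winning was lower than it would have been had the referee given that team its usual foul and card differentials.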
Matches were then grouped by referee to allow an analysis of each referee’s overall bias vs. the average odds. In this case, the same referees as the first post were examined – Atkinson, Bennett, Dean, Dowd, Foy, Halsey, Webb, and Wiley – as they had officiated the greatest number of Arsenal matches and in general a large number of matches overall. Most importantly, they have officiated a range of home and away matches so the impact of home/away bias is minimized.
Unfortunately, the limited number of matches in a season and the availability of each of the referees prevent an analysis where each referee officiates an equal number of home and away matches, but the statistics used in the analysis take the effect of venue into account to minimize its effect on match outcome. The data used for this analysis was also isolated to the 2006/07 through 2009/10 seasons to ensure an adequate number of samples were available from each referee.
It should be noted that for the first part of the analysis, referee data from all four seasons was grouped together. This allows us to look at gross bias throughout the seasons, and keeps a pretty high sample size for the analysis. Later in the post key referees’ data is broken out across seasons to show the effects of referee by season.
Such a study by season helps us see any bias that may be dependent on club’s objectives (Premier League title, European competition qualification, avoiding relegation) and current quality of play (above, at, or below form). Inevitably, such seasonal analysis suffers from limited sample size and is saved for those referees whose overall bias is the most noteworthy.
One further adjustment to the data set had to be made. When creating BLRs for each team, several of them showed no impact to their odds of winning a match due to foul or fantasy point differentials. That is to say that neither of the two coefficients for those terms in a team’s BLR was statistically significant: each coefficient’s p-value was greater than 0.10.
This means that we can’t evaluate the impact of officiating on those teams’ likelihoods of winning a match. Thus, teams that did not have a significant BLR term from the two associated with officiating bias – fouls or fantasy points – were eliminated from the analysis as they’d provide a zero impact and generate misleading results.
A general linear model (GLM) using referee and team officiated (Arsenal vs. not Arsenal) was created from the reduced data set to observe the resultant interactions. The results of the analysis are represented in the graph below which shows each referee’s average odds differential for Arsenal and the rest of the teams. The two plots in the graph show the same data in two different manners. The key plot is the one in the lower left.
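To make the comparison concrete, the sketch below averages per-match odds differentials by referee for Arsenal and for the rest of the league. It is plain group averaging, not the GLM itself, and the referee name, team names, and values are invented for the example.

```python
from collections import defaultdict


def mean_differential_by_referee(matches):
    """Average odds differential per (referee, Arsenal / Rest of league) group."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [running total, match count]
    for m in matches:
        group = "Arsenal" if m["team"] == "Arsenal" else "Rest of league"
        key = (m["referee"], group)
        sums[key][0] += m["odds_diff"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def arsenal_gap(means, referee):
    """Arsenal's average differential minus the rest of the league's."""
    return means[(referee, "Arsenal")] - means[(referee, "Rest of league")]


# Made-up records; a real data set would have one row per team per match.
matches = [
    {"referee": "Referee A", "team": "Arsenal", "odds_diff": -0.02},
    {"referee": "Referee A", "team": "Arsenal", "odds_diff": -0.01},
    {"referee": "Referee A", "team": "Team X", "odds_diff": 0.00},
    {"referee": "Referee A", "team": "Team Y", "odds_diff": 0.01},
]

means = mean_differential_by_referee(matches)
print(f"{arsenal_gap(means, 'Referee A'):+.1%}")
```

A negative gap for a referee corresponds to the kind of "penalty for Arsenal" discussed in this section.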
Before diving in to the plot, readers with a keen eye will note that the red line, which represents Arsenal’s average odds differential by referee, is a bit different from a similar line plotted in my previous post. This is due to the two different ways the data is analyzed. In the previous case, I was looking at the referee’s performance as a function of just Arsenal’s matches by year. That led the grouping and averaging to be a bit different from this analysis, which simply looked at Arsenal versus the rest of the league regardless of season within the four analyzed. This latest analysis essentially ignores the effects of time.
Looking at the lower left hand corner of the plot, it’s clear that Dowd shows the largest gap between how he officiates the average match and how he calls an Arsenal match – about a 1.5% penalty for Arsenal. However, what’s different about this analysis is that it also clearly shows that Howard Webb is pretty biased too – also nearly a 1.5% penalty for Arsenal. Rounding out the top five are Wiley, Halsey, and Bennett, each with a nearly 1% penalty for Arsenal. Atkinson and Foy seem to show the smallest gaps.
Another way to look at in-match bias is to look for odd patterns in fouls-per-booking. A plot of four factors – Referee, Arsenal/Not Arsenal, Season, and Home/Away – and their impact on fouls-per-booking is shown below. Values on the left or right side of the plot represent the fouls per booking for the plots across that row. Values above and below the graph represent different levels of each factor expressed as text in one of the boxes in each column. The legends to the right of the graph explain what each color/shape combination represents in each row. Thus, if one wants to understand how each referee’s average fouls per booking for Arsenal compares to their fouls per booking for the rest of the league, one can look to the square in the 2nd row/1st column or the 1st row/2nd column. Each of those two plots shows the same data, but is each plotted in a different manner.
Looking at the plot of data in the 2nd row/1st column, one sees there is only one referee that has a higher fouls per booking for Arsenal than the rest of the league – Steve Bennett. Yet again, Howard Webb and Phil Dowd lead the pack with their differential of fouls per booking for Arsenal versus the greater numbers of fouls-per-booking they allow for the rest of the league. Surprisingly, the favorable Chris Foy also has a large gap between how he calls Arsenal matches vs. the rest of the league.
What’s especially disturbing is the fact that Dowd ends up with such a bias against Arsenal in fouls-per-booking given the disproportionate number of home matches he’s officiated for Arsenal (7 home matches to 2 away matches). Take a look at the plot in the lower left, which shows home (red) and away (black) fouls-per-booking by referee.
Overall, Phil Dowd shows one of the largest gaps in favor of a higher fouls-per-booking average home versus away – nearly two fouls per booking. Yet such an advantage is not showing up in Arsenal’s numbers when Dowd is officiating their home matches. In fact, when looking for statistically significant factors in the GLM of fouls-per-booking vs. referee, club, season, and venue, the interaction of referee and venue is the only statistically significant term! Some referees are being consistent between home and away matches, while others are showing large gaps.
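For readers who want to reproduce a home/away comparison like this one, here is a small sketch of a fouls-per-booking calculation grouped by referee and venue. The post does not define a booking precisely, so this assumes bookings are yellow cards plus red cards and pools fouls and bookings across matches before dividing. Averaging per-match ratios instead would give somewhat different numbers. All records are invented.

```python
from collections import defaultdict


def fouls_per_booking_by(matches, key_fields):
    """Pooled fouls-per-booking for each combination of the given fields."""
    totals = defaultdict(lambda: [0, 0])  # key -> [fouls, bookings]
    for m in matches:
        key = tuple(m[field] for field in key_fields)
        totals[key][0] += m["fouls"]
        # Assumption: a booking is any yellow or red card.
        totals[key][1] += m["yellow_cards"] + m["red_cards"]
    # None marks groups with no bookings, where the ratio is undefined.
    return {
        key: (fouls / bookings if bookings else None)
        for key, (fouls, bookings) in totals.items()
    }


# Made-up rows: fouls and cards given to one team in one match.
matches = [
    {"referee": "Referee A", "venue": "home", "fouls": 12, "yellow_cards": 1, "red_cards": 0},
    {"referee": "Referee A", "venue": "home", "fouls": 10, "yellow_cards": 1, "red_cards": 0},
    {"referee": "Referee A", "venue": "away", "fouls": 14, "yellow_cards": 2, "red_cards": 0},
    {"referee": "Referee A", "venue": "away", "fouls": 13, "yellow_cards": 1, "red_cards": 0},
]

ratios = fouls_per_booking_by(matches, ["referee", "venue"])
home_away_gap = ratios[("Referee A", "home")] - ratios[("Referee A", "away")]
print(ratios, home_away_gap)
```

A positive home/away gap means the referee allows more fouls before booking the home side than the away side.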
Comparison with Pre-Match Expectations via the Asian Handicap Swing
Does this comport with pre-match expectations? One loose measure of such expectations is the Asian Handicap assigned to a match. Luckily, DogFace has also been recording this statistic for each match. DogFace has also introduced the concept of the Asian Handicap Swing (AH swing) to readers of Untold Arsenal. The AH Swing is computed via the following equation:
AH Swing = Actual Goal Differential – Asian Handicap
The AH Swing represents the performance or deviation against the handicap in actual goal difference. The betting line data (or Asian Handicap) is an average calculated from around 30-50 bookmakers across Europe and Asia. We assume the average bookie handicap does not use corruption/bias as a significant factor in the markets – not only for the reasons stated below but also because referee driven match fixing would only affect the markets significantly in terms of an ‘upset’ against the odds.
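The equation above is simple enough to express directly. The sketch below assumes the handicap is signed from the team’s point of view, so a positive handicap means the team is expected to win by that margin; betting sites often use the opposite sign, so check the convention of any data you plug in. The bookmaker lines are made up.

```python
from statistics import mean


def consensus_handicap(bookmaker_lines):
    """Average handicap across bookmakers (the post cites roughly 30-50 of them)."""
    return mean(bookmaker_lines)


def ah_swing(goals_for, goals_against, handicap):
    """AH Swing = actual goal differential - Asian Handicap."""
    return (goals_for - goals_against) - handicap


# Hypothetical lines from four bookmakers for one team in one match.
lines = [0.5, 0.75, 0.5, 0.25]
handicap = consensus_handicap(lines)

# The team wins 2-1, beating the expected margin by half a goal.
print(ah_swing(2, 1, handicap))
```

A positive swing means the team outperformed the market’s expectation, and a negative swing means it fell short.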
If we were to take the stance that the betting line reflects referee corruption/bias then any deviation from that line in terms of an over/under performance would be understated, i.e. we could say that any ‘noise’ in the original handicap from the bookies would actually understate the bias we are attempting to model. Luckily previous research does not indicate this case, but rather indicates that on the whole bookmakers may be creating a market that is essentially based on efficiency and competition. At the least, they are essentially hedging a gestalt metaphysical abstraction based on media disinformation and the credulous belief that “it all evens out at the end of the day”. However, it is worth considering that, as perception of bias in referee performance becomes more ‘mainstream’ and paradigms shift, we will see this effect reflected in the markets of the future.
This post will instead focus on the reality of the results as it is a far more revealing stance to take. The Asian Handicap line is one that reflects, more or less, the illusion of an uncorrupted market. The swing from that line allows for an examination that a corrupted, or biased, market exists.
Just like odds differential, the AH swing has been calculated for each and every match in the database. A plot of four factors – Referee, Arsenal/Not Arsenal, Season, and Home/Away is shown below. Values on the left or right side of the plot represent the Asian Swing for the plots across that row; values above and below the graph represent different levels of each factor expressed as text in one of the boxes in each column. The legends to the right of the graph explain what each colour/shape combination represents in each row. Thus, if one wants to understand how each referee’s average AH swing for Arsenal compares to their AH swing for the rest of the league, one can look to the square in the 2nd row/1st column or the 1st row/2nd column. Each of those two plots shows the same data, but plotted in a different manner.
Looking at the plot in the 2nd row/1st column, it is shown that Mike Dean, Howard Webb, and Phil Dowd have the lowest average values for Arsenal’s AH Swing. This means they are consistently officiating Arsenal’s matches tighter than the other referees. Unlike the analysis of fouls-per-booking, the interaction between referee and venue is not statistically significant. Referees consistently provide a significant AH swing advantage to home teams compared to away teams.
It has now become clear on multiple fronts that Dean, Dowd, and Webb appear to be the most biased against Arsenal. To get an even better understanding we must look at how each of the three officials impacts each team and compare those matches to how bookies might expect them to turn out.
Quantifying How Arsenal’s Referees Officiate the Rest of the League
One way to visualize whether or not Arsenal is the most penalized when it comes to the officiating of Dean, Dowd, and Webb is to look at how each team’s average odds differential compares to their average Asian handicap swing when these three officials are present. The three graphs below represent just such a comparison for each referee. The x-axis represents the average Asian handicap swing. The y-axis represents the average odds differential.
The key to the graph is the lower left hand quadrant. Teams that end up there, especially those that end up further away from both of the lines along an imaginary diagonal line extending from the origin of the graph, are likely experiencing a higher amount of bias in officiating than the other teams the referee has officiated. The three graphs below show such plots for each team against each of the three referees. A summary of conclusions is found after the third graph.
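The quadrant reading described above can be sketched in a few lines. The team names and values below are placeholders, and the code only flags which quadrant a team falls in. Ranking teams by distance from the origin would also require a choice of scaling between goals (x-axis) and percentage points (y-axis), which the post does not specify.

```python
def quadrant(ah_swing, odds_diff):
    """Name the quadrant for a point with x = AH swing and y = odds differential."""
    if ah_swing < 0 and odds_diff < 0:
        return "lower left"
    if odds_diff < 0:
        return "lower right"
    if ah_swing < 0:
        return "upper left"
    return "upper right"


def flag_penalised(team_points):
    """Teams below both axes, i.e. worse than expected on both measures."""
    return sorted(
        team
        for team, (swing, diff) in team_points.items()
        if quadrant(swing, diff) == "lower left"
    )


# Placeholder averages under one referee: (AH swing in goals, odds differential).
team_points = {
    "Team A": (-0.40, -0.015),
    "Team B": (0.30, 0.005),
    "Team C": (0.10, -0.010),
}

print(flag_penalised(team_points))
```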
Here are some conclusions that can be drawn from the three graphs:
● There is clearly a band around the ± 2% region of the odds differential where most teams cluster.
● It’s also clear that the top teams – Arsenal, Chelsea, Liverpool, and Manchester United – tend to do better on the AH swing than other teams. Indeed, teams that would generally have a more positive Asian Handicap tend to be to the positive on the swing.
● While Arsenal do seem to be experiencing some bias at the hands of the three referees, they don’t seem to be the worst off. Liverpool clearly pays a bigger penalty under these three referees.
● Of Arsenal’s main competition for league trophies the last half decade, Manchester United and Chelsea get a much better shake from the referees. Each has a vastly superior AH swing, while each of the two teams finishes much better in terms of odds differential than Arsenal for two out of the three referees. These advantages translate to an average of a 0.73 goal benefit in AH swing and a 1.6% benefit in odds of winning a match.
● Chelsea’s average odds differential is 1.6% better than Arsenal’s, while their average AH swing advantage is 0.78 goals.
● Manchester United’s average odds differential is 1.5% better than Arsenal’s, while their average AH swing advantage is 0.68 goals.
● Mike Dean is the only referee of the three to put both Chelsea and Manchester United to the positive. He’s also the referee demonstrating the highest bias against Arsenal in AH swing, both nominally and when compared to Chelsea and Manchester United (a 1.52 swing deficit to both).
● Perhaps Ryan Babel’s tweet with Howard Webb wearing a Manchester United jersey wasn’t too far off. Manchester United is his most favored team by a mile, both in terms of swing and odds differential.
● Of any referee who has officiated the Big Six, Mike Dean gives the most favorable treatment to Chelsea. Their average odds of winning a match are improved by nearly 1% when he officiates one of their matches, and they experience their best AH swing under him.
● Of Arsenal’s North London rivals, Spurs are treated about even by Dean and worse by Dowd and Webb.
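The combined figures quoted for Chelsea and Manchester United in the list above follow from averaging the two clubs’ individual advantages over Arsenal. A quick check using only the rounded numbers from the bullets:

```python
# Advantages over Arsenal as quoted in the bullets (already rounded in the post).
chelsea = {"odds_diff_pct": 1.6, "ah_swing_goals": 0.78}
man_utd = {"odds_diff_pct": 1.5, "ah_swing_goals": 0.68}

avg_swing = (chelsea["ah_swing_goals"] + man_utd["ah_swing_goals"]) / 2
avg_odds = (chelsea["odds_diff_pct"] + man_utd["odds_diff_pct"]) / 2

print(f"average AH swing advantage: {avg_swing:.2f} goals")
print(f"average odds differential advantage: {avg_odds:.2f}%")
```

The swing average comes out at 0.73 goals. The odds average comes out at 1.55% from these rounded inputs, which is consistent with the roughly 1.6% quoted once the unrounded values are used.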
There are some interesting interactions that go on between referees and managers as well. One could be referred to as the Dean-Redknapp effect. In matches involving Portsmouth and Tottenham where Dean is the official, both teams do better in terms of AH swing with Redknapp at the helm than when he is not, with effects bordering on statistical significance (odds differential was a wash).
Interestingly enough, the interaction of manager and team is not significant, so Redknapp’s record under Dean is not unduly boosted or penalized by his record from either of the clubs. It stands on its own. Such a detailed analysis would have to be the focus of a subsequent post, and deserves a wider treatment of referees, to ensure that it’s not simply an effect of Redknapp’s superior coaching. Nonetheless, it provides intrigue when studying the effects of officiating.
A Comparison of The Big Six By Season
It has been shown that over the 2006/07 through 2009/10 seasons Dean, Dowd, and Webb have given Arsenal a bit of the short end of the stick when it comes to fouls-per-booking, AH swing, and odds differential. One last bit of examination remains – what happens if we open up the examination to the entire six years of data in DogFace’s database, isolate for the Big Six for these three referees, and examine how a few of the trends may be changing over time?
The graph below provides a plot of such data. The solid lines represent the teams’ average AH swings under the three referees, the value of which can be found on the right hand side of the graph. The dashed lines represent the team’s average odds differentials under each of the referees, the value of which can be found on the left hand side of the graph.
A few general trends can be observed:
● All teams except Manchester City have been on a general downward trend over time in terms of AH swing.
● Manchester City is also the only team to demonstrate a steadily increasing odds differential over time.
● Arsenal started out with the third lowest swing in 2005/2006, and their decline has been consistent to the point of falling below 0 this season with the three referees. They are the only team of the Big Six to experience a negative swing.
● The dashed set of lines shows Chelsea and Manchester United hanging around neutral (i.e. 0%) odds differential over time.
● Again, Arsenal is on a steady downward trend, to the point that their average odds differential under the three referees is on track to be worse than -2% this season. Only Tottenham has a worse odds differential.
So who’s driving this downward trend for Arsenal? The plot below shows the same data as the plot above, but it eliminates the other teams and instead focuses on Arsenal’s referees:
On the AH swing front, there’s been a steady erosion the last year. However, before that Mike Dean’s officiating showed a distinct downward path compared to the rest of the referees. His AH swing went negative in the 2009/10 season, and Webb has joined him now in 2010/11. It appears it’s a case of Dean pulling the average down with him, and the other two referees joining him this season.
As for odds differential, it’s Dean again that leads the pack. He started out as a neutral referee in 2005/2006, but then has steadily eroded that neutrality to a -2% differential by last season. Dean’s low average and the recent degradation in Phil Dowd’s officiating are what are leading to Arsenal’s precipitous drop in 2010/11.
Whether it’s actually poor form generating a higher number of fouls and cards or actual referee bias, it is clear that Arsenal pay a bigger referee penalty than all of the teams they’re competing against for the Premier League championship save for Liverpool. This might be combated by having the three highlighted referees – Dean, Dowd, and Webb – officiate fewer Arsenal matches. They officiated an average of ten matches, or 26% of Arsenal’s season, each of the last four years. Statistics would suggest that having fewer referees officiate a greater number of matches for each squad would lessen the chance of a poorly officiated match impacting a team’s season point total. Such an assumption is based upon the idea that the officials’ errors or bias are randomly distributed. The data above suggests otherwise.
Zach Slaton is the author of “A Beautiful Numbers Game” blog where he writes about soccer statistics. He is a supporter of Arsenal and Seattle Sounders FC, and lives in Seattle, Washington. You can follow Zach @the_number_game