Dear ESPN: Expected Goals do not account for refereeing errors


By Sifarzone

Expected Goals are all the rage today, and they are certainly a great addition to the toolbox we use in analyzing and understanding football. They give an understanding of the match beyond just the ultimate results, the goals. They value long-term performance over short-term results, since long-term performance will eventually show up in results.

It is also something that we do all the time in real life using simpler statistics, like averages. Having said that, there is a right and a wrong way to use Expected Goals as a tool.

Briefly speaking, here is how I understand Expected Goals to work. I apologize if I am rehashing something that the readers already understand, but it is important to my thought process to write these things down. Since I am of average intellect and capability here is how I see Expected Goals.

A – stuff/shots team A does.

B – stuff/shots team B does.

C – stuff the referee does (refereeing decisions influence games).

D – the fluky stuff/shots (bad bounces, 40-yard screamers, deflected goals).

E – Goals.

By looking at the above variables, we can easily see the relationship in every match is:

A + B + C + D = Goals!

What Expected Goals endeavours to do is take out some of D, the fluky stuff. The reasoning is that fluky stuff is not sustainable in the long term, and is therefore assigned a lower value. That 40-yard screamer that was scored is revalued at 0.02 (probably even less) instead of the 1 goal it was recorded as in the results.

So in the long term, all things being equal, we cannot reliably take 40-yard shots and expect to affect the score. This perfectly encapsulates why long-term performance is more important. Expected Goals basically tells the audience and managers the quality of the shots.
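
To put a rough number on the long-shot point, here is a minimal sketch. It assumes the 0.02 per-shot value I guessed at above, which is not taken from any real Expected Goals model, and the function names are made up purely for illustration.

```python
# Illustrative only: 0.02 is the rough per-shot value this article
# assigns to a 40-yard shot, not a figure from a real xG model.
LONG_SHOT_XG = 0.02


def total_expected_goals(shot_values):
    """Sum the per-shot Expected Goals values for a set of shots."""
    return sum(shot_values)


def shots_per_expected_goal(xg_per_shot):
    """How many shots of this quality add up to one expected goal."""
    return 1 / xg_per_shot


ten_long_shots = [LONG_SHOT_XG] * 10
print(f"xG from 10 long shots: {total_expected_goals(ten_long_shots):.2f}")
print(f"Long shots per expected goal: {shots_per_expected_goal(LONG_SHOT_XG):.0f}")
```

Under that assumption, a team would need on the order of fifty such shots to "expect" a single goal, which is why the one that flies in is treated as a fluke rather than a plan.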

Now assuming that Expected Goals encapsulates all the fluky stuff for shots (it doesn’t, but in terms of shots, we can assume as such), the equation becomes:

A + B + C = Expected Goals.

What Expected Goals does not do is account for refereeing errors. We know this because if Expected Goals is a measure of performance and the game can be affected by the referee, it stands to reason that Expected Goals are affected by refereeing decisions. For example, if referees give loads of free kicks from good positions and they are all taken directly as shots, the Expected Goals value will be “expected goal value from that position” x loads. So referees definitely affect Expected Goals.

The next step, as with every statistic, is how you implement it into your understanding of the situation. I have previously written on Untold about how paid journalists should do a better job of using statistics. In the previous article, which in hindsight I poorly titled “Intentional misrepresentation, misunderstanding of statistics, or…?”, I tried to stress that I was not judging the player himself (Lacazette); rather, my criticisms were of the journalist, who used statistics that were neither convincing nor particularly rigorous.

In the article titled “Arsene Wenger deflects as Manchester City roll on” an ESPN writer slags off Wenger for criticizing the referee and using it as a deflection tactic. In particular he uses Expected Goals as the justification that Manchester City were a better team despite the refereeing errors.

One irksome sentence was, “Or perhaps he wanted us to ignore the fact that, despite flurries of better play from Arsenal, this was still a hugely one-sided game, as the 1.83 to 0.31 expected goals count shows.” I would implore the reader to read his article to make sure I am not misrepresenting him. Once again, I emphasize that I am not here to say whether his criticism of Wenger is right or wrong, only that the statistics used are not appropriate.

In this particular game, there was an offside goal and, according to many, a questionable penalty. Both are directly influenced by refereeing decisions, whether they are right or wrong. What the writer is doing, in effect, is ignoring refereeing decisions. Remember that offside goal? From that position, a shot carries an Expected Goals value of about 0.7 or 0.8, and that penalty also carries an Expected Goals value of about 0.3 or 0.4 (assuming it counted as a big chance rather than a penalty; if counted as a penalty, this value is almost doubled).

So subtracting 0.8 and 0.4 from 1.8, we get 0.6. Thus in terms of Expected Goals, we get 0.6 vs 0.3, Man City vs Arsenal. Based on the performance of both teams on the day, this is the number of goals I would expect: basically anything between 1-0 and 0-0, flip-of-the-coin stuff.
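
For anyone who wants to check the subtraction, here is a minimal sketch. The 1.83 and 0.31 figures are the ones quoted from the ESPN article; the 0.8 and 0.4 values are my own rough estimates from the paragraph above, not official model output, and the function name is just something I made up for this example.

```python
# Match totals as quoted in the ESPN article.
CITY_XG = 1.83
ARSENAL_XG = 0.31

# Rough estimates from this article (assumptions, not model output).
OFFSIDE_GOAL_XG = 0.8
PENALTY_XG = 0.4


def remove_disputed_chances(team_xg, disputed_values):
    """Subtract the xG of chances that hinged on a refereeing decision."""
    return team_xg - sum(disputed_values)


adjusted_city_xg = remove_disputed_chances(CITY_XG, [OFFSIDE_GOAL_XG, PENALTY_XG])
print(f"Man City {adjusted_city_xg:.2f} vs Arsenal {ARSENAL_XG:.2f}")
```

Using the lower estimates (0.7 and 0.3) instead would leave Man City on roughly 0.8, so the gap narrows a lot either way, though exactly how much depends on the values you plug in.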

Lastly, I would point out that even without the Expected Goals statistic, we know one thing from experience: goals change games! A bad refereeing decision that results in a goal can change the performance, and therefore the Expected Goals, of a team.

Statsbomb has a particularly good article on this. So the ESPN writer not only misuses the Expected Goals number, he also disregards that erroneous goals can change games. Non-disclaimer: I do not work for Statsbomb; I just really enjoy their articles, since the statistics are discussed well, even though I do occasionally disagree with their interpretations.

In conclusion, we cannot use Expected Goals to judge teams’ match performance without considering refereeing influences, so maybe there can be a statistic for that in the future. Using Expected Goals while expecting referees to have no part in it is like saying that the food tastes bad but the ingredients do not matter.

In short, paid journalists should write better articles. And I apologize if I showed ignorance in any of the analysis; please let me know if there are any statistical things I misunderstood. I am an amateur and not paid, so at least I have that excuse.

Good supporting Arsenal!

13 Replies to “Dear ESPN: Expected Goals do not account for refereeing errors”

  1. Over my head I’m afraid 🙂 I do believe statistics are very important indicators of trends and may be used for countless data processing operations to prove or disprove information.
    To be frank, following the City game, I found consolation only by watching Adrian Clarke’s analysis of the game on
    He seems to have watched the same game as I did 🙂

  2. If one accepts the ESPN angle, i.e. Man City were far better because of the Expected Goals (irrespective of bad decisions), why is it that many writers call us lucky if we win with an Expected Goals ratio of 3.0 to 0.2 and suggest that if it was only 1-0, the losing side were unfortunate and “deserved something from the game”? Yet again, as with refereeing, it’s the inconsistency and hypocrisy I can’t stand.

  3. I am going to guess that what an expected goal is, is that someone has catalogued every shot taken and noted whether a goal was scored or not. This might be specific to a league, I’ve no idea. They pretty much have to “bin” the data, divide the field into boxes. A shot made from anywhere in that box is counted as being from one “spot”.

    One thing that will happen is that, as these bins get further from the goal, the number of shots towards the net will go down. While someone can still assign a value to this “long shot”, the value will become increasingly fuzzy as the distance increases.

    Are all “long shots” unlikely? Maybe someone should ask the Arsenal Ladies team from their last game against Reading? As I read it, Fara Williams noticed our goaltender was not paying enough attention when Reading kicked off after conceding the goal Arsenal had just scored. She shot on net from the center circle and scored, which was the winning goal for Reading.

    Let’s pick a bin from closer in, one where there are lots of shots and lots of goals, so the expectation can be well calculated. Are all those shots equal? Is a goal from a direct free kick at that spot the same as a goal from open play? Does it matter if the shooter is using their “strong” foot? Does it matter if the shooter is moving slowly, moving at a medium speed or running at full speed? Does it matter which direction the shooter is running? Did the shooter have control of the ball, or did they shoot on the volley?

    How the bins are set up could influence how meaningful the data from a bin is. There is a lot of prejudice to use a Cartesian grid system in binning (all bins are square of the same size). Is the edge of a bin aligned with a goal post, or does the goal post fall midway across the bin?

    What makes more sense to me, is to use bins based on a range of angle and radius. For long shots, this concept is easy to understand, but for close shots it is hard to define. Radius to the near post, the far post, the centre of the goal or where the goalkeeper is positioned? Or worse, where the goalkeeper is supposed to be positioned?

    So not only does this concept not allow for errors in officiating, there are a host of analytical problems with it. And there is of course the statistical problem; they are giving the estimated expectation but they are not telling you the precision of that estimate.

    But, I had never seen or heard anyone bring up this concept before. It may be that it is better defined than I think it is. If it is limited to ESPN (world wide leader in being ESPN), that would explain why I haven’t heard of it.

  4. @Gord: I would like to make contact and bounce some statistical/data science issues off you. How do I reach you?

  5. Do you know GitHub? UntoldArsenal has a project there. Mostly a few programs I wrote in Perl. In the source code there is an email address for me.

    It has been 7 months since I contributed anything, but that is the way summer is.

    If you want to contribute, that would be wonderful.

  6. Sorry, but I must be getting thick in my old age. Could not understand the article, I thought we lost because we did not score more goals than City, oh yes and a shocking referee.

  7. It seems hard to find this project. The only hit I get from Google is that OlegYch is following UntoldArsenal. I recognize OlegYch from comments here.

    So, here is a link to the project. One repository (ua1), and the code is written in Perl.

  8. NuttnTiddy

    I think what ESPN is doing, is at least in part trying to justify why one team wins or why one team loses.

    Chance (random factors) is a big part of all football games, but people expect that not all teams have the same quality. That if two teams were to play each other enough times, that the better team would win more games. But, to say that Arsenal has played ManU N times and won however many times has a problem in that the teams are not constant. Arsenal does not have the players playing now, that were on the team 100 years ago.

    This expected goal thing is trying to give the story teller some new things to babble about. Because it is obvious to anyone that it is just as easy or difficult for Alexis to score from the corner of the 18 yard box, as it is for Xhaka or Ramsey. 🙂

  9. Hi All,

    Sorry, I have been on the road and it has taken me some time to reply. Yeah, the link that Goonermikey provided has a lot of information concerning how the shots are “binned” (as Gord puts it) and also what the Expected Goals values are.

    As for the long-distance shot against the Arsenal Ladies that Gord refers to, keep in mind the Expected Goals number is the average for all league players, hence anyone who is better at finishing from a particular situation will outshoot their Expected Goals value. Keep in mind also that undershooting your Expected Goals value does not necessarily make you a bad finisher: getting into a good enough position to take a high Expected Goals shot is a skill in itself, and there are various other nuances to watch out for.

  10. Thanks for the followup Sifarzone. I thought it was a useful article, but I wouldn’t personally buy into much validity of the numbers.

    Having a goaltender not paying attention, is always a good reason to have a crack from a long distance.

    I see FiveThirtyEight has some blurb related to this. But looking at ESPN or this place, isn’t seeing data. I would guess it is paywalled.

    I see a reference at MLSsoccer to a blog by Opta on expected goals.

  11. Gord,

    Yeah, statistical analysis is always tricky, but put in the proper context, more information is better than less.

Comments are closed.