By Tony Attwood
I like statistics, and use them a lot. But they have to be statistics that relate realistically to the real world. Consider this example from Statsbomb:
“Arsenal are conceding an eye popping 1.50 expected goals per match. The only five teams with worse defensive numbers than Emery’s team are Brighton, Burnley, Fulham, Huddersfield, and West Ham.”
Now, to be clear, “expected goals” answers the question of whether a player should have scored from a certain opportunity. As a measure of how a player is doing, it is quite good, because if players don’t score their expected goals then there is room for improvement. If they score every expected goal, then there is no room for improvement.
But jump sideways, if you can, and look at the league table organised by goals conceded.
Column 1 in the table below gives the league position, column 2 the position if we measured only goals against, and column 3 the difference between the two. A blank means there is no difference between a team’s position on points and its position in terms of goals against. A plus number means the team is lower in the points table than it would be if only goals against counted, and a minus number means it is higher in the points table than it would be if only goals against counted.
If we counted only goals against, Arsenal would be three places lower than they actually are, suggesting (to me, if no one else) that the defence still needs improving, but that it is not in a catastrophic position.
Lge pos | GA pos | Pos diff | Team | P | W | D | L | F | A | GD | Pts |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 |  | Manchester City | 7 | 6 | 1 | 0 | 21 | 3 | 18 | 19 |
2 | 2 |  | Liverpool | 7 | 6 | 1 | 0 | 15 | 3 | 12 | 19 |
3 | 3 |  | Chelsea | 7 | 5 | 2 | 0 | 15 | 5 | 10 | 17 |
9 | 4 | +5 | Wolverhampton Wanderers | 7 | 3 | 3 | 1 | 8 | 6 | 2 | 12 |
4 | 5 | -1 | Tottenham Hotspur | 7 | 5 | 0 | 2 | 14 | 7 | 7 | 15 |
6 | 6 |  | Watford | 7 | 4 | 1 | 2 | 11 | 8 | 3 | 13 |
13 | 7 | +6 | Crystal Palace | 7 | 2 | 1 | 4 | 5 | 8 | -3 | 7 |
5 | 8 | -3 | Arsenal | 7 | 5 | 0 | 2 | 14 | 9 | 5 | 15 |
8 | 9 | -1 | Leicester City | 7 | 4 | 0 | 3 | 13 | 10 | 3 | 12 |
18 | 10 | +8 | Newcastle United | 7 | 0 | 2 | 5 | 4 | 10 | -6 | 2 |
11 | 11 |  | Everton | 7 | 2 | 3 | 2 | 11 | 11 | 0 | 9 |
12 | 12 |  | Burnley | 7 | 2 | 1 | 4 | 9 | 11 | -2 | 7 |
16 | 13 | +3 | Southampton | 7 | 1 | 2 | 4 | 6 | 11 | -5 | 5 |
7 | 14 | -7 | AFC Bournemouth | 7 | 4 | 1 | 2 | 12 | 12 | 0 | 13 |
10 | 15 | -5 | Manchester United | 7 | 3 | 1 | 3 | 10 | 12 | -2 | 10 |
14 | 16 | -2 | West Ham United | 7 | 2 | 1 | 4 | 8 | 12 | -4 | 7 |
15 | 17 | -2 | Brighton & Hove | 7 | 1 | 2 | 4 | 8 | 13 | -5 | 5 |
17 | 18 | -1 | Fulham | 7 | 1 | 2 | 4 | 8 | 16 | -8 | 5 |
19 | 19 |  | Cardiff City | 7 | 0 | 2 | 5 | 4 | 16 | -12 | 2 |
20 | 20 |  | Huddersfield Town | 7 | 0 | 2 | 5 | 3 | 16 | -13 | 2 |
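For anyone who wants to rebuild the goals-against column themselves, here is a small Python sketch (a hypothetical helper, not the method used to produce the table). It ranks teams by goals conceded and takes the difference from league position. One assumption: teams level on goals against are separated by league position, which is how the table above appears to order them. A positive difference means a team sits lower on points than it would on goals against.

```python
# Hypothetical sketch: re-derive the goals-against ranking from the table above.
# Assumption: ties on goals against are broken by league position.

# (team, league position, goals against), copied from the table
TABLE = [
    ("Manchester City", 1, 3), ("Liverpool", 2, 3), ("Chelsea", 3, 5),
    ("Tottenham Hotspur", 4, 7), ("Arsenal", 5, 9), ("Watford", 6, 8),
    ("AFC Bournemouth", 7, 12), ("Leicester City", 8, 10),
    ("Wolverhampton Wanderers", 9, 6), ("Manchester United", 10, 12),
    ("Everton", 11, 11), ("Burnley", 12, 11), ("Crystal Palace", 13, 8),
    ("West Ham United", 14, 12), ("Brighton & Hove", 15, 13),
    ("Southampton", 16, 11), ("Fulham", 17, 16), ("Newcastle United", 18, 10),
    ("Cardiff City", 19, 16), ("Huddersfield Town", 20, 16),
]


def position_differences(table):
    """Return {team: (league_pos, ga_pos, diff)} with diff = league_pos - ga_pos."""
    # Fewest goals conceded ranks first; league position breaks ties.
    ranked = sorted(table, key=lambda row: (row[2], row[1]))
    result = {}
    for ga_pos, (team, league_pos, _ga) in enumerate(ranked, start=1):
        result[team] = (league_pos, ga_pos, league_pos - ga_pos)
    return result


if __name__ == "__main__":
    diffs = position_differences(TABLE)
    for team, (lge, ga, diff) in sorted(diffs.items(), key=lambda kv: kv[1][1]):
        label = f"{diff:+d}" if diff else ""  # blank when positions agree
        print(f"{lge:>2} | {ga:>2} | {label:>3} | {team}")
```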
Statsbomb concludes: “A defense that doesn’t suppress shots, and also allows good shots is not a good defense.” Yes, that is true. But it is not a disastrous defence if the club is winning matches. After all, a club that won every game 10-9 would have the worst defence in the league and yet be champions. And although it would give George Graham heart attacks, I’d actually quite enjoy it.
The problem with each statistical measure is that it measures only one thing, and football is not about one thing. For example, Manchester City are not top just because they have the best defence in the league; they are top because they have the best defence and the best attack.
But there is worse to come in the Statsbomb analysis, because it says of Arsenal: “They’re the same problems that Arsenal had coming into the season, and ones that new manager Emery was tasked with fixing. He certainly hasn’t been able to yet.”
However, try this. Before the season began I put forward the notion that if Arsenal could keep their home form of last season (in which we were second only to Manchester City) but improve our away form back to its average in the seasons before the last campaign, we would probably come third.
It is too early to say if we are actually doing that, but the fact is that after 19 away games in the league last season we had won 4. This season after 3 away games we have won 2. We are on track to hit that target of returning to our average away form of previous years. In 2016 when we came third, we won eight of our away games. That seems to me to be a good target to aim for and an analysis which has the virtue of being much simpler than expected goals.
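The away-form argument is simple arithmetic, and a few lines of Python make the pace explicit. This is a naive sketch that assumes the current away win rate simply holds for the rest of the season, which is a big assumption after three games; the function names are made up for illustration, and 19 is the number of away league games in a 20-team division.

```python
# Naive pace check for away wins (illustration only).
# Assumption: the win rate so far holds for the rest of the season.
AWAY_GAMES = 19  # away league fixtures in a 20-team division


def projected_away_wins(wins: int, played: int, total: int = AWAY_GAMES) -> float:
    """Linear extrapolation of away wins over a full season."""
    return wins / played * total


def required_win_rate(target: int, wins: int, played: int, total: int = AWAY_GAMES) -> float:
    """Share of the remaining away games that must be won to reach `target`."""
    return (target - wins) / (total - played)


if __name__ == "__main__":
    print(f"Last season's away win rate: {4 / AWAY_GAMES:.0%}")
    print(f"Projected away wins at current pace: {projected_away_wins(2, 3):.1f}")
    print(f"Win rate needed in the remaining away games for 8 wins: "
          f"{required_win_rate(8, 2, 3):.1%}")
```

The second figure is the more cautious way to read the target: reaching eight away wins only requires winning 6 of the remaining 16 away games, well above last season's rate but far below the current one.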
The view is expressed that the defensive inadequacies have been there for all to see for years: for everyone, that is, except Mr Wenger and now, it seems, Mr Emery. And yet if we had such an appallingly rubbish defence for so many years, and if defence were the key indicator by which success could be measured, that should show up in our history.
So I decided to look at whether the number of goals we concede relates to where we end up in the league. If it does, then our best defensive seasons should be the ones in which we finished highest.
Our best defensive season in the last 20 years was 1998/9, with just 17 goals let in, and we came second, so it seems there might be a link. But in 1997/8 we won the league and yet conceded almost twice as many goals as when we came second.
In 1999/2000 we came second and conceded a whopping 43 goals, our joint fourth-worst defensive performance of the last 20 years. But we came second!
True, two of our three worst defensive years in the last 20 seasons have come in the past two seasons, when we ended up 5th and 6th in the league. So that suggests the obvious is true: a bad defence leads to a lower position. But in 2011/12 we let in 49, our second-worst defensive year in the past 20 years, and came 3rd in the league. So it doesn’t always follow.
In 1999/2000 we let in 43, just one fewer than the “disastrous” 2016/17, and yet we came second and no one was screaming that our defence was so awful that we needed wholesale changes.
Two years later we reduced the number of goals conceded by just seven and won the league.
Season | Goals Against | Points | Lge pos |
---|---|---|---|
1997–98 | 33 | 78 | 1 |
1998–99 | 17 | 78 | 2 |
1999–2000 | 43 | 73 | 2 |
2000–01 | 38 | 70 | 2 |
2001–02 | 36 | 87 | 1 |
2002–03 | 42 | 78 | 2 |
2003–04 | 26 | 90 | 1 |
2004–05 | 36 | 83 | 2 |
2005–06 | 31 | 67 | 4 |
2006–07 | 35 | 68 | 4 |
2007–08 | 31 | 83 | 3 |
2008–09 | 37 | 72 | 4 |
2009–10 | 41 | 75 | 3 |
2010–11 | 43 | 68 | 4 |
2011–12 | 49 | 70 | 3 |
2012–13 | 37 | 73 | 4 |
2013–14 | 41 | 79 | 4 |
2014–15 | 36 | 75 | 3 |
2015–16 | 36 | 71 | 2 |
2016–17 | 44 | 75 | 5 |
2017–18 | 51 | 63 | 6 |
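The claims above can be checked directly against the season table. The Python sketch below (illustrative helper names, not code from the article) stores the table as a dictionary, finds the best defensive season and the three leakiest ones, and ranks a season's goals against counting from the worst. How to count ties is a choice: here tied seasons share the better rank, so two seasons on 43 are both "joint fourth worst".

```python
# Season data from the table above: season -> (goals against, points, league position).
# Season labels use ASCII hyphens here; the table uses en dashes.
SEASONS = {
    "1997-98": (33, 78, 1), "1998-99": (17, 78, 2), "1999-2000": (43, 73, 2),
    "2000-01": (38, 70, 2), "2001-02": (36, 87, 1), "2002-03": (42, 78, 2),
    "2003-04": (26, 90, 1), "2004-05": (36, 83, 2), "2005-06": (31, 67, 4),
    "2006-07": (35, 68, 4), "2007-08": (31, 83, 3), "2008-09": (37, 72, 4),
    "2009-10": (41, 75, 3), "2010-11": (43, 68, 4), "2011-12": (49, 70, 3),
    "2012-13": (37, 73, 4), "2013-14": (41, 79, 4), "2014-15": (36, 75, 3),
    "2015-16": (36, 71, 2), "2016-17": (44, 75, 5), "2017-18": (51, 63, 6),
}


def rank_from_worst(season, seasons=SEASONS):
    """1 = most goals conceded. Tied seasons share the better (lower) rank."""
    ga = seasons[season][0]
    return 1 + sum(1 for other_ga, _pts, _pos in seasons.values() if other_ga > ga)


def leakiest(n, seasons=SEASONS):
    """The n seasons with the most goals conceded, worst first."""
    return sorted(seasons, key=lambda s: seasons[s][0], reverse=True)[:n]


if __name__ == "__main__":
    best = min(SEASONS, key=lambda s: SEASONS[s][0])
    print("Best defensive season:", best, SEASONS[best])
    print("Three leakiest seasons (season, finish):",
          [(s, SEASONS[s][2]) for s in leakiest(3)])
    print("1999-2000 rank counting from the worst:", rank_from_worst("1999-2000"))
```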
Of course it is better to concede fewer goals rather than more, but it is possible to play in a style that commits players to attack and leaves gaps behind, so that the number of goals conceded increases. But that style can also increase the number of goals scored, and indeed the entertainment level.
My conclusion is, beware simplistic techniques that just measure one thing, like goals scored or expected goals against. Football is not that simple.
What about expected goals for?
Another mythical stat!
Actual goals (for and against) are all that count.
The developer of xG stats has said that you cannot measure it over one game, or just a few. You can only get a read on the statistics over a great number of games as there are multiple factors that are not accounted for.
However, most people in the media, pundits and fans use the xG stat to prove who should have won a game. It is merely a statistic that expresses the percentage chance a player has of scoring from a given shot.
It is being relied on far too much to prove something it was not meant to prove on a game-to-game basis. The xG stat is cool, fancy and a breath of fresh air in the stat world, which I guess is why people love it so much.
I’m trying out a different Perl module here; I hope it does good work.
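The point that xG only tells you something over many games can be illustrated with a short Python sketch. The shot values below are invented for illustration, and the model treats every shot as an independent scored-or-missed event, which is a simplification rather than how any particular xG provider works. It shows that a match's xG is the sum of its per-shot probabilities, and that the random gap between actual goals and xG shrinks, relative to xG, as the number of matches grows.

```python
import math

# Made-up shot list for one match (illustration only): each value is the
# xG, i.e. the estimated probability that that particular shot is scored.
SHOTS = [0.76, 0.30, 0.12, 0.08, 0.05, 0.05, 0.03]


def match_xg(shots):
    """A team's xG for a match is the sum of its per-shot values."""
    return sum(shots)


def goals_sd(shots):
    """Standard deviation of actual goals if every shot is an independent
    scored/missed event with probability p (a simplifying assumption)."""
    return math.sqrt(sum(p * (1 - p) for p in shots))


def relative_spread(shots, matches):
    """Typical gap between goals and xG, as a fraction of xG, over `matches`
    identical matches. The SD grows like sqrt(matches) while xG grows like
    matches, so the ratio shrinks like 1 / sqrt(matches)."""
    return goals_sd(shots) * math.sqrt(matches) / (match_xg(shots) * matches)


if __name__ == "__main__":
    for n in (1, 5, 38):
        print(f"{n:>2} matches: xG {match_xg(SHOTS) * n:.2f}, "
              f"typical deviation about {relative_spread(SHOTS, n):.0%} of xG")
```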
If 1997-98 is year 0, the average of “year” is 10, with a median of 10, a variance of 36.67 and a standard deviation of 6.06. None of this is interesting or useful.
The mean value of goals against is 37.29, the median is 37, the variance is 54.01 and the standard deviation is 7.35.
The mean value of points is 75.05, the median is 75, the variance is 44.24 and the standard deviation is 6.65.
The mean value of standing is 2.95, the median is 3, the variance is 1.76 and the standard deviation is 1.33.
I calculated all the correlations. In order of strength:
- year vs standing: 0.74
- points vs standing: -0.66
- year vs goals against: 0.53
- goals against vs standing: 0.49
- goals against vs points: -0.48
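For readers who want to check these figures without Perl, here is a Python equivalent using the standard library `statistics` module (the Perl module used above isn't named, so this is not a port of that code). The quoted variances match the population formulas, which divide by n, so the sketch uses `pvariance` and `pstdev`; Pearson's r is written out by hand. It also prints r², the share of the variation in one series that a straight-line fit on the other accounts for.

```python
import math
import statistics

# Season data from the table above, in order from 1997-98 (year 0) to 2017-18.
GOALS_AGAINST = [33, 17, 43, 38, 36, 42, 26, 36, 31, 35, 31, 37, 41, 43, 49, 37, 41, 36, 36, 44, 51]
POINTS = [78, 78, 73, 70, 87, 78, 90, 83, 67, 68, 83, 72, 75, 68, 70, 73, 79, 75, 71, 75, 63]
STANDING = [1, 2, 2, 2, 1, 2, 1, 2, 4, 4, 3, 4, 3, 4, 3, 4, 4, 3, 2, 5, 6]
YEAR = list(range(len(STANDING)))

SERIES = {"year": YEAR, "goals against": GOALS_AGAINST, "points": POINTS, "standing": STANDING}


def describe(values):
    """Mean, median, population variance and population SD (divide by n)."""
    return (statistics.mean(values), statistics.median(values),
            statistics.pvariance(values), statistics.pstdev(values))


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    for name, values in SERIES.items():
        mean, median, var, sd = describe(values)
        print(f"{name}: mean {mean:.2f}, median {median}, variance {var:.2f}, sd {sd:.2f}")
    names = list(SERIES)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    # Strongest correlations (by absolute value) first.
    for a, b in sorted(pairs, key=lambda p: -abs(pearson(SERIES[p[0]], SERIES[p[1]]))):
        r = pearson(SERIES[a], SERIES[b])
        print(f"{a} vs {b}: r = {r:+.2f}, r^2 = {r * r:.2f}")
```

Because a standing of 1 is the best finish, a negative r between points and standing is what one would expect: more points go with a smaller standing number.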
In the physics-type experiments one normally sees in engineering, correlations of 0.9 and above are typical. None of these are in that category.
The last two years of Wenger’s era saw a drop-off, not necessarily Wenger’s fault (or the players’). So this could be stronger if we discount the last two years. But there is a reasonably strong correlation between the Wenger year number and the final standing in the EPL.
Points versus standing shows a negative correlation. That was unexpected. I suspect it comes from the fact that when Wenger started there was a Top 4, and now there is a Top 6.
If I were to aim a shotgun at a piece of paper, with the average trajectory of the pellets perpendicular to the paper, I would get a circle of holes in the paper. Correlations between -0.5 and 0.5 look very similar: the paper is (possibly) just slightly tilted away from perpendicular.
I doubt years versus goals against has any real value, and the others have even less.
Tony’s premise was to look at correlations with goals against, and none of these variables has a strong enough correlation to be useful. That the most useful correlation is Wenger year versus goals against is interesting, but it has no value.
I wouldn’t be surprised to find goals against more highly correlated with the average weight of spectators.
Has it not been said that winning a football match doesn’t depend on mathematical calculations? That’s why it’s used in the gambling industry: no result of a football match is certain until it is played. Mathematical calculations, by contrast, are certain because they are objective. But a football result is subjective, because its outcome depends on many known factors that are planned for, and unforeseen factors that are not, which can arise while a match is being played.
But nevertheless, Arsenal will continue to win game after game this season in all competitions to clinch titles, because the Gunners are being galvanized by the Arsenal administrative and technical hierarchies to win more games, home and away, in all competitions this season than they did last season. Last season they reached the League Cup final but lost to Man City, and reached the Europa League semi-final but were knocked out by the eventual winners, Atletico de Madrid. And they finished 6th in the league table because of their below-average performance in away games, which I believe the right steps are being taken to address, so that the ugly trend does not repeat this season.
Frederick
“The developer of xG stats has said that you cannot measure it over one game, or just a few. You can only get a read on the statistics over a great number of games as there are multiple factors that are not accounted for.”
Exactly. Like Tony I love statistics, but used incorrectly they can be very misleading, as in this case.
As a general rule, the more data you use, over the longest possible time, the more accurate the conclusions that can be drawn. Simplistic conclusions from such small data samples are very likely to be wrong.
Gord.
Shirley the negative goals to standing is expected. Higher points equals a lower standing score, 1 being 1st, no?
Erm, I think Tony has misrepresented the Statsbomb column quite badly. Nowhere in that article is it claimed that defence is the only issue. It also goes on to talk about how our attack is not much more than OK.
I don’t think anyone who has ever watched a football match would disagree that if your defence is good and your attack is good, you probably have a good team that will do well. And the stats show that we are not playing well either going forward or at the back. That’s it. And I would agree with that assessment of Arsenal right now. We look disjointed and problematic. None of that is due to any media agenda or bias. We all hope and expect that things will pick up as the side learns the new systems. The fact that we have won 5 on the trot is a very encouraging sign that we are winning despite not playing very well. So whether you see it as good news or bad news is really up to you. But xG also implies a prediction: if we continue to produce numbers like that, it is very likely that we won’t do as well as we want to. That’s it. Nothing sinister.
Also nobody who uses stats would pretend that they are more important than the only stat that matters – the result. What stats do is they offer a window on the performance that the result alone will not tell you. That GA column in the table? It doesn’t change the points column. It just gives you an idea of how the team managed to rack up those points, of how strong they were defensively. Stats like xG go one level deeper than that, and try to explain why and how that GA number is as high or low as it is.
Another conclusion you could draw from these stats is that if Arsenal can keep winning playing this shite, just imagine what they’ll do once they sell every player and start playing well. They should piss the league. So it depends on which aspect of the stats you want to use to predict the future. With Arsenal and the media, of course, it’s always Armageddon.
You’re correct Ferg.
SAA, mathematics is a way of describing the universe. You may not use it, but that doesn’t mean it has no value.
Ferg, I think the technical term for my mistake is called a “brainfart”. 🙂
Oh, for those interested, the really simple code for this is at the Untold Arsenal GitHub site.