Why all the football statistics now available still leave us uncertain as to what is going on.

Posted on 12 November 2017 by Tony Attwood


Some time back, I criticised a book called The Numbers Game – which purported to reduce all football to a valid statistical analysis – and I was reminded of it again when another book appeared doing the same sort of thing: claiming it had analysed every aspect of the game and was able to set out in statistics why some teams won lots of games and other teams didn’t.

Both books had the same problem – they assumed that refereeing was both accurate and unbiased. They assumed it so completely that, at least in The Numbers Game (and, I am told, in subsequent volumes that have followed this approach), there is not even an acknowledgement that the assumption is being made. It is treated as an absolute and inviolable truth.

Now obviously for anyone who wants to put forward an analysis of data, there are assumptions to be made. For example, we assume that the laws of gravity are constant and identical across all football pitches. We don’t go out and test this, because basically there is no need. There is no evidence to suggest anything other than the fact that such gravitational anomalies as do exist are so small that their influence on the flight of the ball is negligible. A ball whacked up high in the air accelerates back to the ground at the same rate of 32 feet per second per second in a stadium built and paid for by taxpayers as it does in a stadium built and paid for by a football club.

But when it comes to people, matters are not so easily resolved.

Yes, analysts can collect data and generate measures called a “pressing index”, “room control” and “pass effectiveness” and from this they can make all sorts of predictions about how good a club is and what it ought to get its players to do, for the club to be better.

And this data can be used to create ball possession statistics, pass rates, running distance or heat maps – but these don’t take into account external variables, such as the size of the pitch. However, that can readily be factored in, of course, because it doesn’t change. (It used to change – I remember the time when Crystal Palace reduced the width of their pitch in an attempt to avoid relegation – but that is not allowed now, so again the stats can be valid.)

But referee variance is hard to factor in. I thought of this particularly today when listening to a BBC Radio 5 discussion about the recent controversial referee decision in a World Cup qualifier. It was, as are most radio and TV discussions, hampered by a complete lack of knowledge on the part of the pundit and the programme host about video evidence. They bemoaned the fact that it had not been introduced, while utterly unaware that it does exist in some countries. Had they known that parts of Europe have it, they could then have asked the more interesting question: why does it not exist in this international match?

But they didn’t and such ignorance is a major hindrance to good debate. And that was a shame because that debate, properly handled by a knowledgeable presenter and a knowledgeable pundit, could have asked, “how often do referee errors of this nature happen?” And “do they happen to one country more than another?” And “why, when the technology is available, is it not being used in this match?”

Statisticians do realise that statistics are always open to question, of course. One example that is often quoted is the Champions League game Celtic Glasgow against FC Barcelona on 7 November 2012. Here’s one commentary: “Barcelona dominated the game seemingly as they liked: 84% possession, 25:5 shots on goal, 955 passes and an outstanding pass rate of 91%. Celtic made only 166 passes and delivered 38 percent bad passes. But the Scots won 2-1 and the first goal from Barcelona fell only in stoppage time. It was similar at the World Cup semi-final in Brazil 2014: The hosts had the better pass rate, more shots and were more often in the attack third. But Germany won 7-1.”

Such realisations lead to the introduction of more factors in the statistics. As Daniel Memmert of the Institute of Training Science and Sports Informatics at the German Sport University Cologne said recently, possession “says quite a lot about the dominance of a team, but rarely anything about their actual chances of victory.”

So Daniel Memmert and the sports informatics specialist Jürgen Perl of the University of Mainz have gone further and set aside what is now called “naïve data analysis” to consider what are now thought to be “tactically relevant aspects, known as key performance indicators to determine more about the qualities of a team.”

Thus we now have, for example, the “pressing index”, which shows how quickly a team attacks the opponent after losing the ball, and “room control”, which describes the dominance of a team over the spaces on the football field and is defined “by the area of the grass that players of a team can reach in front of the players of the opposing team.”

Now what is interesting is that the analysis, funded by the German league, analysed 50 games in the Bundesliga season 2014/15 and saw how space control, pressing index and pass effectiveness parameters related to the results.

But the point remains that this is still “naïve data analysis”, because if (and it is only “if”) the referee is either not as competent as PGMO (for example) claims, or makes multiple mistakes which don’t “all even out in the end”, then there is a broader overarching problem, which means that the data analysis still doesn’t really tell us what is going on.

It is interesting that our analysis (160 games analysed) which came complete with video evidence and which is available in detail for everyone to see on this site, didn’t just do 50 games, but did over three times as many games. The conclusion was clear – referees make far more mistakes than is commonly realised, and they don’t “all even out in the end”.

What is just as interesting is that ever since the report was released two things have happened. First, the broader football world has ignored it. Second, on this site people have tried to discredit it by pushing forward three viewpoints over and over and over again. (I don’t publish them all, but enough for anyone interested to be able to have a read through.)

Those three viewpoints are:

a) that the 160 games survey was carried out by Arsenal supporters. Although that is true, the existence of all the video evidence allows anyone to look for bias. And it should be remembered that the conclusions reached were the same as those on the Referee Decisions website in which the analyses were undertaken by referees whose allegiance was with other clubs.

b) that we are pushing the viewpoint that referees are bent. In fact what we are saying (so often that it is getting quite boring) is that we can’t tell if referees are incompetent or bent, but that the way the PGMO is set up as a secret society makes it impossible for anyone to know what is going on. Quite why PGMO has set itself up like this, and has also restricted the number of referees in a way that other leagues have not, we don’t know. It would be nice if they would tell us, and would open up a bit, but their secrecy and decision making is suggestive of the notion that something is wrong.

c) that we have not provided any evidence. This attempt to discredit us is getting very tedious, especially when we have produced so much evidence. A recent example was the demand from a reader for evidence to show that the other main leagues in Europe have more registered referees than the PL does. It’s an important point because if there were to be anything wrong with a referee, the fact that in the PL he could referee one club multiple times in a season would give him more chance to fix matches.

In fact we have published the data analysing the major leagues of Europe in this way on this site. But by endlessly writing in comments demanding to see this data, those who deny the existence of a problem are attempting in a simplistic way to force us to spend hours going back and repeating over and over the data, rather than get on with new work.

There used to be a number of people who suggested that what was wrong with Mr Wenger as a manager was that he never changed his style of playing. This was often accompanied by a fake quote assigned to Albert Einstein saying that doing something that failed over and over again in the hope that it would give a different outcome next time was a true sign of madness.

In fact doing the same thing over and over again – suggesting that we haven’t produced statistics and analysis when we have, and ignoring each new set of data that we produce – is what those opposed to our approach do, but it is not a sign of madness. Rather it is a sign of desperation. Millions of pounds are invested each year into analyses of football matches, and always on the assumption that referees are both capable of making the correct decisions almost all of the time, and not biased for or against one team at any time.

The fact is that if, as our figures show, the error rate within refereeing decisions is very high, then it becomes harder to see if there is bias. If the average referee makes two errors in a game, and both are against one team, that is easy to spot. If the average referee makes 22 errors in a game, and a couple of important ones are made against one team while the rest even out, that is much harder to spot. One is looking for two “deliberate” errors among 22 random errors. That is tough, and one can understand why the books that analyse football matches shy away from the task.

If there are 50 errors a game that is harder still to analyse out.
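The point about two deliberate errors hiding among many random ones can be put into rough numbers. What follows is only a minimal sketch, and it rests on an assumption that is not taken from our survey: that every random error is independent and equally likely to go against either team. The function name `chance_of_imbalance` is made up for this illustration. It asks how often a perfectly unbiased referee would, purely by chance, finish a game with at least two more errors against one given team than against the other.

```python
from math import comb


def chance_of_imbalance(total_errors: int, imbalance: int = 2) -> float:
    """Probability that unbiased errors alone leave one given team
    at least `imbalance` errors worse off than its opponent.

    Assumption (hypothetical model): each error independently goes
    against either team with probability 0.5.
    """
    # If `against` errors hurt the team, the margin is
    # against - (total_errors - against). We need margin >= imbalance,
    # so against >= (total_errors + imbalance) / 2, rounded up.
    needed = (total_errors + imbalance + 1) // 2
    favourable = sum(
        comb(total_errors, k) for k in range(needed, total_errors + 1)
    )
    return favourable / 2 ** total_errors


# The three situations described in the text: 2, 22 and 50 errors a game.
for errors in (2, 22, 50):
    print(errors, round(chance_of_imbalance(errors), 3))
```

Under that assumption the probability rises as the number of errors per game rises, which is the argument above in numerical form: the more random errors there are, the more often an honest referee produces exactly the pattern that a couple of deliberate errors would produce, and so the less that pattern tells us on its own.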

But without an analysis of referee errors alongside all the analysis of player activity, ultimately all the statistics leave us uncertain as to what is really going on.

22 Replies to “Why all the football statistics now available still leave us uncertain as to what is going on.”

Gord says:

12 November 2017 at 2:09 PM

I think the number of errors per game is significantly higher than 50. It is not unusual to see some photograph from a game, and be able to see illegal contact between players. Think how many more you pick up analysing video “frame by frame”.
Gord says:

12 November 2017 at 2:20 PM

Giroud is on his way home from French team – thigh injury.

No score yet in the Ladies game.
Andrew Crawshaw says:

12 November 2017 at 2:56 PM

Halftime at Borehamwood. Sunderland Bus well and truly on the pitch. Every time we go forward they have ten behind the ball. So far our final ball not good or quick enough. All of the chances have gone to Arsenal but no end product yet.
Gord says:

12 November 2017 at 3:39 PM

I’ve refreshed the Ladies twit feed 3 times in the second half, and had goals every time on the latest twit. I think Andrew is getting to see a good game.
Gord says:

12 November 2017 at 3:52 PM

I see Eddie Nketiah has bagged another goal, the winner against Iceland.
Andrew Crawshaw says:

12 November 2017 at 4:25 PM

Gord

Better second half but we still aren’t at our fluent best. Three second half goals to give us a three nil win. Better but still a way to go! Quinn with a header from a Nobbs free kick, Miedema with an exquisite finish from the edge of the area and Nobbs with the third from a free kick on the edge of the area. All top class goals. Great save at the end by Moorhouse from Simona Koren to keep her clean sheet.
Andrew Crawshaw says:

12 November 2017 at 5:00 PM

We are now up to third in the WSL. Three points behind Chelsea (who were held 2-2 by Reading) and six behind Man City who have already beaten us once this season. Significantly we have a much worse goal difference than both teams above us. We really need to start playing at our best soon otherwise it will be yet another year without European football for us.

Let’s hope that our new manager, when he arrives from Australia, can quickly get us performing to our potential.
Gord says:

12 November 2017 at 5:35 PM

Hmm, only one post from Tony today. Has he perhaps been busy participating in an auction in/near Dallas, Texas? Would you pay $400k for a guitar?

🙂
Samuel Akinsola Adebosin says:

12 November 2017 at 6:43 PM

Statistics showing us all that has taken place in a football match after it has been played are good for keeping a record of all the events in the match between the two opposing teams. But with the match officials being a big factor in the outcome of the match, the statistical analysis of the refereeing is just as important as the match statistics of the two teams. Therefore it cannot be ignored when compiling the statistical evidence of a football match, but must be factored in to give comprehensive statistics for the match. As it stands the picture is usually one-sided: the match statistics concentrate on the teams’ playing stats but leave out the refereeing stats as if they were not relevant, whereas they are very relevant.

Let me say this in advance that whatever the match statistics that emerge from the PL NLD match between Arsenal and Spurs on Saturday at the Arsenal Stadium, Arsenal will make their first team playing quality tell excruciatingly on Tottenham Hotspur in the game and the Gunners will extend their home winning run in the PL to six games by beating Spurs by 3 goals to nil to put their PL Title winning charge this season back on track.

In fact, let Tottenham Hotspur have the match statistics and the match refereeing statistics that are never made available, and keep them. But Arsenal will have the all important 3-0 match winning goals and keep them to themselves.
Gord says:

12 November 2017 at 7:49 PM

Switzerland with the big 0-0 over Northern Ireland to go through. Xhaka played the entire game. I don’t see any mention of an injury.
Brickfields Gunners says:

13 November 2017 at 5:08 AM

An interesting study –

Alignment Of 6,071 Completely Independent Variables Necessary For Man To Feel Okay.

https://local.theonion.com/alignment-of-6-071-completely-independent-variables-nec-1819578

I’m sure that for Arsenal fans (other than the AKBs of course!) to feel okay, there would be quite a few more variables needed! Including the following-

Getting rid of Silent Stan.
Usmanov taking total charge.
Usmanov spending lots of fucking money.
Sacking the entire present board.
Bringing in Piers Morgan.
Appoint minority shareholders to the board.
Appoint old legends to the board.
Bringing in David Dein.
Make Dein Jr. the only players agent.
Sacking AW .
Appointing new super duper manager.
Getting rid of deadwood players and those nicking a living.
Buying all the best players that Usmanov’s money can buy.
Rendering AKBs Persona Non Grata .
Banning Untold Arsenal.
Bringing down season tickets prices.
Getting rid of or banning ‘plastic’ fans.

Did I miss out anything ?
colario says:

13 November 2017 at 5:43 AM

Did you miss anything?
Word on the block is that there is a 3 year delay coming for the building of the new chelski stadium.

The present estimated cost of the stadium is 500 million pounds.

Abramovich has spent one billion pounds on chelski but is not going to cough up the 500 million. The money will have to be raised from a syndicate in the far east.

All aint so well in the oil well department
Brickfields Gunners says:

13 November 2017 at 6:00 AM

Not to worry , Charles , they’ll probably be able to raise that money by selling some of their loan players . And some deadwood too!
And the Costa sale in January .
Brickfields Gunners says:

13 November 2017 at 6:18 AM

Butch the Rooster-

Sarah was in the fertilized egg business. She had several hundred young pullets and ten roosters to fertilize the eggs. She kept records and any rooster not performing went into the soup pot and was replaced.

This took a lot of time, so she bought some tiny bells and attached them to her roosters. Each bell had a different tone, so she could tell from a distance which rooster was performing. Now, she could sit on the porch and fill out an efficiency report just by listening to the bells.

Sarah’s favourite rooster, old Butch, was a fine specimen but, this morning, she noticed old Butch’s bell hadn’t rung at all! When she went to investigate, she saw the other roosters were busy chasing pullets, bells-a-ringing, but the pullets hearing the roosters coming, would run for cover.

To Sarah’s amazement, old Butch had his bell in his beak, so it couldn’t ring. He’d sneak up on a pullet, do his job, and walk on to the next one.

Sarah was so proud of old Butch, she entered him in a show and he became an overnight sensation among the judges.

The result was the judges not only awarded old Butch the “No Bell Piece Prize”, they also awarded him the “Pulletsurprise” as well.

Clearly old Butch was a politician in the making.. Who else but a politician could figure out how to win two of the most coveted awards on the planet by being the best at sneaking up on the unsuspecting populace and screwing them when they weren’t paying attention?

Vote carefully in the next election: You can’t always hear the bells.
colario says:

13 November 2017 at 7:34 AM

Dr Brickfields Where did you get your no bell story from? 🙂
Brickfields Gunners says:

13 November 2017 at 7:53 AM

Someone sent it to me on WhatsApp. Origins unknown!
Gord says:

13 November 2017 at 1:00 PM

Pullet Surprise

I see a copy on backyard chickens from 2008. And then a different thing in Dec 2007.

I have no idea how old this joke is. Does it pre-date the Foghorn Leghorn cartoon?

Apparently this kind of wording is called a malapropism.
http://articles.latimes.com/1985-02-27/news/vw-8930_1_pullet-surprise
finsbury says:

13 November 2017 at 2:59 PM

“The money for the new stadium/laundrette will have to be raised by a betting syndicate / laundrette from Singapore?”

Did I read that correctly? 😉
finsbury says:

13 November 2017 at 4:10 PM

Fortunately her majesty’s constabulary are not as big a SHAMBLES™ as the FA committee that employs Trevor Sinclair, and therefore they didn’t collapse into a gibbering heap with the emission of large amounts of methane when identifying where simple rules were broken.
Gord says:

13 November 2017 at 4:39 PM

I was looking for news about Eddie Nketiah this morning.

Tony seems to like sniping at Amy Lawrence and The Guardian, I guess they leave themselves open to it.

They put out an article titled, “Eddie Nketiah: the young Arsenal ‘goalscoring machine’ who Chelsea let go” on October 25, 2017.

https://www.theguardian.com/football/2017/oct/25/eddie-nketiah-arsenal-goalscoring-machine-chelsea

About midway through the article is a paragraph which ends:

… than the 1.75m (5ft 7in)Nketiah.

Lack of attention to details here. 1.75m is essentially 5’9″ – 5’7″ is 1.7m.

The article mentions many times that Eddie is smaller than most players. Well, the average height for men is about 5’9″ (1.75m), so for Eddie to be smaller than most likely means he is 1.7m.

But, you cannot publish an article about Arsenal without bashing the team. So, it isn’t surprising to see the article end with this heading:

Fans’ group oppose Keswick and Kroenke Jr re-election

This heading and the 3 paragraphs under it, have nothing to do with the article. They are just there to bash Arsenal.

Please Ms. Lawrence, go take a long walk on a short pier. Or a short Piers, as you see fit.

Dorks!
Brickfields Gunners says:

13 November 2017 at 5:51 PM

Thanks for the pullet surprise link , Ford , it was truly and really funny .
Must try and come up with new names and terms for those jerks and vermin who are infesting UA.
Brickfields Gunners says:

13 November 2017 at 5:52 PM

Sorry…Gord….
