**The PGMO v Arsenal – the Half Time Report**

by Andrew Crawshaw

This is a review of the first 19 weeks of the season: who the referees were, whether they were Good, Bad or Downright Ugly, and any other points I consider relevant. I am using Walter’s referee reviews as my source documents and will post links to them so that you can check further points yourselves.

**Matchweek 1 – 16 August Arsenal v Crystal Palace** – a game we won 2-1

- Referee – Jonathan Moss
- Assistants – L Betts and M Scholes
- Fourth Official – R Madley

**Overall weighted referee score 81%; Bias against Arsenal 100%; Wrong Important Decisions** :-

**Second Yellow Card** – 1 (Min 64 Chamakh)

**Red Card** – None

**Penalties** – 2 (Min 8 Push on Ramsey, Min 72 Hangeland on Giroud)

**Goals** – None

Season opener: a Palace player not sent off, two penalties to Arsenal not given, and every wrong decision going against Arsenal. A referee performance typical of the PGMO. An overall rating of 81% would be good but for the other numbers, so downgraded to Adequate.

The Untold Referee review: Arsenal – Crystal Palace

**Matchweek 2 – 23 August Everton v Arsenal** – draw 2 all

- Referee – Kevin Friend
- Assistants – A Garratt and M McDonough
- Fourth Official – M Oliver

**Overall weighted referee score 60%; Bias against Arsenal 89%; Wrong Important Decisions** :-

**Second Yellow Card** – None

**Red Card** – 1 (Min 53 Wilshere wrongly only given yellow card)

**Penalties** – None

**Goals** – 1 (Min 44 Naismith was offside when he scored “goal”)

Two wrong decisions: an incorrect goal for Everton, and Wilshere should have been sent off. Mr Garratt was responsible for not spotting the offside goal and then gave an imaginary foul to break up a promising Arsenal attack. Refereeing performance bad.

Untold Ref Review: Everton/Arsenal. What is an advantage, and what’s wrong with assistant Garratt?

**Matchweek 3 – 31 August Leicester v Arsenal** – draw one all

- Referee – Anthony Taylor
- Assistants – G Beswick and D Cann
- Fourth Official – R East

**Overall weighted referee score 61%; Bias against Arsenal 93%; Wrong Important Decisions** :-

**Second Yellow Card** – 1 (Min 58 Moore)

**Red Card** – 1 (Min 84 Hammond)

**Penalties** – 1 (Min 50 Hammond, push on Cazorla)

**Goals** – None

Bad refereeing, awful bias: a penalty not given, and Leicester should have ended up with 9 men on the pitch. The referee decided the result of the game and (almost certainly) cost Arsenal a win.

Ref review Leicester – Arsenal

**Matchweek 4 – 13 September Arsenal v Man City** – draw two all

- Referee – Mark Clattenburg
- Assistants – S Beck and J Collin
- Fourth Official – L Moss

**Overall weighted referee score 71%; Bias against Arsenal 73%; Wrong Important Decisions** :-

**Second Yellow Card** – 1 (Min 45 Milner)

**Red Card** – None

**Penalties** – 21 (Min 76 Wilshere controlled the ball with his arm)

**Goals** – None

Not great refereeing, but not the worst. The bias numbers are nowhere near balanced but are in fact the least crooked we have seen in the first half of the season. 71% gets barely adequate.

Ref Review : Arsenal – Man City

**Matchweek 5 – 20 September Aston Villa v Arsenal** – Arsenal win 3-0

- Referee – Mike Jones
- Assistants – M Scholes and I Hussin
- Fourth Official – K Friend

**Overall weighted referee score 82%; Bias against Arsenal 80%; Wrong Important Decisions** :-

**Second Yellow Card** – None

**Red Card** – 1 (Min 83 Clark on Podolski)

**Penalties** – None

**Goals** – None

Despite the wrong decision bias, the best refereeing performance of the first half: only one wrong Important Decision, which didn’t impact on the result. Still a missed red card though.

Ref Review Aston Villa – Arsenal

**Matchweek 6 – 27 September Arsenal v Spurs** – Draw one all

- Referee – Michael Oliver
- Assistants – S Burt and J Brooks
- Fourth Official – C Pawson

**Overall weighted referee score 66%; Bias against Arsenal 85%; Wrong Important Decisions** :-

**Second Yellow Card** – None

**Red Card** – 1 (Min 72 Mason on Özil)

**Penalties** – 1 (Min 30 Rose on Wilshere)

**Goals** – None

Another bad refereeing performance, two wrong Important Decisions (almost certainly) costing Arsenal two points.

REFEREE REVIEW: Arsenal – Tottenham

**Matchweek 7 – 5 October Chelsea v Arsenal** – Loss two nil

- Referee – Martin Atkinson
- Assistants – M Mullarkey and S Child
- Fourth Official – J Moss

**Overall weighted referee score 44.7%; Bias against Arsenal 86%; Wrong Important Decisions** :-

**Second Yellow Card** – 3 (All for Oscar, take your pick Min 40, Min 45+2 and Min 74)

**Red Card** – 4 (Min 20 for Cahill on Alexis, Min 20 for Ivanovic elbowing Özil off the ball, Min 43 Alexis for punch on Ivanovic, Min 90 Alexis on Fabregas)

**Penalties** – 1 (Min 62 Fabregas handball)

**Goals** – None

Atkinson in charge of a Chelsea v Arsenal game: the worst refereeing in the first half of the season, 44.7% overall. Eight wrong Important Decisions, the first two in the 20th minute with the score at nil all, when two Chelsea players should have been sent off for separate incidents: Cahill trying to break Alexis’ leg, and Ivanovic being a thug and elbowing Özil in an off-the-ball incident. Downright Ugly performance and exactly what you would expect from this referee in this game. I can’t imagine Arsenal losing had the two Chelsea players been correctly sent off in Minute 20.

Untold’s Unacceptable Referee Review: Chelsea – Arsenal

**Matchweek 8 – 18 October Arsenal v Hull** – draw two all

- Referee – Roger East
- Assistants – S Beck and A Nunn
- Fourth Official – C Foy

**Overall weighted referee score 62%; Bias against Arsenal 82%; Wrong Important Decisions** :-

**Second Yellow Card** – None

**Red Card** – None

**Penalties** – None

**Goals** – 1 (Min 17 Diame “goal” should have been cancelled for clear foul on Flamini)

Bad refereeing performance, very poor overall numbers and a goal awarded incorrectly to Hull. The wrong Important Decision cost Arsenal two points.

Referee review : Arsenal – Hull : East, West not the best

**Matchweek 9 – 25 October Sunderland v Arsenal** – Arsenal win two nil

- Referee – Kevin Friend
- Assistants – J Collin and C Hatzidakis
- Fourth Official – L Mason

**Overall weighted referee score 65%; Bias against Arsenal 75%; Wrong Important Decisions** :-

**Second Yellow Card** – None

**Red Card** – 1 (Min 19 Rodwell on Chambers)

**Penalties** – None

**Goals** – None

Another bad refereeing performance with very low overall numbers. One wrong Important Decision (red card in Minute 19) didn’t change the result of the game but made it far harder than it should have been.

Untold Referee Review: Sunderland – Arsenal

The series continues…

I can’t wait to see our friend Tom’s reaction to that little lot.

Thanks Andrew. You have a typo in Matchweek 4. There weren’t 21 wrong penalties. 🙂

Gord

On the other hand 😉

The weighted scores over those first 9 weeks are:

81, 60, 61, 71, 82, 66, 45, 62, 65

If I sort them, and then arrange them into a U

45 60 61 62

65

66 71 81 82

Which shows that the median weighted score is 65%.

If I find the absolute difference between any particular score and 65, I get:

20 5 4 3 0 1 6 16 17. Sorting that lot and making the U again:

0 1 3 4

5

6 16 17 20

Which shows that a measure of the width of that distribution of weighted referee scores is 5%. From other work on this lately, this measure is significantly smaller than a standard deviation is for a Gaussian. So, the central tendency is for referees to score somewhere in the 55 to 75% range (the median plus or minus twice that width), and if a score is far outside either end, there is likely to be something unusual happening.

Hence weeks 1 (Crystal Palace), 5 (Aston Villa) and 7 (Chelsea) might be unusual.
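For anyone who would rather let a computer do the sorting, the steps above can be sketched in a few lines of Python. This is only an illustration using the standard `statistics` module and the rounded scores listed above; the variable names are my own.

```python
from statistics import median

# Weighted referee scores for matchweeks 1-9, as listed above (week 7 rounded to 45).
scores = [81, 60, 61, 71, 82, 66, 45, 62, 65]

# With 9 values, the median is the 5th value of the sorted list.
centre = median(scores)

# Absolute deviation of each score from the median, then the median of those deviations.
abs_dev = [abs(s - centre) for s in scores]
mad = median(abs_dev)

print("sorted scores:    ", sorted(scores))
print("median:           ", centre)
print("sorted deviations:", sorted(abs_dev))
print("MAD:              ", mad)
```

The two medians it prints are the 65 and the 5 found by arranging the numbers into a U.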

I am just going through the motions, and raising flags.

Gord

And confusing the fuck out of me !!

Mr Thurston always said I wasn’t cut out to be a Mathematician. I guess he was right 🙂

Jambug.

R.W. Hamming, in the book “Numerical Methods for Scientists and Engineers”, said the object of computing is insight, not numbers. I think the same is true here.

People calculate averages, variances and standard deviations, because that is what other people do. Then I come along and spend most of my time sorting numbers.

The average is sensitive to outliers in the data, whereas the median is a more robust way to estimate the center of the data. The variance (and hence standard deviation) is even more sensitive to outliers than the average is. Which is why I calculated the median absolute deviation (MAD).
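A small sketch of that sensitivity, using the nine scores from the article. The altered data set is purely hypothetical: it pretends the lowest score (45) had been 5, just to see which summaries move.

```python
from statistics import mean, median, stdev

scores = [81, 60, 61, 71, 82, 66, 45, 62, 65]

def mad(values):
    """Median absolute deviation about the median."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)

# Hypothetical data: the lowest score replaced by a far more extreme one.
altered = [5 if s == 45 else s for s in scores]

for label, data in (("as listed", scores), ("one extreme value", altered)):
    print(f"{label:18s} mean={mean(data):5.1f}  median={median(data)}  "
          f"sd={stdev(data):5.1f}  MAD={mad(data)}")
```

The mean and the standard deviation both shift when one value is made extreme, while the median and the MAD stay where they were.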

With that in hand, what did I write?

Gord

I’m sure it makes sense to most people, alas you may as well be talking Chinese as far as I’m concerned. Sorry.

No, most people don’t worry about robust statistics, I suspect even most engineers.

When we calculate an average, we are saying that we think a single population is present (just apples, not a mixture of apples and some other fruit, or multiple kinds of fruit). In addition, we are assuming one of two things:

1. We have so much data, the presence of any outlier or two will not affect our calculations

2. We don’t have outliers in our data

With the data we are playing with courtesy of Andrew and Walter, we don’t have a lot of data. And we have no idea if there are outliers. We are hoping there is only a single population, and we need some way to detect whether we have more than one population (how can we flag outliers).

By using the median, I am saying the center of the data (weighted score of referee in game) is about 65%. And I am suggesting that if we question data that is more than twice the MAD (which is 5%) away from this central value, we may be able to find outliers.
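As a rough sketch of that flagging rule: the opponents and scores below come from the review (week 7 rounded to 45), and the multiplier `k = 2` is the "twice the MAD" suggested here, not an established threshold.

```python
from statistics import median

# (matchweek, opponent, weighted referee score) from the review above.
games = [
    (1, "Crystal Palace", 81), (2, "Everton", 60), (3, "Leicester", 61),
    (4, "Man City", 71), (5, "Aston Villa", 82), (6, "Spurs", 66),
    (7, "Chelsea", 45), (8, "Hull", 62), (9, "Sunderland", 65),
]

scores = [score for _, _, score in games]
centre = median(scores)
mad = median(abs(s - centre) for s in scores)

k = 2  # question anything more than k * MAD away from the median
low, high = centre - k * mad, centre + k * mad

# Games whose score lies outside the [low, high] band.
flagged = [(week, opp, score) for week, opp, score in games
           if score < low or score > high]

print(f"band: {low} to {high}")
for week, opp, score in flagged:
    print(f"matchweek {week} ({opp}): {score}")
```

A flag only means the game deserves a second look; it does not say why the score is unusual.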

Getting any clearer?

I don’t know where in the world you call home, so I don’t know if it is too late or not. So, another philosophical detail someone else might want to “argue” (in the scientific sense) with me.

Everton-West Brom ended in a scoreless tie. Last season, after Everton replaced Moyes with Martinez, they were pushing for a top place. And the talk from Everton supporters was that they were now a top team, and would be competing for the title next season. Well, we are about half way through the next season, and Everton is firmly in the middle of the pack, which is nominally where they typically are. While it is possible that Everton did transition to a top team, and this year is just a negative outlier, it seems more likely that last year was a positive outlier (a lucky year) for a mid table team.

There are owners who want their clubs to become top of table teams, and who might have decided that because Everton are now a top team, they should go out and start buying better players. Would better players have changed Everton’s position this season? I don’t have a crystal ball. Nothing is sure in that regard, but to buy better players would have increased debt.

Southampton are having a good run this season. Did losing a bunch of players they thought they needed and bringing in a new manager somehow turn them into a top team? I don’t know. It is a more complicated circumstance than Everton, as all Everton changed was the manager. And Everton’s status is by no means decided even by this season.

I enjoyed our win over Man City. It proved that if a team needs to disrupt our tempo, they literally have to chop down our midfielders.

I am always against buying a destructive DM as he usually can’t create chances too. Our blend of Coq plus Santi’s counterattack made us so unpredictable.

Sad that many of our opponents resort to dirty tactics to win. Fat Sam started it and the rest have followed. PGMOL has to make a choice in my opinion: beautiful football or ugly football in the EPL.

Thinking back to looking for insight, not numbers. What does it mean for a mid-table team, last season and this season, to “flirt” with the Top 4?

I am not thinking elitism in the slightest, so don’t get on that stream.

Football has a significant amount of random chance (luck) in its makeup. For a team that might normally finish 10th (middle) to flirt with 3rd or 4th after a significant part of the season means something.

Top of the league flirts with 90 points, and the relegation zone is typically about 40 points. Without actually looking at where 10th normally is, I will guess it is about 65 points. For a mid table team to flirt with top of the table, random chance has to be worth about 20 points to the plus side, and possibly a bit more to the minus side.

Hence, at the half way point of the season, you should fuzz every team’s standing by about 10 points plus and minus, to get a guess at what range is reasonable for finishing the season. And based on that, Arsenal lies in the 2nd to 10th range. Being 5th isn’t where we want to be, but based on past performance and perhaps a swing in that luck, I don’t think anyone should be too surprised if we get mid 80s for points. Chelsea could theoretically get to 100, but will they win every game left?

You have to play the season to find out. Not just be declared winner by all the muppets.

If Chelsea were to lose every game left, they could end up in something like 12-14th.

One can always hope. 🙂

Nice work, Gord.

Eyeballing the data, I figure you add about 20% to your median and upper/lower limits and you have the bias against estimates. To complete the picture, we could add a reference for a standard (50% bias).
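One way to check that eyeball estimate is to run the same median and MAD calculation on the "Bias against Arsenal" figures quoted in the article for matchweeks 1 to 9. This is only a sketch of one possible reading of the remark; the helper name is made up for the example.

```python
from statistics import median

# Figures from the article, matchweeks 1-9 (week 7 score rounded to 45).
scores = [81, 60, 61, 71, 82, 66, 45, 62, 65]
bias = [100, 89, 93, 73, 80, 85, 86, 82, 75]

def centre_and_mad(values):
    """Return (median, median absolute deviation) of a list of numbers."""
    centre = median(values)
    return centre, median(abs(v - centre) for v in values)

score_centre, score_mad = centre_and_mad(scores)
bias_centre, bias_mad = centre_and_mad(bias)

print("weighted score: median", score_centre, "MAD", score_mad)
print("bias against:   median", bias_centre, "MAD", bias_mad)
print("shift between medians:", bias_centre - score_centre)
```

On these nine games the bias median is 85 with a MAD of 5, so the bias figures sit 20 points above the weighted scores with the same width, which matches the "add about 20%" observation.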

Cyberian

I doubt this is where Andrew wanted this to go. 🙂

But, I have to teach people to do things more robustly, as there is so little data to work with. For me to do things more rigorously, I need to run models over many trials, just to get some data. So I am guessing at some numbers for handwaving reasons.

I am doing this, while trying to get other stuff done.

You have come up with yet another point of view, and I am scrambling to figure out what you are doing. Could you put some more words in please?

Hmmm, no Jambug back, or an explanation from Cyberian about where this 20% comes from.

—

The scores are not counts, they are the result of dividing one count by another count and applying a somewhat arbitrary weighting factor. But, I am going to treat them as count like. A binomial distribution for a probability of 0.5 is symmetric, and for large enough N can be approximated fairly well by a Gaussian. A binomial (well, beta distribution) for probabilities close to 0 or 1, is not symmetric. So, if I pick a model that occasionally generates guesses that are below 0 or greater than 100, I have to ignore those trials as they are not physically possible.

The median weighted score we saw in the first 9 matchweeks was 65%, so I generated a bunch of data scores distributed by Binomial or Poisson models. The binomial is said to be under-dispersed compared to a similar Poisson (of the same mean); it has a smaller variance. In 100,000 trials (of both binomial and Poisson), I saw 24 trials where a Poisson trial had 1 or more of the 9 random values being larger than 100. In 100,000 trials, 206 of the Binomial trials resulted in a MAD of 0 and 35 of the Poisson trials resulted in a MAD of 0.

It is wonderful for me to expound on the virtues of the median absolute deviation as a measure of distribution width, but not very helpful. I did note that MAD is smaller than the standard deviation, and there are ways to use the standard deviation in detecting outliers. But unless one knows how much smaller the MAD is, you are out of luck. Well, the amount MAD is smaller, in general depends on what model one is working with.

For the 2 particular models (both with a mean score of 65) I abused here, the theoretical standard deviation is very close to being twice as large as the MAD. To 2 decimal places, for the Binomial it is 2.00 and it is 2.03 for the Poisson.
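For anyone wanting to try this kind of experiment, here is a minimal sketch using only Python's standard library. Several things in it are assumptions rather than the settings used above: the Binomial is taken as 100 trials with probability 0.65, the Poisson has mean 65, the ratio is averaged as (theoretical SD / MAD) over the trials where the MAD is not 0, and only 2,000 trials are run so that it finishes quickly. The numbers it prints will therefore not match the figures quoted above exactly.

```python
import math
import random
from statistics import median

def draw_binomial(rng, n=100, p=0.65):
    """One Binomial(n, p) draw, as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def draw_poisson(rng, lam=65.0):
    """One Poisson(lam) draw, using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mad(values):
    """Median absolute deviation about the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)

def simulate(draw, theoretical_sd, trials=2000, size=9, seed=1):
    """Return (mean of SD/MAD, trials with MAD == 0, trials skipped)."""
    rng = random.Random(seed)
    ratios, zero_mad, skipped = [], 0, 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(size)]
        if max(sample) > 100:   # not a possible percentage: ignore the trial
            skipped += 1
            continue
        m = mad(sample)
        if m == 0:              # the ratio is undefined, so only count it
            zero_mad += 1
            continue
        ratios.append(theoretical_sd / m)
    return sum(ratios) / len(ratios), zero_mad, skipped

binom_result = simulate(draw_binomial, math.sqrt(100 * 0.65 * 0.35))
poisson_result = simulate(draw_poisson, math.sqrt(65.0))

print("Binomial: ratio %.2f, MAD=0 in %d trials, skipped %d" % binom_result)
print("Poisson:  ratio %.2f, MAD=0 in %d trials, skipped %d" % poisson_result)
```

Raising `trials` and changing `seed` gives a feel for how much these averages wander from run to run.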

Some statistical work makes use of the range of data, but this is often frowned upon. I will see if I can get the inter-quartile of the absolute deviation (75 percentile – 25 percentile). Well sort of.

We are drawing 9 scores, so the median of a sorted set of scores is the 5th score. The same statement is true of a sorted set of absolute deviations.

Counting from either end up to and including the median gives 5 data points, hence the median of that half-range is the 3rd sorted point (from either end), so the inter-quartile range is about the 7th value minus the 3rd value. With absolute deviations, the smallest absolute deviation is 0. In order for the MAD to be 0, we must have at least the first 5 sorted absolute deviations being 0. Hence, if MAD is 0, the inter-quartile range will just be the 7th sorted value of absolute deviations.

Do you become blonde when you lose hair? Can’t write and think at the same time.

The inter-quartile range is from the parent data, not the absolute deviation data.
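Applied to the parent data (the nine weighted scores from the article), the "7th value minus 3rd value" shortcut can be sketched as follows. The cross-check uses the `inclusive` method of Python's `statistics.quantiles`, which for 9 points lands on the same two values; other quartile conventions interpolate differently and can give a different answer.

```python
from statistics import quantiles

scores = [81, 60, 61, 71, 82, 66, 45, 62, 65]
ordered = sorted(scores)

# 3rd and 7th of the 9 sorted values (indices 2 and 6, counting from 0).
q1, q3 = ordered[2], ordered[6]
iqr = q3 - q1

# Cross-check against the standard library's "inclusive" quartiles.
lib_q1, lib_median, lib_q3 = quantiles(scores, n=4, method="inclusive")

print("3rd and 7th sorted values:", q1, q3)
print("inter-quartile range:     ", iqr)
print("library quartiles:        ", lib_q1, lib_median, lib_q3)
```

For these scores the quartiles are 61 and 71, so the inter-quartile range is about 10.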

Just on my way by. I did similar calculations to the weighted scores, for the other things here. I guess I might as well add them into the verbiage.

But, to get some inter-quartile range information into the discussion, and another thing.

Instead of running for 100,000 trials, I ran for 1,000,000 trials. Instead of getting 206 instances of the MAD being 0 for a Binomial trial, I got 2140 (about 10 times as many). Instead of 35 for Poisson, I got 317 (about 10 times as many). Instead of skipping 24 Poisson trials, I skipped 199.

The theoretical standard deviation (or variance) for any given trial is the same for every trial, and is just a number. The value for MAD is discrete and must be an integer. Dividing a number by an integer and then summing, you do approach some kind of number. The discreteness likely influences the histogram.

In the case of calculating the sample variance and standard deviation, because there are only 9 components to the sum, there is a discrete aspect to both of the calculations. To divide one discrete number by another discrete number, quantization influences what answers are possible. To then find the average over 1,000,000 trials, you can end up with different numbers.

While the theoretical standard deviation seems to be about twice as large as the MAD, the same is not true for the sample standard deviation (after correcting for finite sample size). That ratio is about 1.82. The difference between these 2 sets of numbers is bigger than I was expecting.

The reason to look at a different set of trials was to bring in the inter-quartile range. It looks like the inter-quartile range (IQR), for either the Poisson or Binomial models used here, is about 1.145 times the theoretical standard deviation.

For people who want to check things assuming Gaussian statistics and tables, you now have 3 different ways of estimating the standard deviation: calculate the sample standard deviation from the raw data, estimate the standard deviation from MAD, and estimate it from IQR.
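A sketch of those three estimates for the nine weighted scores. The two conversion factors are the ratios reported from the simulations above (SD about 2 times MAD, IQR about 1.145 times SD); they depend on the model used and are not universal constants, and the constant names are made up for this example.

```python
from statistics import median, stdev

scores = [81, 60, 61, 71, 82, 66, 45, 62, 65]
ordered = sorted(scores)

centre = median(scores)
mad = median(abs(s - centre) for s in scores)
iqr = ordered[6] - ordered[2]   # 7th minus 3rd sorted value

# Model-dependent ratios taken from the simulation results quoted above.
SD_PER_MAD = 2.0
IQR_PER_SD = 1.145

sd_sample = stdev(scores)        # usual sample standard deviation (n - 1)
sd_from_mad = SD_PER_MAD * mad
sd_from_iqr = iqr / IQR_PER_SD

print(f"sample SD:   {sd_sample:.2f}")
print(f"SD from MAD: {sd_from_mad:.2f}")
print(f"SD from IQR: {sd_from_iqr:.2f}")
```

The three numbers do not agree, and the sample standard deviation is the largest of them; the sample calculation is the one that reacts most strongly to the extreme games.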

As MAD and IQR are non-linear functions of the input data, estimating SD from them is not likely to be correlated with the sample standard deviation calculation.