By Tony Attwood
It was the story that Sky used years and years ago to discredit Arsenal and its manager Mr Wenger: we were the dirtiest team in the League. Unfortunately the source of their data was never given, nor any other team’s figures, so no one could check. But several people challenged the figures, and ultimately Sky gave up on them and moved on to something else to attack Arsenal with.
Now we are back with the Express claiming this is the case, counting downwards for reasons that failed to become at all clear to us. And naturally the coattail followers know an anti-Arsenal story when it slaps them round the face, and so are reprinting it with glee.
Counting backwards, for reasons that are never made clear, here is their top ten of dirty teams, showing the number of fouls:
10. Tottenham – 86
9. Leicester – 87
8. Burnley – 88
7. Huddersfield – 88
6. Crystal Palace – 90
5. Manchester United – 90
4. Bournemouth – 96
3. West Ham – 96
2. Everton – 102
1. Arsenal – 103
And they tell us who is responsible: “Granit Xhaka is among the most hard-hitting players in the league and has racked up 14 fouls alone for Arsenal this season.”
Hence the headline: Arsenal news: Gunners ranked as DIRTIEST team in the Premier League this season.
So we thought, as always, before we repeat this, as so many other sites are now doing, we’d check the figures. And there the problems start – there is no clear, reliable source quoted by the Express. Just as they never told us the source of the baby’s toy found on the surface of Mars story, so the tale of Arsenal fouls is not sourced. So it is over to us…
The best source for this sort of data is the Premier League’s own site, since they have access to PGMO records, but they don’t show fouls. However, they do show yellow cards, and that data gives us…
Rank  Club  Yellows
1.  West Ham United  22
2.  Brighton and Hove Albion  18
3.  Southampton  18
4.  Watford  18
5.  Burnley  16
6.  Crystal Palace  16
7.  Arsenal  13
8.  Fulham  13
9.  Huddersfield Town  13
10.  Leicester City  13
It is a bit early for red cards, but even so some clubs are really trying their best:
Rank  Club  Red cards
1.  Leicester City  3
2.  Everton  2
3.  Manchester United  2
4.  AFC Bournemouth  1
5.  Cardiff City  1
6.  Crystal Palace  1
7.  Huddersfield Town  1
8.  Newcastle United  1
9.  Southampton  1
10.  Watford  1
But we don’t figure in that chart at all. And if we look at yellow cards by player, Xhaka is not doing anything like as badly as the Express figures might imply; he does not appear in this top twenty:
Rank  Player  No of yellows 

1.  José Holebas  5
2.  Harry Arter  4
2.  Etienne Capoue  4
2.  Zanka  4
5.  Trent Alexander-Arnold  3
5.  Cédric Soares  3
5.  Ryan Bertrand  3 
5.  Philip Billing  3 
5.  Dan Gosling  3 
5.  Harry Kane  3 
5.  Mario Lemina  3 
5.  Jefferson Lerma  3 
5.  Matthew Lowton  3 
5.  Harry Maguire  3 
5.  Sokratis  3 
5.  Matt Ritchie  3 
5.  Oriol Romeu  3 
5.  Luke Shaw  3 
5.  Robert Snodgrass  3 
5.  Dale Stephens  3 
So nothing thus far is backing up the wacky claim of the ever eccentric Daily Express. But there is one more list we can check: tackles. Fouls can’t easily be committed without tackles, and although that is not a perfect measure, we might expect Arsenal to be top of the tackles list if they are top of the fouling chart.
Rank  Club  No of tackles
1.  West Ham United  158
2.  Everton  155
3.  Watford  155
4.  Leicester City  153
5.  Southampton  153
6.  Newcastle United  152
7.  Crystal Palace  149
8.  Tottenham Hotspur  147
9.  Huddersfield Town  145
10.  Liverpool  145
11.  Brighton and Hove Albion  141
12.  Wolverhampton Wanderers  138
13.  Cardiff City  129
14.  Fulham  128
15.  Arsenal  127
16.  Burnley  119
17.  Manchester United  102
18.  Chelsea  101
19.  AFC Bournemouth  95
20.  Manchester City  88
So again no go, and all told there is no supporting evidence from the PL at all.
But there is one other source that could be used: WhoScored. They do provide detailed statistics including fouls, but I didn’t turn there first because we have found that their figures just don’t seem to align with those from other sources. We did use them as a source for two years, but stopped about three or four seasons ago.
Now they do have Arsenal fouling a lot more than other clubs – but not all other clubs. Here is their data, and yes, Arsenal come out second in fouls per game. So not quite what the Express says, but there is that figure.
But there is a problem. If Arsenal were committing this many fouls per game, surely they would be getting more yellow cards. Brighton do follow this rule: second in the yellow card table and top of the fouling table. But Arsenal were seventh in the yellow table, nine yellows behind the champions of the yellow so far this season, West Ham.
Figures from WhoScored
Rank  Team  Shots pg  Tackles pg  Fouls pg  Offsides pg
1  Brighton  16.9  17.6  13.5  2.6
2  Arsenal  14.6  15.9  12.6  2.3
3  Newcastle United  15.5  19  12.4  1.1
4  Watford  9.6  19.4  12.3  2.4
5  Southampton  14.5  19.1  11.9  1.5
6  Tottenham  12.1  18.4  11.8  1
7  Manchester United  10.4  12.8  11.5  2.6
8  Wolverhampton Wan  11.6  17.3  11.4  1.4
9  Huddersfield  13.9  18.1  11.3  1.3
10  Liverpool  8.1  18.1  11.1  2.5
So even going to a source that we have had doubts about, we still can’t agree that Arsenal are the dirtiest team.
And here’s one final point: why did the Express refuse to quote their source? It is rather like all those figures which show that referees are 98% accurate in their decision making. No one ever quotes their source and gives the data – except Untold in its 160 game analysis, which showed that the error rate was far higher.
Arsenal the dirtiest team? Certainly we can find no research to show that. In every bit of research we have shown except one (WhoScored on fouls), Arsenal are not showing the statistics you might expect of the team committing the most fouls.
It is therefore most likely to be a totally invented story. But that won’t stop it being repeated over and over and over again.
Unfortunately the Express and the rest of the rags write this shit to get attention.
And you just gave it to them.
First off, I haven’t a clue where any of these statistics come from. I’ve had a look on the Premier League site and they give the following statistics, including fouls:
Arsenal
https://www.premierleague.com/clubs/1/Arsenal/stats?se=210
Fouls 61
Yellow cards 13
Red cards 0
Spurs
https://www.premierleague.com/clubs/21/TottenhamHotspur/stats?se=210
Fouls 55
Yellow cards 8
Red cards 0
I have mentioned here on Untold a few times that back in the day I used to do a very rudimentary calculation using the stats above to support the notion that Arsenal were being harshly treated by referees.
I gave 1 point for a yellow and 2 for a red, then divided the number of fouls by the points accumulated for cards.
For example using the 2 sets of statistics above:
Arsenal
61 fouls divided by 13 points = 4.69
This is an extremely harsh ratio of cards to fouls, but alas it is perfectly normal for how we get refereed in the Premier League.
Spurs
55 fouls divided by 8 points = 6.88
From when I used to do this on a regular basis that is about average.
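Nitram’s rule of thumb is easy to reproduce. Below is a minimal sketch, assuming exactly the weighting described above (1 point per yellow, 2 per red); the function name `fouls_per_card_point` is our own illustration, not anything from the PL site, and the inputs are the PL club-page figures quoted above.

```python
def fouls_per_card_point(fouls: int, yellows: int, reds: int) -> float:
    """Fouls conceded per card point, scoring 1 per yellow and 2 per red.

    A lower number means a card arrives after fewer fouls (harsher treatment).
    """
    points = yellows + 2 * reds
    if points == 0:
        raise ValueError("no cards shown, so the ratio is undefined")
    return fouls / points


# PL club-page figures quoted above: (fouls, yellows, reds)
arsenal = fouls_per_card_point(61, 13, 0)
spurs = fouls_per_card_point(55, 8, 0)
print(f"Arsenal {arsenal:.2f}, Spurs {spurs:.2f}")  # Arsenal 4.69, Spurs 6.88
```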
Where those numbers for fouls committed come from (Arsenal 103) I do not know, because they bear very little resemblance to those produced on the Premier League website.
My apologies if my stats are wrong.
But right or wrong, it is irrelevant with regard to what is happening, because it is simply the media’s usual ploy of casting Arsenal in a bad light.
Plastering how dirty Arsenal are all across the back pages is their usual crude, but often effective, way of getting it across to the referees and the public in general that Arsenal need to be clamped down on.
Don’t be surprised to see even more cards flashed in our direction, including Xhaka’s obviously much overdue red card.
Not sure where some of those stats come from, but these are what the Premier League give for us on their website:
Matches played 8
Wins 6
Losses 2
Goals 19
Goals conceded 10
Clean sheets 2
Discipline:
Yellow cards 13
Red cards 0
Fouls 61
https://www.premierleague.com/clubs/1/Arsenal/stats?se=210
Something seems well askew with some of those stats in the papers. Or maybe it’s me.
And they tell us who is responsible: “Granit Xhaka is among the most hard-hitting players in the league and has racked up 14 fouls alone for Arsenal this season.”
He’s not even top at Arsenal where according to the Premier League web site Lacazette is our ‘dirtiest’ player with 16 fouls.
Xhaka is below Pogba, amongst others.
https://www.premierleague.com/stats/top/players/fouls
Casting Arsenal as ‘dirty’ is the media’s standard ruse to justify the exorbitant number of yellow and red cards we as a matter of course get throughout a season.
On the back of this expect a red card for Xhaka sooner rather than later.
Footnote:
Some of these stats I’m seeing on the PL website seem a bit odd but there you go, I can only go by what I see.
Regarding fouls again.
These are the fouls attributed to each player at Arsenal and these add up to 101:
https://www.premierleague.com/stats/top/players/fouls
These are the fouls attributed to each player at Spurs and these add up to 94:
https://www.premierleague.com/stats/top/players/fouls
All very odd, but as I say, as long as they can find something to paint us in a bad light, however tenuous, who cares about the facts.
Had another look at the PL site, and if you look at the individual player stats for fouls, Arsenal’s add up to 101 and Spurs’ to 96.
All very odd.
But no matter, as long as Arsenal can be painted in a bad light who cares?
I’m currently working my way through the PL stats on fouls, cards etc. I have looked at every game played this season and extracted the times of all cards noted on the game timelines as printed on the official PL website. I have then come to the stats page for each game and extracted the number of fouls recorded as being conceded by the home and away team. Using those figures I can report that Arsenal have had more fouls awarded against us than any other club. We have conceded a total of 102 fouls, 56 at home and 46 away.
Watford and Crystal Palace are next with 100, Everton have 99, Man United 98, Brighton 97 and Huddersfield 96. At the other end of the scale Man City have been penalised for 70 fouls, Chelsea 71, Bournemouth 78, Fulham 78 and Wolves 80.
In terms of cards, Chelsea have fewest at 5 (2 home and 3 away), Man City have 8 (all away), as do Liverpool. We are in a group in mid table with 13 cards. At the high end of the scale West Ham have 20, Watford and Southampton have 19, Brighton 18 and Palace 16.
City are the only team not to have picked up a yellow card at home despite 34 fouls; they have 8 away yellow cards on 36 fouls. Possibly some kind of home bias in play there.
What these stats take for granted is that the referee is invariably right in all of their foul and card decisions, something that anyone who has looked at football matches will have the greatest of difficulty in accepting.
The summary stats on the PL website don’t always agree with the data given on the timelines. I have assumed the timelines are probably correct for my dataset.
Just looked at the cards per foul. Chelsea get a card every 14 fouls, Liverpool every 11.5 fouls, City every 8.8 fouls (but no cards at all at home). We get one every 7.8 fouls, Spurs every 7.3 fouls, United every 7, and at the wrong end of the table West Ham get a booking every 4.2 fouls, Southampton 4.9, Watford 5.5. The median figure is 6.7, so Chelsea and Liverpool certainly look out of kilter with the rest.
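The cards-per-foul figure is simply fouls divided by cards. A minimal sketch, using only the teams for which both totals are quoted in Andrew’s comments above (Chelsea 71 fouls and 5 cards, City 70 and 8, Arsenal 102 and 13); the dictionary and function names are hypothetical, chosen for illustration.

```python
# Season-to-date totals from Andrew's comments above: fouls from the
# match stats pages, cards from the match timelines. Only teams for which
# both numbers were quoted are included.
TOTALS = {
    "Chelsea": (71, 5),
    "Man City": (70, 8),
    "Arsenal": (102, 13),
}


def fouls_per_card(fouls: int, cards: int) -> float:
    """Average number of fouls conceded for each card shown."""
    return fouls / cards


# Most leniently treated (most fouls per card) first
ranking = sorted(TOTALS, key=lambda team: fouls_per_card(*TOTALS[team]), reverse=True)
for team in ranking:
    print(f"{team}: a card every {fouls_per_card(*TOTALS[team]):.1f} fouls")
```

Adding Liverpool, Spurs or West Ham would need their foul totals, which are not given above.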
Going to FootballStats.co.uk, they have a table of fouls by a team, and fouls on that team by the opposition.
The median number of fouls committed by a team is 89, with a spread (median absolute deviation) of 7. The median number of fouls on a team by the opposition is 88 with a spread of 7. Essentially those are the same number.
In terms of fouls committed by a team, the biggest “outlier” is Man$ity at 61 (4 times MAD lower than median). The biggest “outlier” in fouls committed by opposition is only 3 times MAD, for Wolves.
For both of those “statistics”, we are counting something, and hence could naively expect the variance to equal the mean (as it does for a Poisson count). So we are expecting a standard deviation of about 9.4 (the square root of 87.8). The mean number of fouls committed by the opposition is 87.8, with a standard deviation of 10.3. The variance appears to be a little larger than we are expecting (possibly not significant).
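The two yardsticks Gord uses, the median absolute deviation (MAD) and the counting-statistics expectation that the standard deviation should be roughly the square root of the mean, can be sketched as follows. This is a minimal illustration, not Gord’s actual program: the foul counts in the list are made up and are not the FootStats table.

```python
import math
import statistics


def median_abs_deviation(values: list[float]) -> float:
    """Unscaled MAD: the median distance of each value from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def poisson_expected_sd(mean: float) -> float:
    """For a Poisson count the variance equals the mean, so sd = sqrt(mean)."""
    return math.sqrt(mean)


# Hypothetical season foul counts, for illustration only
fouls = [61, 82, 85, 88, 89, 90, 93, 96, 100]
print("median:", statistics.median(fouls))
print("MAD:", median_abs_deviation(fouls))
print("observed sd: %.1f" % statistics.stdev(fouls))
print("Poisson sd for a mean of 87.8: %.1f" % poisson_expected_sd(87.8))
```

If the observed standard deviation is well above the square root of the mean, the counts are more spread out than pure chance would suggest.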
The fouls committed by a team suggests that Man$ity is possibly an outlier. They _seem_ to be committing far fewer fouls than any other team in the league. The flipside is that they are being called less than they should be.
Man$ity has the 5th lowest number of man-minutes of caution at 370 (Liverpool is 219 and Chel$ea 220). Man$ity has only needed 4 treatments this season (according to untrustworthy statistics), and I believe that one of those 4 treatments resulted in a card to the opposition. Man$ity have “inflicted” treatments on the opposition 13 times (CPalace is highest at 15, but we also have ManU and Spuds up here at 14). They have not received a card for inflicting a treatment on another team (according to my flaky data).
I think what this merged data shows, is that Man$ity is being treated specially, and is not necessarily a _cleaner_ team than any other.
http://www.footstats.co.uk/index.cfm?task=Leagues
Could I get a copy of your foul and card data Andrew? It will help me look for mistakes in my data.
Looking at the minutes played under caution, and counting red cards as twice as painful as yellow cards, the teams with fewest penalty minutes are Chelsea (203), Liverpool (221) and Wolves (231). Then there is a big gap to Cardiff (335) and Man City (370). We are more or less in the middle with 546. At the wrong end are Leicester (842), Watford (837) and Southampton (720). For this calculation a yellow card in the first minute counts as 90 minutes, one in injury time at half time as 45, and one in injury time at the end of the game as 1 minute.
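The minutes-under-caution rule above can be written as a small function. A minimal sketch, assuming a card in minute m leaves 91 − m minutes, first-half stoppage counts 45, end-of-match stoppage counts 1, and reds count double; how stoppage time is encoded (a minute plus a flag) and the function names are our own assumptions, and Gord’s version also includes penalties, which this sketch leaves out.

```python
def caution_minutes(minute: int, stoppage: bool = False, red: bool = False) -> int:
    """Minutes a team plays 'under caution' after one card (assumed rule).

    minute:   match minute of the card (1-90).
    stoppage: True if shown in stoppage time; minute <= 45 means first-half
              stoppage (counts 45), otherwise end of the match (counts 1).
    red:      red cards count double.
    """
    if stoppage:
        remaining = 45 if minute <= 45 else 1
    elif 1 <= minute <= 90:
        remaining = 91 - minute  # first minute -> 90, 90th minute -> 1
    else:
        raise ValueError(f"minute out of range: {minute}")
    return 2 * remaining if red else remaining


def team_caution_minutes(cards) -> int:
    """Sum over (minute, stoppage, red) tuples for one team's season."""
    return sum(caution_minutes(m, s, r) for m, s, r in cards)


# Hypothetical cards for one team, for illustration only
cards = [(1, False, False), (45, True, False), (90, True, False), (60, False, True)]
print(team_caution_minutes(cards))
```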
Gord, I was planning to send you a copy of my spreadsheet once I am happy with it. Probably after the weekend (I’m still putting in the calculation cells). It also has the who has refereed which game information.
Thanks Andrew.
Your minutes of caution are about the same as mine; one difference is that I include penalties in my data.
Just looking at The (sweet) FA Fair Play League (EPL games only), people can get points for denial of a goal scoring opportunity. It should be possible to see if any team has picked up points in that category.
The points due to denial of goal should be tabulated as points - N*4 - M*12 for N yellows and M reds. State Aid, Brighton, Everton, Huddersfield, Burnley, Newcastle, Arsenal, Fulham, Wolves, Cardiff, Spuds, Liverpool!, Man$ity and Chel$ea all come out with 0 points for denial of goal. Curiously, Leicester has 4 points for denial of goal (which could be that they have one less yellow than is indicated), and Southampton, Watford, ManU, CPalace and Bournemouth all have 2 for denial of goal (how can they have half a yellow card too many?). Does The (sweet) FA not know how to add and subtract?
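The check Gord describes is a subtraction: take a team’s Fair Play points and remove the points explained by its cards. A minimal sketch, assuming the tariff in the comment (4 points per yellow, 12 per red); the function name and the example numbers are hypothetical, not The FA’s actual table.

```python
YELLOW_POINTS = 4  # tariff assumed in the comment
RED_POINTS = 12    # some reds may carry 10 points instead


def unexplained_points(fair_play_points: int, yellows: int, reds: int) -> int:
    """Fair Play points left over after subtracting the cards.

    Under the assumed tariff, any remainder should come from other
    categories, such as denial of a goal-scoring opportunity.
    """
    return fair_play_points - YELLOW_POINTS * yellows - RED_POINTS * reds


# Hypothetical teams: (points, yellows, reds)
for points, yellows, reds in [(52, 13, 0), (56, 13, 0), (62, 12, 1)]:
    print(points, yellows, reds, "->", unexplained_points(points, yellows, reds))
```

A remainder of 2, as Gord found for several clubs, cannot be produced by whole cards at 4 and 12 points, which is what makes it odd.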
Andrew Crawshaw
Great work.
“Just looked at the cards per foul”.
I was doing the same thing earlier as that is something I used to do years ago to show how harshly we were treated, but I gave up as I wasn’t sure what figures were correct.
Back in the day I did it over a few seasons, and with an average of a booking around every 5.5 fouls we were always worst or near worst treated and Man Utd hovering around a card every 10 or so fouls were always the most or near most leniently treated.
Nothing to do with Fergie and his little book of phone numbers of course.
There were obviously spells when we were treated ok but as is always the case with these things, the more data over the longest time is always the best, and averaged out there was no doubt we were screwed.
When I was doing it the average as far as I recall was around a card every 7.7 fouls so your summation that we are about average at the moment seems about right.
And that is I believe part of what’s behind this ‘dirtiest team’ in the league rubbish.
Being treated in what appears to be a more even-handed way is certainly not what those nasty little Hack Dwarfs expect from our men in black, and this is their not-so-subtle way of letting them know it’s time to clamp down on those dirty Gooners, especially that Xhaka guy.
I believe over the next few matches we can expect a few more yellows coming our way as well as a red for Xhaka.
https://www.transfermarkt.com/premierleague/fairnesstabelle/wettbewerb/GB1
Transfermarkt has us on 100 fouls, behind Brighton (104) and Watford, but because we have few yellows and no reds we are 5th (on 13 points) in the fair play table, behind Chelsea (5), City (8), Pool (8) and Sp*rs (12).
This is indeed hilarious. Arsenal have been at the receiving end of many brutal, repeat, brutal and career-threatening tackles.
Who can ever forget Eduardo’s injury and Ramsey’s long layoff? That may be in the past.
Suffice to say it’s happening in 2019. Maybe the former manager was more moderate in his tactics, unlike the present boss, who may want to fight fire with fire and not with oil.
[sarcasm]
I’m disappointed that people liked what Andrew worked on, but not mine.
[/sarcasm]
I think people don’t like working with medians.
Wikipedia has an article on Median Absolute Deviation, and in there they come up with a scale factor to convert MAD to standard deviation, in the event that the underlying distribution is Gaussian (the multiplier is about 1.4). But the number of fouls is constrained to be an integer, so it couldn’t possibly be a Gaussian distribution. While there are places where MAD is useful, there are places where it isn’t. And the conversion between MAD and standard deviation is always dependent on what the underlying distribution is. So, not very useful when you are fishing for relationships.
The reason I proposed that Man$ity is an outlier, is that its absolute deviation (28) is 4 times MAD, while the next largest deviations are just a bit over half that large (16 and 19).
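Gord’s outlier argument is a distance measured in MADs. A minimal sketch using the figures quoted above (median 89 fouls, MAD 7, Man$ity’s deviation of 28, next largest 19 and 16); no Gaussian scale factor is applied, for the reasons Gord gives, and the function name is our own.

```python
def mad_distance(value: float, median: float, mad: float) -> float:
    """How many MADs a value sits from the median (no Gaussian rescaling)."""
    return abs(value - median) / mad


MEDIAN_FOULS = 89  # median fouls committed, quoted above
MAD_FOULS = 7      # median absolute deviation, quoted above

# Man$ity's absolute deviation of 28 versus the next largest (19 and 16)
for deviation in (28, 19, 16):
    value = MEDIAN_FOULS - deviation  # direction is irrelevant for abs()
    print(deviation, "->", round(mad_distance(value, MEDIAN_FOULS, MAD_FOULS), 2))
```

The point is the gap: 4 MADs for one team, against under 3 for everyone else.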
The premier league will tell you how many times a player is offside, or tackles someone; but they (seemingly) don’t have foul data. I’ve run across compilations of the descriptions of cards, which should be something from the game reports. And hence not open to interpretation. Fouls on the other hand, typically don’t show on the game report, and so they need to be guessed at by an expert observer (aka mind reader).
By and large, yellow cards get assigned to a single category. Is this laziness on the part of the referees, or is this one category actually the most likely characterization of yellow cards?
Why do most sources of data, ignore treatments? Presumably this is just another observation?
Football has a problem with simulation, which mostly shows up in two places: diving for penalties, and trying to get the opposition carded for bad tackles (ask Harry Kane about that last one).
People are using fouls, yellow cards and straight red cards for information on “dirtiness”. There are many reasons to give a yellow, and fewer reasons to give a red. And really, the “professional foul” (such as the last defender taking down a player) really should be a different colour. Red is intended to indicate a larger chance of injury. The professional foul is only intended to stop a goal scoring chance.
(And in writing that, I think I see where some of The FA’s math comes in, some reds are 10 points, and some reds are 12 points. So I have more work to do.)
It is possible to tackle an opponent and not hurt that player. Or rather, not very likely to hurt that player. It is possible to tackle a player, and very likely hurt that player. But possibly just something like a bruise. And it is possible to tackle a player in such a manner, that serious injury (including career ending or life ending) is likely.
Fans presumably are interested in who the cheats are (simulation), who is dirty, who is trying to hurt people. Treatments are a significant source of this information. Was a foul called in the incident which preceded the need for treatment? If not, was the referee aware of the foul (called play on)? Was the player taken off the field to allow for play to continue? How long was the player off the field of play? Did that player need to be substituted immediately after needing treatment? Did that player eventually get substituted? If substituted, did that player miss the next game? More?
Did players need to be substituted after treatment in incidents where no card was issued? Or where no foul was called?
How often was a player seeking treatment, carded for simulation? If this never happens, why is simulation a problem?
Harry, that is of course true – they want sales of the paper and hits on the website ads. But if we ignore it, many people will simply believe it is true. That is most certainly what happened before Untold came along: these stats came out, no one challenged them, and so the myths about Arsenal were born. Given the choice between one more bit of attention for the Express and calling them out, I’ll go for calling them out every time. You of course may choose to ignore them because it gives them attention. Up to you.
A thank you to everyone who has worked on and is working on this issue.
I’ve just spotted something else which is both curious and amusing, and I’ll put it up around lunchtime Saturday when I’ve got everything together.
You are never going to guess ….
Gord
Can I be honest with you ?
It is obvious that you are an intelligent guy and that ‘averages’ for want of a better summation, is your field, and I have absolutely no doubt that what you are saying is accurate, insightful and revealing. The problem is as soon as I read something like this from your above post:
“Wikipedia has an article on Median Absolute Deviation, and in there they come up with a scale factor to convert MAD to standard deviation, in the event that the underlying distribution is Gaussian (the multiplier is about 1.4).”
I stop reading because, to put it simply, I haven’t got a clue what you are talking about.
Can I suggest that, to ensure all your hard work doesn’t go to waste, you express it in what could loosely be termed ‘layman’s terms’.
Don’t get me wrong, posters like you are a big part of the reason I love untold.
Intelligent, insightful, informative.
So please continue with your analysis but try to express it in a way that idiots like me can understand.
Thanks.
Once again Untold Arsenal expertly challenges the most basic of poor journalistic licence as portrayed in the back pages of rags such as the Express, Metro and the like!
Fascinating analyses from Gord, Andrew, Nitram et al.
Gord especially highlights many issues concerning the original raw data and how it is classified (or not classified) when being drawn up and tabulated by different web sites as part of the statistical analyses such web sites offer.
Reports that the PGMOL interpretation of handball, with the associated penalties and non-penalties, is somewhat different from that in other FAs suggest that other areas of PGMO decision making can and should be questioned, and that any raw data emanating from such decision making (fouls, simulations etc.) needs to be viewed with care. Indeed, Gord highlights how the data for Manchester City, for instance, could lead to the conclusion that City are an outlier (for whatever reason).
Gord, Andrew, Nitram, some people do have problems when asked to view any mathematical pattern beyond a simple average. Despite this, keep up your fascinating work and analyses, if indeed you have the time. Such work and resultant challenges to the media rags and blogs are further endorsements to the need and power of such a website as Untold Arsenal.
OT slightly, I’ve cut down on the amount of football I watch this season, so I haven’t seen much but the few games I’ve seen Southampton they look like a much more physical/foul happy team. Mr Hughes influence I guess.
Maybe I noticed it more because it’s not something I expect from them as a team (only from a couple of their players), but together with Watford they appear to me to be the dirtiest 2 teams. Maybe some others are but I’ve just not noticed it so much…
Aside from their match against us, I’ve only watched one Watford match and they play like thugs on defence. I find that interesting because they play attractive football with the ball.
Awesome statting gents.
Gord, I do a bit of analysis as part of my job, but not with anything like the fluid authority over the available options that you seem to have. It’s an education, and I’d miss it if you didn’t give your explanations.
I think that most readers, like Nitram, that find the language essentially foreign, will skim down to the conclusion. They are missing a treat, but not the point hopefully.
Something for my clarification. Fouls recorded by PL, or PGMO, are they recorded because there were subsequent free kicks, or recorded by someone, like Andrew analysing the timeline?
What happens to the advantages? I presume that offsides, which result in a free kick, are excluded as a separate entity.
I’m not trying to be difficult or hard to understand. Sorry.
—
Since I had the program to look at linear correlation, I decided to run it against the Foul data, the caution league data, and the two treatment league data.
By and large, the fouls by a team and the fouls by opposition to the team appear to be the same, except possibly that Man$ity might not be getting called for fouling other teams.
The ratio of the largest magnitude to the smallest magnitude for most of this data is quite small, except for man-minutes of caution. So I decided to look at both the initial data and the logarithm of that number. The highest correlation I am seeing is 0.41, which isn’t very good. Many of the correlations appear to be very close to 0.
The correlation between how many fouls a team has been called for (not necessarily how many fouls they have committed) and their man-minutes of caution is 0.37. If I use the logarithm of the man-minutes of caution, this drops to 0.34 (about the same).
The correlation between the number of fouls a team is called for, and how many times their own players need treatment is 0.26. Strangely, the correlation between how many fouls a team is called for, and how many times the opposition needs treatment is nearly 0 (0.03).
The correlation between how many times the opposition is called for fouling a team, and how many man-minutes of caution they play under, is 0.35. Using the logarithm, this goes up to 0.41.
The correlation between the number of fouls called on the opposition, and the number of treatments is 0.17. The correlation between how many fouls called on the opposition, and how many treatments the opposition needs is 0.31. If referees are doing their job, you would think the first should be higher than the second.
The correlation between number of treatments a team has had, and the number of treatments the opposition has had is close to zero (0.02).
The correlation between the man-minutes a team has played under caution and the number of treatments needed is nearly zero (0.03, or 0.05 for log(value)). Man-minutes of playing under caution has some correlation with the number of treatments inflicted on the opposition (0.33, or 0.4 for log(value)). This is what a person would expect.
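For readers who want to try this themselves, here is a minimal sketch of a Pearson (linear) correlation, computed once on the raw values and once on their logarithm, as Gord describes. It is not Gord’s program; the two lists are hypothetical numbers for illustration, and the log version needs every value to be positive (a team with zero man-minutes of caution would break it).

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def raw_and_log_correlation(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Correlation of xs with ys, and with log(ys); ys must all be positive."""
    return pearson(xs, ys), pearson(xs, [math.log(y) for y in ys])


# Hypothetical fouls-called and man-minutes-of-caution figures
fouls_called = [78, 85, 90, 96, 102]
caution = [210, 350, 520, 700, 830]
raw, logged = raw_and_log_correlation(fouls_called, caution)
print(f"raw r = {raw:.2f}, log r = {logged:.2f}")
```

Taking the logarithm squeezes the large values together, which matters most when, as with man-minutes of caution, the biggest value is several times the smallest.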
—
A problem with football, is there isn’t enough data. So, let’s throw away some of the data. Let’s just look at the “Rest of the Pack”, except that some of the data still involves when the “rest of the Pack” played a Top6 team.
There was a near-zero correlation between fouls by a team and fouls by the opposition with all the teams included. Drop out the Top6, and the correlation becomes 0.47. The correlation between fouls by a team and man-minutes of caution falls a little when the Top6 are partially dropped out. The correlation between fouls by a team and how many treatments they received was 0.26. Drop out the Top6 and this falls to about zero.
The correlations involving fouls by the opposition are substantially the same when you (partially) drop out the Top6.
The correlations between manminutes of caution and treatments needed by a team, or treatments required by the opposition become stronger when the Top6 are (partially) dropped out. The correlation between number of treatments a team requires, and how many the opposition requires becomes stronger (and is negative) when you drop out the Top6 (partially).
But really, none of these correlations are all that wonderful. The strongest is 0.47.
I think the EPL is a collection of two leagues: a top group (the number of members can vary, and I think is now 6) and the “Rest of the Pack”. You would think it quite likely that data mixed from two populations would more often come out near zero than just looking at the populations individually. Which is what seems to be happening. The Top6 are different from the Rest of the Pack.
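The idea that mixing two populations can drag a correlation towards zero is easy to demonstrate. A minimal sketch with deliberately contrived, made-up numbers (not football data): within each group y rises exactly with x, but the offset between the groups cancels that trend once they are pooled.

```python
import math


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two made-up "populations"; within each one, y rises exactly with x
top_x, top_y = [1, 2, 3, 4, 5], [2.6, 3.6, 4.6, 5.6, 6.6]
rest_x, rest_y = [6, 7, 8, 9, 10], [1, 2, 3, 4, 5]

print("top group r:", round(pearson(top_x, top_y), 3))
print("rest group r:", round(pearson(rest_x, rest_y), 3))
# Pooled, the offset between the groups cancels the within-group trend
pooled = pearson(top_x + rest_x, top_y + rest_y)
print("pooled r:", round(pooled, 3))
```

Real data would not cancel this neatly, but the direction of the effect is the one Gord describes.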
This is not to say that the Top6 behaves as a single population. We’ve felt for a while that Arsenal gets special treatment. Other teams or groups of teams could be getting special treatment as well (for some definition of special).
In my tabulation of man-minutes of caution, Southampton and Watford lead the league, with 747 and 767 minutes respectively. My definition of caution includes penalties conceded, and reds get double the caution.
—
The EPL does publish offsides, so that data is “officially collected”. Advantage should be treated the same as a foul, but I suspect it is ignored.
Ferg, the minor in my M.Eng. was Statistical Mechanics. I had one course from the chemistry department, one from physics, and one from engineering. And my M.Eng. project was a big statistical project. I also did work in geostatistics for a while.
But there are lots of better statisticians than I.
Dover Publications had a book by John Mandel, “The Statistical Analysis of Experimental Data” which I thought was a very useful book. He was a Statistical Engineer for the NBS (now called NIST). Copyright is 1964. Not a statistics book, but R.W.Hamming’s book on numerical methods from Dover is good (the object of computing is insight, not numbers).
Gord
Please don’t think for one second I am suggesting you are ‘trying’ to be difficult, as Ferg suggests it’s more to do with understanding the language, and that’s really my problem.
What I was suggesting was putting your conclusions into layman’s terms rather than mathematical terms.
Terms like correlations and populations seem to me to be mathematical expressions:
“But really, none of these correlations are all that wonderful. The strongest is 0.47.”
What does that mean in terms an idiot like me would understand?
Some problems are “linear”. If we were to graph the acceleration produced by the application of various known “forces”, we would expect to see a straight line through the point (0,0) that had a slope of 1 divided by the mass of the object.
There are likely to be errors in measuring how much force we are applying, and errors in measuring the acceleration produced. But a reasonable experimenter might get a correlation coefficient squared of 0.9 (which in this circumstance would be a correlation coefficient of about 0.95). For this “linear” problem, what the correlation coefficient squared tells you is that 90% of the variance in the data is explained by the model. The data points would fall fairly close to a straight line. If we were to look at the difference between the observed and predicted values, as a function of our independent variable, we would not see any pattern to the differences. We see about the same number of positive differences as negative. Large differences would be rarer than small differences.
If we were to just put points at random on a graph, we would expect the correlation to be near 0. In particular a linear correlation would be near 0.
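To see the random-points case, this sketch (the seed and sample size are arbitrary choices) draws x and y independently from a uniform distribution and computes the linear correlation, which should land close to 0.

```python
import random


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)                  # fixed seed so runs repeat
N = 1000
xs = [rng.random() for _ in range(N)]    # x and y are drawn independently,
ys = [rng.random() for _ in range(N)]    # so there is no relation between them
r = pearson_r(xs, ys)
print(f"r for {N} random points = {r:.3f}")
```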
Let’s take points from a parabola (for example, say we threw a football at 5 m/s at a 45 degree angle downwind, where the wind was close to 5 m/s as well), from when it leaves my hand to when it hits the ground. I have about as many points before the ball reaches its maximum height as after. If I calculate a linear correlation for this data, I will see a value near 0. But a plot of the predicted minus actual values will have a distinct pattern to it. The linear correlation is inappropriate for this data.
With the data I looked at (fouls, treatments and caution), I did not check whether a linear relation was a useful model. But if a linear relation was useful and I got a correlation of 0.47, that linear model is only explaining about 22% of the variance in the data.
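A sketch of the parabola example, assuming the ball lands at the same height it was released from, ignoring air resistance, and plotting height against time (with a steady horizontal speed, distance would give the same shape). Only the 5 m/s launch speed and 45 degree angle come from the comment; the 21 evenly spaced sample times are an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


G = 9.81                                # gravitational acceleration, m/s^2
V0 = 5.0                                # launch speed from the comment, m/s
VY = V0 * math.sin(math.radians(45))    # vertical part of the launch velocity
FLIGHT = 2 * VY / G                     # time to fall back to launch height (assumed)

N = 21                                  # evenly spaced samples, peak in the middle
times = [FLIGHT * i / (N - 1) for i in range(N)]
heights = [VY * t - 0.5 * G * t * t for t in times]   # no air resistance

r = pearson_r(times, heights)
slope, intercept = fit_line(times, heights)
residuals = [h - (slope * t + intercept) for t, h in zip(times, heights)]

print(f"linear correlation: {r:.3f}")
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```

Because the samples are symmetric about the peak, the linear correlation is 0 up to rounding, yet the residuals are negative at both ends and positive in the middle: the kind of pattern that says a straight line is the wrong model.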
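The 22% figure is just the square of the correlation coefficient; for a linear model, r² is the fraction of the variance the model explains:

```python
r = 0.47                  # correlation quoted for the fouls/treatments/caution data
explained = r ** 2        # share of the variance a linear model accounts for
print(f"explained: {explained:.1%}, unexplained: {1 - explained:.1%}")
# explained: 22.1%, unexplained: 77.9%
```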
Wikiversity has an interesting example at
https://en.wikiversity.org/wiki/File:Anscombe's_quartet_3.svg
Only the data in the upper left graph could be usefully analysed by the simple statistics that are commonly taught in high school or early university. All of these data sets have a linear correlation of 0.816, far better than the 0.47 from the fouls/treatments/caution data.
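For anyone who wants to check the 0.816 figure, here is a sketch using Anscombe’s published quartet values. The mapping of data sets I to IV onto the four panels (I upper left, II upper right, III lower left, IV lower right) is the usual layout and is assumed to match the Wikiversity image.

```python
def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]     # shared by sets I, II and III
QUARTET = {
    "I (upper left)": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                           7.24, 4.26, 10.84, 4.82, 5.68]),
    "II (upper right)": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                             6.13, 3.10, 9.13, 7.26, 4.74]),
    "III (lower left)": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                             6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV (lower right)": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                         [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                          5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

All four print r = 0.816, which is exactly why a single correlation number can hide very different pictures.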
The upper right data looks like my parabola example. They are taking a different fraction (or subset) of the data, and getting a spuriously high correlation.
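The subset point can be sketched with a made-up parabola, y = 10x - x², which peaks at x = 5 (the function is hypothetical, chosen only for illustration). Using every point from x = 0 to 10 gives a linear correlation of exactly 0; keeping only the rising part, x = 0 to 4, gives a correlation near 0.98 even though the underlying relation is still not a straight line.

```python
def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = list(range(0, 11))               # 0, 1, ..., 10
ys = [10 * x - x * x for x in xs]     # parabola with its peak at x = 5

r_all = pearson_r(xs, ys)             # rising and falling branches cancel
r_rising = pearson_r(xs[:5], ys[:5])  # x = 0..4 only: the rising branch
print(f"whole parabola: r = {r_all:.3f}")      # whole parabola: r = 0.000
print(f"rising part only: r = {r_rising:.3f}") # rising part only: r = 0.981
```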
The graph in the lower left looks like there is a single outlier present. But some problems have a reasonable probability of producing data points far from the expected value. Robust methods are often required to solve those kinds of problems (there are others).
The example in the lower right is, to me, the result of a poor experiment. There are only 2 independent values present in the data, one of which is only sampled once. I think this kind of data (and the lower left as well) is philosophically related to the problems of analysing football data: not enough data, data that tends to be clustered, and the occasional data point out in the middle of nowhere.
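Robust methods come in many flavours; one simple example (my choice here, not something named in the comment) is the Theil-Sen estimator, which takes the median of the slopes between every pair of points so that a single wild point barely moves it. Applied to Anscombe’s data set III, ordinary least squares gets a slope of about 0.50 because the outlier drags it up, while Theil-Sen stays near 0.35, the slope the other ten points follow.

```python
from statistics import median

# Anscombe's quartet, data set III (lower-left panel)
X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def ols_fit(xs, ys):
    """Ordinary least squares: every point pulls on the line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Median of pairwise slopes, then median of the implied intercepts."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept


ols_slope, ols_intercept = ols_fit(X3, Y3)
ts_slope, ts_intercept = theil_sen_fit(X3, Y3)
print(f"least squares: y = {ols_slope:.3f} x + {ols_intercept:.2f}")
print(f"Theil-Sen:     y = {ts_slope:.3f} x + {ts_intercept:.2f}")
```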
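The lower-right problem is easy to demonstrate with Anscombe’s data set IV, where every x is 8 except one point at x = 19. In this sketch the helper returns None when a variable has no spread (a convention chosen here for illustration). Remove that single point and the correlation cannot be computed at all, because x no longer varies; the whole 0.816 rests on one observation.

```python
def pearson_r(xs, ys):
    """Pearson correlation, or None if either variable has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None   # a correlation needs variation in both variables
    return sxy / (sxx * syy) ** 0.5


# Anscombe's quartet, data set IV (lower-right panel)
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

r_all = pearson_r(X4, Y4)
keep = [i for i, x in enumerate(X4) if x != 19]   # drop the lone x = 19 point
r_without = pearson_r([X4[i] for i in keep], [Y4[i] for i in keep])
print(f"all 11 points: r = {r_all:.3f}")
print("without the x = 19 point:", r_without)
```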
—
If we were to put up a sheet of graph paper and take a shot at the paper with a shotgun, where the gun was pointed orthogonal (perpendicular) to the paper, we would (probably) get a random set of holes and a linear correlation of 0.
If we were to shoot at the paper from some angle (both vertically and horizontally), I think we might get a linear correlation in the vicinity of 0.5. (Plotting the center of each hole.) To get a positive correlation, we need to see a trend where Y increases as X increases (plotting X,Y).
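One way to picture the angled shot, assuming it spreads the pellets into an elongated pattern whose long axis runs diagonally across the paper: if the spread along the diagonal is s times the spread across it, the linear correlation works out to (s² - 1)/(s² + 1). A round pattern (s = 1) gives 0, matching the straight-on shot, and a hypothetical stretch of s = √3 gives exactly 0.5. The sketch below checks that against 20,000 simulated holes.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)            # fixed seed so runs repeat
STRETCH = math.sqrt(3.0)          # hypothetical: long-axis spread / short-axis spread
xs, ys = [], []
for _ in range(20000):
    u = rng.gauss(0.0, STRETCH)   # hole position along the long (diagonal) axis
    v = rng.gauss(0.0, 1.0)       # hole position across it
    # Rotate by 45 degrees so the long axis runs lower-left to upper-right.
    xs.append((u - v) / math.sqrt(2))
    ys.append((u + v) / math.sqrt(2))

r = pearson_r(xs, ys)
predicted = (STRETCH ** 2 - 1) / (STRETCH ** 2 + 1)
print(f"simulated r = {r:.3f}, predicted r = {predicted:.3f}")
```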
I think there are a lot of studies that the medja presents to us (such as what causes cancer) which have lousy correlations. Possibly not as bad as this data (fouls, treatments and caution), but not much better.
You see Nitram, this is what I was telling you all those years you were skipping Statistics for Social Science classes. Just kidding.
Gideone
I was ‘skipping’, that’s for sure, but not for social science; socialising, more like.
But what are the ‘chances’ I’d end up this stupid?
I’m sure Gord could tell me by correlating the population of people in my class that would listen to the teacher, with the population in my class that would rather be kicking a ball about.
Here’s a data set with a correlation of 0.37
1 2.07
2 6.78
3 6.16
4 3.00
5 4.18
6 8.98
7 7.21
8 0.27
9 6.61
10 7.02
The Y coordinate is equal to the X coordinate plus a Gaussian deviate with a mean of 0 and a standard deviation of 4. If I generate other sets of data (new random values), I get correlations of 0.71, 0.25, 0.51, 0.63, and so on.
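A sketch that reproduces the experiment, assuming Python’s standard `random.gauss` for the Gaussian deviates (the seed and the number of runs are arbitrary). Note that the ten pairs as posted give r ≈ 0.21 through the usual Pearson formula, not 0.37, so the posted numbers may come from a different run than the one quoted. The simulated runs show how widely r swings from one set of ten noisy points to the next when the true model is Y = X.

```python
import random


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


X = list(range(1, 11))
Y_POSTED = [2.07, 6.78, 6.16, 3.00, 4.18, 8.98, 7.21, 0.27, 6.61, 7.02]
r_posted = pearson_r(X, Y_POSTED)

rng = random.Random(2024)   # arbitrary seed
# Each run: Y = X + Gaussian deviate (mean 0, standard deviation 4).
simulated = [
    pearson_r(X, [x + rng.gauss(0.0, 4.0) for x in X])
    for _ in range(5)
]
print(f"posted data: r = {r_posted:.2f}")
print("simulated runs:", ", ".join(f"{r:.2f}" for r in simulated))
```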
The true model is Y=X (Y = 1 * X + 0).
Nitram, you probably know way more about socializing than I do.
Gord
Possibly I do.
But my guess is I’m far more envious of your intellect than you are of my ignorance.
Keep up the good work my friend.
Gord
Is there any way that these mathematical exercises can be used to tie up the rest of Europe, so that we can sneak in the back door with Our Brexit Plans?
Nitram
I’m pretty good with my hands and head. If I have seen something done, I can usually figure out how to do it. Mechanically speaking. Fix tractors, wire electrical, paint, split logs to make raised bed planters. At the moment, I am replacing rotten 4×4 posts with new 6×6 posts. It seems like every hole in clay is different.
Stevo
I have no idea. Europe has produced some pretty wonderful mathematicians over the years, I probably couldn’t compete with most of them.
You might have heard of one of them, his first name was Albert. Einstein was his surname.
There was a story about him at school. The teacher wanted some time to himself, so he told the class to add together all the numbers from 1 to 100. Almost immediately Albert put up his hand with the answer, 5050.
100 + (99+1) + (98+2) + … + (51+49) + 50
You have 50 sums of 100 plus a lone 50.
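The pairing trick, checked in Python. (The anecdote is more usually told about Carl Friedrich Gauss; the arithmetic is the same either way.) The general version of the shortcut is n(n + 1)/2.

```python
n = 100
# Pair 99 with 1, 98 with 2, ..., 51 with 49: each pair makes 100.
pairs = [(n - k, k) for k in range(1, n // 2)]
total = n + sum(a + b for a, b in pairs) + n // 2   # lone 100, the pairs, lone 50
print(len(pairs) + 1, "hundreds plus", n // 2, "=", total)  # 50 hundreds plus 50 = 5050
print("check:", sum(range(1, n + 1)), n * (n + 1) // 2)     # check: 5050 5050
```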
Gord
“I’m pretty good with my hands and head”
Me too.
I can cut the lawn, rake the grass, as well as change a fuse and replace a light bulb.
I can also add, subtract, multiply and divide.
I think that covers just about all a guy really needs to know, don’t you think?
Albert who?
I
Gord.
Magnificent, off-the-cuff tutorial. I got a bit lost, as usual, and did what I normally do when critically appraising trials: cut to the money, en route to meet Nitram in the bar.
Excuse my impertinence, but I do hope you have another job, other than resizing holes in clay. Though I’m sure they are exquisite holes. I do recall you telling me about your ‘farm’ and replanting a thousand trees, which is an impressive feat in itself, but it does remind me of Marvin a bit.
I tried for too many years to get a job in engineering (materials science and engineering). I’ve given up now. I moved back to the family farm (40 acres). The last time the land was “worked” was by me, when I was in grade 11. I have every weed known in the area growing here now, and various good-for-nothing trees where I don’t want them. Not that trees are worthless, but the trees I have cannot be used for outdoor projects. I am going to plant an orchard (fruit and nuts), a hedge to deal with too many deer (which includes moose) and expand my garden (last year was about 300 square feet, next year should be over 1000 square feet). When the farm can provide the food needed to feed all that live here (me and Mom), then I will start using my materials knowledge to build things from fibreglass, epoxy, wood and electronics. Like beehives that can weigh themselves and tell you if the bees are likely to swarm. Or doghouses that can keep track of how much stress the dog has in its life.
Ha. What could a massive brain, glue and superconductors achieve without ambition? Get them bloody holes dug and get creating!
Cheers
Your round Nitram.
The unusual stats for Man City do not seem odd to me. I can remember Guardiola complaining a while back about the harsh treatment City were getting foul-wise. This resulted in a private meeting with PGMO. I do not remember anything published after this meeting, but we may be seeing the consequences right now.