By Tony Attwood

A while back a correspondent on Untold mentioned that he thought it was fair to expect that Arsenal should either win the league or be challenging to win the league each season. I produced some statistics to show how rarely this happened, not just with Arsenal, but with all clubs across all recent seasons.

In fact the lack of the same one or two teams at the top of the league all the time is what marks the Premier League out from the German league or the Spanish league where one (in Germany) or two (in Spain) teams most of the time divide the league up between them.

During the past 16 seasons the Premier League has only been retained three times. Given that this won’t happen this season, we can make that three out of 17. Chelsea has done it once, Man U twice (when in fact they won the league three times running). The last time it happened was nine years ago.

But I thought it would be interesting to see how many times we have had a team end up in top two (irrespective of whether they won the league or came second) in consecutive seasons in the Premier League. The answer is…

- Man U 7
- Chelsea 5
- Arsenal 4
- Man C 3

I started to wonder why this was not happening more, and in doing so also started to wonder what happens to the second placed and third placed teams.

Looking at this I pondered how often a team moved from second one season to winning the next. That turned out to be fairly common:

- From 2nd to 1st: Arsenal (twice), Chelsea (once), Man U (three times), Man C (once).

In other words a team that comes second one year has a 50% chance of winning the league the next year. But that’s all. It is just 50/50.

So what about the third placed team – can they be seen to have a good chance of going from third up to top of the league the following season?

- From 3rd to 1st: Man U (once), Chelsea (twice), Man C (once).

This shows that the third placed team one season, based on the figures from this century, has a 24% chance of winning the league the following season.

Which leaves just 26% for everyone else: the reigning champions retaining the title, and teams from outside the top 3 winning the league in the following season.
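As a cross-check, here is a minimal sketch that tallies where each champion finished the season before, using only the top three finishes listed in the table in this article. The table starts at 2001–02, so it covers 14 season-to-season changes; anything that reaches back before the table (such as where Arsenal finished the year before their 2001–02 title) is not counted, so the tallies can differ a little from the figures quoted above. The names `TOP_THREE` and `previous_position_of_champions` are made up for the illustration.

```python
from collections import Counter

# (champion, runner-up, third) per season, 2001-02 to 2015-16, as in the table.
TOP_THREE = [
    ("Arsenal", "Liverpool", "Man U"),
    ("Man U", "Arsenal", "Newcastle"),
    ("Arsenal", "Chelsea", "Man U"),
    ("Chelsea", "Arsenal", "Man U"),
    ("Chelsea", "Man U", "Liverpool"),
    ("Man U", "Chelsea", "Liverpool"),
    ("Man U", "Chelsea", "Arsenal"),
    ("Man U", "Liverpool", "Chelsea"),
    ("Chelsea", "Man U", "Arsenal"),
    ("Man U", "Chelsea", "Man C"),
    ("Man C", "Man U", "Arsenal"),
    ("Man U", "Man C", "Chelsea"),
    ("Man C", "Liverpool", "Chelsea"),
    ("Chelsea", "Man C", "Arsenal"),
    ("Leicester C", "Arsenal", "Tottenham"),
]


def previous_position_of_champions(seasons):
    """Count where each season's champion finished in the season before."""
    tally = Counter()
    for previous, current in zip(seasons, seasons[1:]):
        champion = current[0]
        if champion in previous:
            # index 0 = champions, 1 = runners-up, 2 = third
            tally[previous.index(champion) + 1] += 1
        else:
            tally["outside top 3"] += 1
    return tally


tally = previous_position_of_champions(TOP_THREE)
transitions = len(TOP_THREE) - 1
for position, count in tally.items():
    print(f"{position}: {count} of {transitions} ({count / transitions:.0%})")
```

On the table as printed this gives 3 retained titles, 6 moves from 2nd to 1st, 4 moves from 3rd to 1st and 1 champion from outside the top three.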

Interestingly, however, having the top scorer doesn’t always help:

| Season | 1st | 2nd | 3rd | Top scorer |
|---|---|---|---|---|
| 2001–02 | Arsenal | Liverpool | Man U | Thierry Henry (Arsenal) |
| 2002–03 | Man U | Arsenal | Newcastle | Ruud van Nistelrooy (Man U) |
| 2003–04 | Arsenal | Chelsea | Man U | Thierry Henry (Arsenal) |
| 2004–05 | Chelsea | Arsenal | Man U | Thierry Henry (Arsenal) |
| 2005–06 | Chelsea | Man U | Liverpool | Thierry Henry (Arsenal) |
| 2006–07 | Man U | Chelsea | Liverpool | Didier Drogba (Chelsea) |
| 2007–08 | Man U | Chelsea | Arsenal | Cristiano Ronaldo (Man U) |
| 2008–09 | Man U | Liverpool | Chelsea | Nicolas Anelka (Chelsea) |
| 2009–10 | Chelsea | Man U | Arsenal | Didier Drogba (Chelsea) |
| 2010–11 | Man U | Chelsea | Man C | Dimitar Berbatov (Man U), Carlos Tevez (Man C) |
| 2011–12 | Man C | Man U | Arsenal | Robin van Persie (Arsenal) |
| 2012–13 | Man U | Man C | Chelsea | Robin van Persie (Man U) |
| 2013–14 | Man C | Liverpool | Chelsea | Luis Suárez (Liverpool) |
| 2014–15 | Chelsea | Man C | Arsenal | Sergio Agüero (Man C) |
| 2015–16 | Leicester C | Arsenal | Tottenham | Harry Kane (Tottenham) |

Only in six of the 16 seasons has the top scorer been a member of the team winning the league. Now some argue that having the top scorer can only help, but even here it can be deceptive, because having the top scorer makes a club very vulnerable. If that player gets an injury, the whole profile of the squad goes with one nasty foul.
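For anyone who wants to check the count against the table, here is a small sketch covering the 15 seasons listed there. It separates seasons where the champions had the sole top scorer from the one season where the award was shared. The list name `SEASONS` and the club abbreviations are just the ones used in the table.

```python
# (season, champion, club(s) of the top scorer) as listed in the table.
SEASONS = [
    ("2001-02", "Arsenal", ["Arsenal"]),
    ("2002-03", "Man U", ["Man U"]),
    ("2003-04", "Arsenal", ["Arsenal"]),
    ("2004-05", "Chelsea", ["Arsenal"]),
    ("2005-06", "Chelsea", ["Arsenal"]),
    ("2006-07", "Man U", ["Chelsea"]),
    ("2007-08", "Man U", ["Man U"]),
    ("2008-09", "Man U", ["Chelsea"]),
    ("2009-10", "Chelsea", ["Chelsea"]),
    ("2010-11", "Man U", ["Man U", "Man C"]),  # award shared between two players
    ("2011-12", "Man C", ["Arsenal"]),
    ("2012-13", "Man U", ["Man U"]),
    ("2013-14", "Man C", ["Liverpool"]),
    ("2014-15", "Chelsea", ["Man C"]),
    ("2015-16", "Leicester C", ["Tottenham"]),
]

# Seasons where the only top scorer played for the champions.
sole = [season for season, champion, clubs in SEASONS if clubs == [champion]]

# Seasons where a joint top scorer played for the champions.
shared = [
    season
    for season, champion, clubs in SEASONS
    if champion in clubs and len(clubs) > 1
]

print("sole top scorer at the champions:", len(sole), sole)
print("joint top scorer at the champions:", len(shared), shared)
```

This finds six seasons with a sole top scorer at the champions, plus 2010–11 where the award was shared.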

But then I thought, what about interlopers? The teams from outside of the main contenders who suddenly pop up.

In terms of the number of times teams have appeared in the top three during the years shown above, the results are:

- **Manchester United:** 6 champions, 3 runners up, 3 third place: total 12 top three finishes
- **Manchester City:** 2 champions, 2 runners up, 1 third place: total 5 top three finishes
- **Chelsea:** 4 champions, 4 runners up, 3 third place: total 11 top three finishes
- **Arsenal:** 2 champions, 3 runners up, 4 third place: total 9 top three finishes

Other clubs: across 15 years, places in the top three have been achieved by teams outside of these four on just eight occasions.

So the dominance of these four clubs across this period remains. Only eight of the 45 places noted above have been gained by clubs other than the four noted just now (Man U, Man C, Chelsea, Arsenal). Of those eight, five of the places have gone to Liverpool, and one each to Tottenham, Leicester and Newcastle.

What this shows is a remarkable consistency in the Premier League across recent years. Arsenal might well drop out of the top four this year, but in so doing they are just following what has happened to Chelsea and Man U. Chelsea bounced back, Man U have not yet.

And yet there are a couple of other pointers here:

First, although four teams have carved up the top three places between them, Liverpool have popped up from time to time and Tottenham seem most likely to get a top three place this year for a second time.

Second, although Man U have the biggest income that can be spent on players this has not guaranteed them success in recent years – they have now had three years outside the top three, and most likely this season will make that four years. Only Man C has a longer run outside the top three, not appearing there in the first nine years of the chart.

Of course it is possible that we are seeing a seismic shift in the top three scenario at this moment, and maybe Man U and Arsenal are dropping out of contention for a prolonged period, but I suspect not, for three reasons.

First, Tottenham have the stadium issue, and to stay in the top three for a prolonged period they will have to become the first club to have a major stadium change without it having an adverse effect on their playing performance. Second, Arsenal’s achievement in reaching the Champions League for a sequence that is beyond any club other than Real Madrid, might now be vanishing for a long spell, but it seems more likely to me that it is just a run, like all runs, coming to an end.

Third, although Man U have dropped off the chart for three years, they have the money, and although I hardly expect an interesting style of football (or come to that an honourable approach to people management) they can be expected to grind out results.

For what it is worth I would expect a return to the competition between four clubs for the top three spots in the very near future.

Hmmm. Man City doesn’t look like conceding any ground in the table to Arsenal, at least this weekend, as they won comfortably this evening at St Mary’s against Southampton to go 2nd earlier today. And Tottenham Hotspur have strengthened their stranglehold on the 2nd spot in the table by reducing the gap between them and Chelsea to 4 points, with hopes that Chelsea will stutter at Old Trafford tomorrow against Man Utd.

Spurs are said to have stopped being Spursy and are now genuinely contending with Chelsea to win this season’s Premier League title. But will they become Spursy once again this season by collapsing and dropping points massively in their remaining 6 games, allowing Arsenal, who should win 8 out of 8, to dramatically finish above them again this season? A wild dream by me?

Will Arsenal win tomorrow at the Riverside against Middlesbrough to stop the rut in their PL away matches from continuing and give us hope that they’ll get another top 4 finish this season? I think they should if they don’t want to kill us.

Arsenal are at an interesting crossroads.

Statistical blip, or decline.

The manager and board need to assess and make the right decision. Personally, I suspect a blip, but those who know what goes on day to day will know what is what.

Certainly need to improve things though… still things to play for, but it looks like a nasty season, one to get over and come back from wiser and stronger.

Crossroads.

Arsenal’s problem, and mine, is working with a time series. We have data in the past, we are interested in the future.

Short of situations like insider trading in the stock market, we do not know in advance if a situation has changed. So, knowing what has happened in the past, we read the next data point. And we need to classify it.

A simple solution is:

* this point is consistent with the past

* this point is not consistent with the past

And inconsistent points we call outliers (or we can call them that). And they can occur at any time.

Okay, we are given another point. If we classified the previous point as an outlier, we are ignoring it (for now). If this new point is consistent with the data we had before, we assume we are free to continue thinking we are still on the old path. And we can continue to think that the previous point was an outlier.

But really, we need to look at our old data, removing the assumption that the previous point was an outlier. Is this data point still an outlier?

If the new point is an outlier as well, is it an outlier in the same direction as the point that was just flagged as an outlier? If it is in the same direction, it may be that we are at the start of a new trend. If it is off in the other direction, we have a problem. The metric we are using to define an outlier seems not to be working.

So we have to get yet another point and look at it. Three successive outliers to the same side, we’ll tentatively assume it is a new trend. Outliers mixed as to which side, we wonder if our metric is bad. If the new point is consistent with the trend, ignoring the now 2 points labelled as outliers, we just assume we are still on the old trend.

So, for how I’ve described it, we really need 4 successive questionable results (next data points) to recognize a strong change in trend. Weaker changes in trend will need even more data points.
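The rule described above can be written down as a rough sketch. This is only one reading of it: the function name `classify_run`, the labels it returns and the default run length of 3 are illustrative assumptions, and `bound` stands for whatever metric is being used to decide that a point is an outlier.

```python
def classify_run(residuals, bound, run_length=3):
    """Label the latest run of points relative to the old trend.

    residuals: observed value minus the old trend's prediction, oldest first.
    bound: how far off the old trend a point may be and still count as consistent.
    run_length: assumed number of same-side outliers needed to suspect a new trend.
    """
    sides = []  # +1 / -1 for each outlier in the current unbroken run
    for residual in residuals:
        if abs(residual) <= bound:
            # A consistent point: we carry on with the old trend and the
            # earlier outliers stay labelled as outliers.
            sides = []
        else:
            sides.append(1 if residual > 0 else -1)

    if not sides:
        return "old trend"
    if len(set(sides)) > 1:
        return "metric suspect"  # outliers on both sides of the old trend
    if len(sides) >= run_length:
        return "new trend (tentative)"
    return "undecided"  # need more data points


print(classify_run([0.1, 0.9, 0.8, 1.1], bound=0.3))
print(classify_run([0.9, 0.1], bound=0.3))
print(classify_run([0.9, -0.8], bound=0.3))
print(classify_run([0.9, 0.8], bound=0.3))
```

The four calls print, in order, a tentative new trend, the old trend, a suspect metric, and an undecided run that needs more data.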

For me, looking through thousands of GPS data points, 3-5 (or more) data points is only a few seconds. And I don’t need to change assumptions about what is driving my trend.

For the EPL, this is probably more than 1 month, and is a considerable fraction of a season. People get injured, and the trend line can change from that. Transfers happen, and the trend line can change from that. And there is other stuff.

A few people like myself, can sit here and look at the new data as it comes up, and look to see if something has changed. And we can wait for new data. And more new data. To whatever it takes.

But most people don’t have the background to distinguish outliers from anything else, or the patience to wait and see whether a change in trend has happened. It’s often not even a bad data point, just a data point not as good as they were betting on (and there is probably a bet there literally, not necessarily with the bookies). And they start to cry that a change has happened that must be dealt with.

But, for the people that got this far and who have no idea what an outlier is, let’s say you are given the job of finding out how much the average apple weighs. So you go to the apple place, and start grabbing apples and weighing them. McIntosh, Red Delicious, Gala, Granny Smith, and so on. You move over to the next bin, and find crabapples. Typically smaller than “regular apples”, but a crabapple can pollinate a regular apple tree. So you continue taking apples and weighing them.

And you move over to the next bin. And these things are reddish (like an apple, usually, sort of) but huge. The tag on the bin says “Yellow Delicious”. These aren’t yellow, they’re green. And you’ve already measured a bunch of Red Delicious apples, and they were similar in weight to the other apples you had seen to that point.

But these new things, are much heavier than just the ‘Red Delicious’ apples (not lumped in with the others). So the similar name doesn’t seem to make this new “apple” similar to the others.

And an employee comes along, and removes the “Yellow Delicious” tag from the bin, and replaces it with “Watermelon”.

The Watermelons are an outlier. They are not apples, but somehow they got into our apple data.

A good striker is better to have than an average striker.

A top scoring striker is a better option than a Welbeck-type striker.

A top, top striker who scores many goals and then gets injured is better than a Welbeck -type striker who also gets injured.

So yes Tony, some of us believe that strikers like Yaya Sanogo, Perez, Giroud and Welbeck should be replaced by better strikers if Arsenal is to move up to the next level and compete to win the big trophies like UEFA Champions League and the Premiership title.

I would rather Arsenal had Diego Costa, Hazard, Agüero or, dare I say, Harry Kane than Perez, Welbeck, Sanogo or Giroud.

Better strikers strengthen a team. Poor or average strikers waste chances and often lead to Arsenal dominating play but not winning.

In many press conferences after a loss or draw, Wenger always says we had many chances to score and win the game but we didn’t score.

Having a good striker doesn’t mean that we shouldn’t also have goal-scoring midfielders and wingers.

The Invincibles had top-scoring strikers but we also had top scoring wingers (such as Robert Pires).

When we say we need a top, top striker we don’t mean that only that striker should score and the rest of the team shouldn’t.

All the great Arsenal teams in the Wenger period had top, top strikers.

Yes, having a clinical striker doesn’t guarantee the team winning the title but any coach will tell you that it certainly helps. That’s why the best strikers, the most clinical strikers command huge transfer fees and higher wages.

Arsenal charges fans the top, top prices for season tickets so we are not out of sync if we also ask for top, top strikers, and midfielders, and defenders.

I think you misunderstand me and many people like me who want a better striker at Arsenal. I personally want a top striker; he doesn’t have to be the highest goal scorer, he just has to be a top striker. A top striker, if fit, should be among the top goal scorers in the league. Look at Messi and Ronaldo: they often alternate the golden boot between themselves, but that doesn’t stop whoever misses out this year from being a top striker, nor does it stop Griezmann from being a top striker. The same applied to Henry, Nistelrooy and Drogba when they were vying for the golden boot. The same also applies this season among the top goal scorers. If any of Lukaku, Kane, Aguero, Costa or Sanchez miss out, it won’t change the fact that they are top strikers, because their goals return proves it. If we have a top striker playing alongside Sanchez, who should operate from the left side of attack, there should be a huge surge in goals return for Arsenal.

I ran through a small example problem earlier today (before Easter dinner).

For the points X=-4, -3, -2, -1 I had a slope of -1 with an addition of a zero mean Gaussian noise with a standard deviation of 0.1. So, a small amount of noise in the data. You expect the slope to be about -1 with an intercept of about 0. The value of R^2 was quite high at 0.998.

For the points X=1,2,3,4, I used a line of slope +1 (so the 2 lines are perpendicular to each other). And I used the same Gaussian additive noise.

At X=1 (so only a single point off trend), R^2 is still quite respectable at 0.855.

At X=2, things have now transitioned to horrible, with R^2 slightly under 0.5 and when you get to X=4, you expect an R^2 of about 0 (I got 0.0001).

But 90 degrees is a huge change in the behavior of a trend.

If, instead of coming away from 0 with a slope of +1, I change it to a slope of 0 (so the new trend is 45 degrees off the old trend), at X=1 I get R^2 of 0.97, at X=2 I get 0.92, at X=3 I get 0.85 and at X=4 I get 0.78.

An even gentler change is to come off with a slope of -0.5 (about an 18.4 degree change, since a slope of -1 is 45 degrees and a slope of -0.5 is about 26.6 degrees). The 4 values of R^2 are: 0.983, 0.979, 0.976 and 0.965.

If I double the noise for that slope of -0.5 case, I see the initial R^2 of the original trend fall to 0.991, and then for the next 4 R^2 values on the new trend I get: 0.973, 0.962, 0.960, 0.970.

Look at that last point. Our value for R^2 is starting to increase. If whatever rules we are using to find where an old trend line stops and a new trend line starts are not good enough to find the change inside of 3 points, we probably won’t see the change in trend.
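For anyone who wants to repeat the experiment, here is a minimal sketch in plain Python. It assumes the set-up described in these comments: an old trend of slope -1 through X=-4..0, a new trend through the origin for X=1..4, and zero mean Gaussian noise. The function names, the seed and the exact noise draws are my own choices, so the numbers will not match the ones quoted exactly.

```python
import random


def r_squared(xs, ys):
    """R^2 of a least-squares straight line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a single straight-line fit, R^2 is the squared correlation.
    return sxy ** 2 / (sxx * syy)


def r_squared_path(new_slope, noise_sd=0.1, seed=0):
    """R^2 on the old trend alone, then after adding X=1, 2, 3, 4 in turn."""
    rng = random.Random(seed)
    xs = list(range(-4, 1))  # old trend: X = -4 .. 0, slope -1
    ys = [-x + rng.gauss(0.0, noise_sd) for x in xs]
    path = [r_squared(xs, ys)]
    for x in range(1, 5):  # new trend: X = 1 .. 4
        xs.append(x)
        ys.append(new_slope * x + rng.gauss(0.0, noise_sd))
        path.append(r_squared(xs, ys))
    return path


for slope in (1.0, 0.0, -0.5):
    print(slope, [round(value, 3) for value in r_squared_path(slope)])
```

The pattern should be the same as described: a sharp change of slope drags R^2 down within a couple of points, while a gentle change barely moves it.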

You think an aaa type is going to see that trend change in EPL game data for a team? Not a chance. They’re lucky if they can see the train coming when they are standing on the tracks.

It’s only 9pm (Sunday) for me.

Using R^2 to try and find a change in trend is incorrect.

For all of the above work, I started by setting up the data for X=-4, -3, -2, -1 and 0 (i.e. adding the Gaussian noise to -1*X). The next point to be added, is the data for X=1.

Our “null hypothesis” is that the old slope and intercept is still the existing trend in play. The old slope is approximately -1 and the old intercept is approximately 0.

The “predicted” value for the data at X=1 is (about) Y=-1. We want to calculate bounds, such that if our new data point is outside of (about) -1 +/- bounds, we are going to say that the new point is not consistent with the old data.

The bounds calculation consists of 3 parts.

1. A critical T statistic for a certain number of degrees of freedom (here 5-2=3) and value alpha (I am using 0.05).

2. The mean square error of the data for X=-4, -3, -2, -1, 0. Or rather, the square root of that mean square error.

3. A term which includes how far our new X point (1) is from the average X data accepted so far (here -2).
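Put together, and assuming this is the usual prediction interval for a new observation in simple linear regression (which is what the three parts look like), the bound at a new point x_0 would be:

$$
\text{bound}(x_0) \;=\; t_{\text{crit}} \cdot \sqrt{\text{MSE}} \cdot \sqrt{\,1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,}
$$

Here t_crit is the critical T value from part 1 (n - 2 degrees of freedom), MSE is the mean square error from part 2, and the last square root is the distance term from part 3. With n = 5 and an average X of -2, the sum in the denominator is 10, so the distance term is sqrt(2.1) at X=1 and sqrt(2.8) at X=2.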

So, for the data with the doubling of the noise (which influences the mean squared error of point 2), I calculate the bounds are 0.325 on either side of the predicted value (which is about -1). Our new observed value has to be less than (about) -1.325, or greater than -0.675, in order to be flagged as being different.

The last case (slope = -0.5) says that our new observed value will on average be (about) -0.5, which is greater than -0.675. The difference is 0.175.

It turns out that the noise value I used to make the data was 0.2, which is close to 0.175.

Half (50%) of trial experiments would produce new data greater than (about) -0.5. If the difference was 0.2, then in the 0.2 (one standard deviation) below that we see about 34% of the data. But we are not quite at 0.2, so we will see a bit less than 34%. Let us call it 30% to save the problem of calculating it.

Hence, we are expecting that about 80% of the time, whatever new value we generate for our new trend with a slope of -0.5 will be detected by this better formula.

Finding one point that is inconsistent with the old trend is not an indication that the new point is the beginning of a new trend. All we do is flag this point as being inconsistent and we examine another data point. Point 3 is the only part of the calculation which is different, and calculating the new bounds at X=2 I find that the bounds are 0.375 (they were 0.325).
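A rough sketch of the bounds calculation, to show how only the distance term changes between X=1 and X=2. It assumes the standard prediction interval for a straight-line fit; the value 3.182 for the critical T (two-sided, alpha 0.05, 3 degrees of freedom) is my assumption about how alpha was applied, and the noise draws are random, so the bounds themselves will not reproduce 0.325 and 0.375. The ratio between the two bounds does not depend on the noise at all.

```python
import math
import random

T_CRIT = 3.182  # assumed: two-sided 5% critical T value for 3 degrees of freedom


def prediction_bound(xs, ys, x0, t_crit=T_CRIT):
    """Half-width of the band around the old trend's prediction at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Part 2: square root of the mean square error of the old data.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    root_mse = math.sqrt(sse / (n - 2))
    # Part 3: grows as x0 moves away from the average of the old X values.
    distance_term = math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return t_crit * root_mse * distance_term


rng = random.Random(1)
xs = [-4, -3, -2, -1, 0]
ys = [-x + rng.gauss(0.0, 0.2) for x in xs]

bound_at_1 = prediction_bound(xs, ys, 1)
bound_at_2 = prediction_bound(xs, ys, 2)
print(round(bound_at_1, 3), round(bound_at_2, 3))
print("ratio:", round(bound_at_2 / bound_at_1, 3))
```

The ratio is sqrt(2.8 / 2.1), about 1.155, which is in line with the quoted bounds going from 0.325 to 0.375.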

For a new trend slope of -0.5 (the example we are working with), almost 100% of the time we will detect this new point (or any further points) as _also_ being inconsistent with the old trend line. Which means we have two (or however many more) consecutive points which are inconsistent with the old trend line.

Which probably is a good reason to say we have started a new trend.
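The detection percentages can be worked out rather than estimated. This is a small sketch using the numbers quoted above (new slope -0.5, noise of 0.2, bounds of 0.325 at X=1 and 0.375 at X=2) and assuming the old trend predicts exactly -1 and -2 at those points. The function name `detection_probability` is made up for the example.

```python
from statistics import NormalDist


def detection_probability(new_mean, noise_sd, predicted, bound):
    """Chance that one new point lands outside predicted +/- bound."""
    new_point = NormalDist(mu=new_mean, sigma=noise_sd)
    below = new_point.cdf(predicted - bound)
    above = 1.0 - new_point.cdf(predicted + bound)
    return below + above


# X=1: the new trend gives -0.5 on average, the old trend predicts about -1.
p_first = detection_probability(-0.5, 0.2, -1.0, 0.325)

# X=2: the new trend gives -1.0 on average, the old trend predicts about -2.
p_second = detection_probability(-1.0, 0.2, -2.0, 0.375)

print(f"first point flagged:  {p_first:.1%}")
print(f"second point flagged: {p_second:.1%}")
```

This comes out at roughly 81% for the first point and above 99.9% for the second, which agrees with the rough figures of about 80% and almost 100%.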

If we were to go back to the smaller noise value I used to begin with (half the size), we would find that our bound on the first point is about half the size (instead of 0.325, it would be about 0.163), and the new point would be detected as inconsistent much more than 80% of the time. So better data does allow us to see smaller changes in trend, more often and possibly earlier.

But using the proper tool for the job, instead of (ab)using R^2 allows us to (probably) spot the trend change. But, people with little or no knowledge in statistics haven’t a hope in doing this. They wouldn’t know that there is a proper tool. People with rudimentary knowledge might suspect there is a proper tool, but probably wouldn’t know how to find it on their own.