Analytics is older than you think: (re)introducing Charles Reep
How Englishness and the power of collaboration relate to an analytics pioneer
A brief note before starting: I previously moved my other newsletter, Mark’s Notebook, to a different platform due to Substack’s issues with fostering a transphobic atmosphere. I plan to move this one too at some point (don’t worry, it won’t have an effect on subscribers) but am still considering options.
A blog by Sam Green from 2012 is often credited as kicking off the ‘expected goals era’ of football1, but the first xG calculations are surprisingly older. As in, ‘almost as old as the Premier League’ older.
The adage that there’s nothing new under the sun probably holds true for this earlier work as well2, but check out this diagram from a 1997 paper. Look familiar?
This ‘weighted shots’ work began with a mere observation that shots taken close to goal and central went in more often than shots from range, but the calculation grew from there. In the paper, ‘Measuring the Effectiveness of Playing Strategies at Soccer’, a number of factors were considered in the further analysis: distance from goal; angle to goal; first-time shot or not; less than a yard to the nearest defender or not; shot from open play or a set play. As the caption above indicates, kicked shots were also separated from headers.
My friends, this is, in some ways, more sophisticated than some of the earliest xG models from the ‘expected goals era’ that would come a decade and a half later. The authors of the paper: Richard Pollard, and one Charles Reep.
Charles Reep has a history even muddier than expected goals. He was indisputably a pioneer of data collection, starting in 1950 with a pen and a notepad (and sometimes, at night games, a miner’s helmet). He worked with Brentford and then Stan Cullis’ Wolves in the 50s, a decade in which the Midlands side won three league titles.
But in his loudspoken articles in the 1960s, principally in World Sports magazine, Reep looked like he was misinterpreting his own data. The articles highlighted how many goals were scored from short sequences of possession, but implied that this made short sequences more effective without noting that short sequences were simply far more common.
Since then, his name has become associated with what happens when data analysis goes badly.
The work of Reep’s career in totality is fascinating though, and his position within the English game says something not just about it but about us. We’ll get to that 1997 paper again soon, but let’s take a brief step back to where it all began.
Reep is often referred to as an RAF veteran, usually with his rank of Wing Commander, but he was an accountant by trade before joining the military. That does make all of this data-recording make a bit more sense, the accountant airman.
He was inspired by a talk given by then-Arsenal captain Charlie Jones in 1933 about Herbert Chapman’s tactics, but it was in 1950 that he first started creating data. This being the mid-twentieth century, Reep had to physically be at the games to do this, although he was later able to record data from major matches via the magic of television.
This physical necessity, and the ‘completeness’ of the collecting, are unlike modern collection practices of course. A 1968 paper3 which uses some of the data gives an insight into its limited breadth — 12 Wolves games from the 1953/54 season; 15 miscellaneous games in 55/56; two full seasons of Sheffield Wednesday (while working for them); 18 miscellaneous games… It goes on like this through the whole 1953-1967 period.
It’s slightly difficult to know exactly who, in the professional game, Reep worked with during his life. He definitely worked for Brentford and Wolves, and was full-time with Sheffield Wednesday for three seasons after retiring from the RAF.4 He met Graham Taylor several times, though this fell short of a formal collaboration, and he has been erroneously described as working with Cambridge United, Sheffield United, and Wimbledon.5
So while Reep wasn’t a string-pulling svengali of Ye Olde English long-ball football, his public writings perhaps made it easy to think he was (several people who were inspired by him did work more extensively in football too, but more on that in a moment).
While in the 21st century the ‘analytics community’ was more aligned with possession-based football, Reep was anything but. Thanks to a 2018 Duncan Alexander tweet, we can read one of Reep’s World Sport articles in full (Alexander bought the magazine as part of writing his 2017 book Outside the Box).
“Reflection upon the significance of all this,” Reep wrote in 1962, referring to his data, which was reproduced in a table in the magazine, “must cause one to question many tactics now widely praised as ‘good football’.” The article ends in a series of questions, culminating in this:
Have British observers been deceived for years by too readily accepting the assurances that “Continental style” football is superior to the English direct style (as it was before 1953*) and therefore has to be imitated?
Statistics prove the answer to all these questions is definitely YES.
* [I think by ‘as it was before 1953’ Reep is referring to the English style prior to that date]
Over 40 years later, Reep’s analysis was critiqued by two foundational books for the modern analytics ‘movement’, David Sally and Chris Anderson’s The Numbers Game, and Jonathan Wilson’s Inverting the Pyramid. The latter book gives the most straightforward and succinct summary of the problems with what Reep’s analysis appeared to claim:
If, as [Reep’s] figures suggest, roughly 80 percent of goals result from moves of three received passes or fewer, but 91.5 percent of moves consist of three received passes or fewer, then it surely follows — even within the unsubtle parameters Reep sets out — that moves of three passes or fewer are less effective than those of four or more.6
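Wilson’s arithmetic is easy to check. The sketch below, in Python, uses only the two rounded percentages from the quote; the function name is mine, purely for illustration, and the output is a ratio of goal rates per move rather than a figure taken from Reep’s or Wilson’s own tables.

```python
def relative_goal_rate(share_of_goals: float, share_of_moves: float) -> float:
    """Goals per move for one group of moves, relative to the average over all moves."""
    return share_of_goals / share_of_moves


# Rounded figures from the quote: moves of three received passes or fewer.
short_rate = relative_goal_rate(0.80, 0.915)
# The complements: moves of four or more received passes.
long_rate = relative_goal_rate(1 - 0.80, 1 - 0.915)

print(f"short moves: {short_rate:.2f}x the average goal rate")
print(f"long moves:  {long_rate:.2f}x the average goal rate")
print(f"long vs short: {long_rate / short_rate:.1f}x")
```

On these rounded figures, a move of four or more passes is roughly 2.7 times as likely to end in a goal as a shorter one. Short moves produce most goals only because there are so many more of them.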
However, recent years have seen Reep’s influence on English football massively overstated. Christoph Biermann’s 2019 book Football Hackers says “[i]n England, Charles Reep’s long-ball dogma influenced the game for a number of years even though it was based on wrongful interpretations of the data.”7 One 2016 FiveThirtyEight article ran the title ‘How One Man’s Bad Math Helped Ruin Decades Of English Soccer’. There’s a danger that we slip into thinking that Reep is responsible for England’s long-ball history and tactical insularity.
His analysis was faulty, but any institutional ruination of English football doesn’t lie with the accountant airman. There’s an argument that we could blame Charles Hughes, who was inspired by/ripped off Reep’s work (depending on whether you asked Hughes or Reep) and was the FA’s director of education and coaching between 1983 and 1994.8 It seems likely he had an influence, but (particularly given that I’m affording a nuanced take of Reep here) it seems a stretch to lump all the blame on Hughes.
This is partly because, importantly, it’s clear from Reep’s own experiences that he wasn’t alone in his beliefs on the game.
The previous quote from his World Sport article, about ‘Continental possession football’, could have been written at any point from then, in 1962, to now, in 2021. Back in the 1950s, Wolves manager Stan Cullis already believed in the long-ball approach that Reep waved a spreadsheet at to back up. Reep surely approached Graham Taylor in 1980 for the same reason. Presumably he recognised a kindred spirit (who might listen to a man trying to introduce statistics into football) and figured that might be a rare opening to work in the game that so clearly fascinated him.
England has always been welcoming to the type of football that this pioneer of analytics was proposing. However, we should note that it’s not like English football ever totally succumbed to it. In 1983, an article in The Times about Graham Taylor’s Watford, as well as the Charleses Hughes and Reep, notes:
There were not a few insults inside the Cottage [Craven Cottage], directed from Fulham’s supporters among a 22,000 crowd at Watford’s tactics, which are persistently breaking with currently accepted thinking, weathering the criticism from such as Keith Burkinshaw [Tottenham manager] and Malcolm Macdonald [Fulham manager] and doing very nicely, thank you.
The article later introduces Charles Hughes with the sub-clause “whose theory turns what is supposed to be the fundamental principle of Liverpool’s success, possession football, on its head.” At the time that the journalist, David Miller, was writing, Liverpool were on course to win their sixth league title in eight years, a period of time that had seen them win three European Cups already. A fourth would come in 1983/84.
English football, therefore, knew the success that a more possession-oriented style could bring. It’s not as if any of the ‘successful’ long-ball teams, such as Wimbledon (who beat Liverpool in the 1988 FA Cup final), were popular either. England didn’t embrace them as a proud embodiment of the nation’s footballing philosophy.
So no, Charles Reep didn’t ruin English football. And so we enter the second half of his career — the co-author of academic papers — and approach the groundbreaking work we saw earlier.
If it’s hard to know who Reep worked with in the professional game, it’s not exactly easy to know what work he did on the papers either. The first appeared in 1968, then another in 1971, and some more in the intervening years until the one this post opened with in 1997.
Those two early papers are tables of data more than anything. From a modern viewpoint they look quaint, but — given that both are now half a century old — they’re absolutely foundational.
The first (Reep partnering with Bernard Benjamin9) is primarily a presentation of Reep’s data around length of passing moves and which passing moves led to shots. Pass moves got less and less frequent with each added pass; goals from moves starting in the final quarter of the field accounted for 50+% of all goals; things like that.
The 1971 paper (with Benjamin and Richard Pollard) was more academically mathematical in output, noting that the distribution of passing move lengths (1 pass, 2 passes, 3 passes, etc) seemed to have a ‘negative binomial’ pattern. The pattern isn’t really important in itself, but it’s statistical knowledge about football being committed to the record, and that is important.10
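For anyone unfamiliar with the shape, here is a minimal sketch of a negative binomial distribution in Python. The parameters `R` and `P` are hypothetical, chosen only to show the general pattern; they are not fitted to Reep’s data and aren’t taken from the 1971 paper. I’m also assuming that move length is counted from zero passes upwards.

```python
from math import exp, lgamma, log


def neg_binomial_pmf(k: int, r: float, p: float) -> float:
    """P(X = k) for a negative binomial with shape r > 0 and probability 0 < p < 1."""
    # Work in logs so the gamma functions stay numerically stable for larger k.
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(p) + k * log(1 - p))


# Hypothetical parameters, for illustration only (not fitted to any real data).
R, P = 1.5, 0.5

for passes in range(7):
    print(passes, round(neg_binomial_pmf(passes, R, P), 3))
```

With these made-up parameters each extra pass makes a move less common than the one before, which is the same broad pattern as in the 1968 data: pass moves getting less and less frequent with each added pass.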
The culmination, in 1997 (when Reep was 93), wasn’t merely a precursor to expected goals. From his early days analysing the game, our accountant airman had noted that obtaining possession of the ball in different areas of the pitch had a big impact on chances of scoring. The 1997 paper expanded on this, introducing a (very) rudimentary kind of expected possession value system, based on where a sequence of play began. Pollard and Reep called it the ‘yield’ of possessions.
Here’s an excerpt from the paper: “In terms of probability, the yield of a team possession is the estimated probability of scoring a goal minus the estimated probability of conceding a goal, based on the outcome of a possession.” A yield of 0.025, they write, would mean that 1000 possessions starting in that particular zone would lead to 25 more goals scored than conceded.
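To make that definition concrete, here is a minimal sketch in Python. It assumes the two probabilities are estimated as simple frequencies (goals divided by possessions), which is my simplification rather than a description of Pollard and Reep’s exact method. The function name and the zone’s numbers are hypothetical, picked so the result matches the 0.025 example.

```python
def possession_yield(goals_scored: int, goals_conceded: int, possessions: int) -> float:
    """Estimated P(score) minus estimated P(concede) for possessions starting in one zone."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    p_score = goals_scored / possessions      # simple frequency estimate (assumption)
    p_concede = goals_conceded / possessions  # simple frequency estimate (assumption)
    return p_score - p_concede


# Hypothetical zone: 1000 possessions leading to 30 goals scored and 5 conceded.
zone_yield = possession_yield(30, 5, 1000)
net_goals_per_1000 = zone_yield * 1000

print(round(zone_yield, 3), round(net_goals_per_1000))
```

A negative yield would mean that possessions starting in that zone lead, on balance, to more goals conceded than scored.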
The paper even included some analysis of different tactical approaches for different types of possession:
The sample sizes are, perhaps, quite small but consider that this is 1997. This was over a decade before something as simple as possession percentage became a statistic of discussion in the public realm. A number of the people reading this newsletter will not have even been born when this was written.
Why didn’t it have more of an immediate impact? Possibly availability of the paper; possibly availability of large datasets to the people who were likely to have read it. One of the main takeaways I have from reading this work, and other early analytics research, is that a lot of the ideas existed before, but the data and technology to allow the exploration of those ideas might not have. History usually takes its course on the technology side, and professional data collecting operations sprang up too.
However, there’s another takeaway I get from the story of Charles Reep that can have a more immediate impact regardless of where history is: the significance of collaboration.
I’m currently reading The Innovators by Walter Isaacson; the book is subtitled ‘How a group of hackers, geniuses and geeks created the digital revolution’ but Isaacson has a deeper motive than just history-telling. In the book’s introduction he writes:
This is the story of these pioneers, hackers, inventors, and entrepreneurs — who they were, how their minds worked, and what made them so creative. It’s also a narrative of how they collaborated and why their ability to work as teams made them even more creative.
The tale of their teamwork is important because we don’t often focus on how central that skill is to innovation. There are thousands of books celebrating people we biographers portray, or mythologise, as lone inventors[…] But we have far fewer tales of collaborative creativity, which is actually more important in understanding how today’s technology revolution was fashioned.
I think the same may be true of Charles Reep.
While his sometime co-author Richard Pollard fiercely defends Reep in a 2019 paper11, the former accountant/Wing Commander didn’t do himself any favours in his early ‘60s articles. The papers co-authored with Benjamin and then Pollard contain much more focus, and much more considered insight. Reep may well have brought some of this himself12 but it seems reasonable to say that his co-authors also brought something to the table (Bernard Benjamin became president of the Royal Statistical Society in 1971; Richard Pollard completed a PhD in statistics applied to football analysis in the 80s).
From the time I entered into the football analytics sphere around 2013, advances have always had little groups of people around them. The StatsBomb blog brought a lot of people together either to write or at least to read under one roof. Companies like Opta and Prozone/STATS (now jointly Stats Perform), Hudl, 21st Club (now Twenty First Group), Decision Technology (and others I’ve probably missed) had groups of people working on problems. American Soccer Analysis is probably the best current, public example of collaboration feeding creativity and analysis.
Individuals may produce good work on their own, but it tends to be when they gravitate to discuss their ideas with others that these get refined and improved, made applicable to the game or pushed even further.
Charles Reep was a pioneer of performance analysis and a driven, methodical individual. But while driven and methodical individuals can do good work on their own, it’s usually when they have someone to bounce their ideas off that it becomes great work. A 1997 paper containing rudimentary expected goals and expected possession value calculations…?
That’s great work.
Directly referenced, main body
Sam Green, ‘Assessing the performance of Premier League goalscorers’, OptaPro blog (April 2012) [link to version on Stats Perform’s current blog here] [Wayback Machine link to OptaPro’s old site, albeit a 2015 version, here]
Richard Pollard and Charles Reep, ‘Measuring the Effectiveness of Playing Strategies at Soccer’, The Statistician, 46, no. 4 (1997), pp. 541-550
C. Reep and B. Benjamin, ‘Skill and Chance in Association Football’, Journal of the Royal Statistical Society, 131, no. 4 (1968), pp. 581-585
Duncan Alexander tweet [with photos of a 1962 Charles Reep World Sports article] (2018), link here
Duncan Alexander, Outside the Box: A statistical journey through the history of football (2017)
David Sally and Chris Anderson, The Numbers Game: Why everything you know about football is wrong (2014)
Jonathan Wilson, Inverting the Pyramid: The history of football tactics (2008)
Christoph Biermann, Football Hackers: The Science and Art of a Data Revolution (2019)
Joe Sykes and Neil Payne, ‘How One Man’s Bad Math Helped Ruin Decades Of English Soccer’, FiveThirtyEight website (October 2016 – NB: writers don’t write headlines) [accessed May 2021, link here]
David Miller, ‘The possession game turned on its head’, The Times, 3 February 1983, p. 20 [link to archived edition here]
Walter Isaacson, The Innovators: How a group of hackers, geniuses and geeks created the digital revolution (2014)
Richard Pollard, ‘Invalid Interpretation Of Passing Sequence Data To Assess Team Performance In Football: Repairing the tarnished legacy of Charles Reep’, The Open Sports Sciences Journal, 12 (2019), pp. 17-21.
Directly referenced, footnotes (if not previously referenced)
Wikipedia, ‘Expected goals’, link here
James Maw, ‘No, seriously: what the heck is expected goals (xG)?’, FourFourTwo (November 2017 issue) [link to online version here]
James Tippett, The Expected Goals Philosophy (2019)
Richard Pollard, ‘Charles Reep (1904-2002): pioneer of notational and performance analysis in football’, Journal of Sports Sciences, 20 (2002), pp. 853-855.
Keith Lyons, ‘The Long and Direct Road: Charles Reep’s analysis of association football’, reproduced on his blog here – the original paper was written in 1997, but this reproduction and caveats in 2011/2012.
Not directly referenced
Alan Campbell, ‘Don’t Shoot the Messenger: The first football analyst was a pioneer 50 years ahead of his time’, Nutmeg Magazine, issue 8 (2018) [accessed online 2021, link here] [Note: Campbell was founder and editor of The Punter magazine which published articles from Reep in the 1980s and 90s]
Charles Reep, ‘The great Magyar myth exploded’, The Times, 29 May 1982, p. 18 [link to archived edition here]
Barney Ronay, ‘Grim Reep’, When Saturday Comes, issue 196 (2003) [accessed online 2021, link here]
Øyvind Larsen, ‘Charles Reep: A Major Influence on British and Norwegian Football’, Soccer & Society, 2, no. 3 (2001), pp. 58-78
This wasn’t the first work done on xG in football (related: footnote ), but I would argue that Green’s blog does hold this status. Part of this is probably because of its place on Opta’s blog pages — which have a larger prominence and reach than other blogs and papers — and of who’s principally been telling the history of it (i.e., them).
Take this from Opta’s head of editorial Duncan Alexander in a 2017 FourFourTwo article as an example: “Opta first came up with the concept of expected goals when one of our data scientists – Sam Green[…]– devised an analytical model based on similar things being done in American sport”.
I don’t begrudge Opta for this, although the people doing work around 2009-2012 might feel otherwise, but it’s seeped into the story of the stat. In the 2019 book, Football Hackers: The Science and Art of a Data Revolution, this is how xG is introduced: “‘Expected Goals’ was invented by the Englishman Sam Green, who first described the idea in 2012.” The book The Expected Goals Philosophy, a book specifically about expected goals and emerging as the central (only?) pop-science book on the metric, says similar: “In April 2012, an analyst called Sam Green posted a blog article on the OptaPro forum[sic*] […] introducing the idea that shot quality might be just as important as shot quantity.”
*This may be getting confused with the ‘OptaPro Analytics Forum’, an annual conference (since 2021 known as the Stats Perform Pro Forum); I don’t believe that OptaPro’s blog was ever a ‘forum’ in the sense of being set out similar to how e.g. Reddit is. [link to Wayback Machine recording of the original blog post, albeit from 2015]
The Wikipedia page for ‘Expected goals’ (link here) notes a 1993 paper that investigated the effect of artificial pitch surfaces. This appears to have been a use of the phrase ‘expected goals’ more than an ‘expected goals’ model, however. There are a number of works between 1997 and the 2012 Sam Green blog, including two works from 2004: one by Jake Ensum, Richard Pollard (a co-author of the 1997 paper), and Samuel Taylor; and one in ice hockey by Alan Ryder.
C. Reep and B. Benjamin, ‘Skill and Chance in Association Football’, Journal of the Royal Statistical Society, 131, no. 4 (1968), pp. 581-585.
See Keith Lyons’ 1997 paper on Reep, ‘The Long and Direct Road: Charles Reep’s analysis of association football’, a reproduction is on his blog here.
Jonathan Wilson, Inverting the Pyramid; Richard Pollard, ‘Charles Reep (1904-2002): pioneer of notational and performance analysis in football’, Journal of Sports Sciences, 20 (2002), pp. 853-855.
‘Chapter Eight: The English Pragmatism (1)’
Inverting the Pyramid, ‘Chapter Fifteen: The English Pragmatism (2)’
Benjamin would become the president of the Royal Statistical Society in 1971
It wasn’t just football where negative binomial distributions appeared: the 1971 paper found these patterns in cricket (runs scored), ice hockey (goals), baseball (runs per inning), and to a lesser extent in tennis (length of rallies) too.
Richard Pollard, ‘Invalid Interpretation Of Passing Sequence Data To Assess Team Performance In Football: Repairing the tarnished legacy of Charles Reep’, The Open Sports Sciences Journal, 12 (2019), pp. 17-21.
The people I’ve read who personally knew Reep all speak quite glowingly of him, although it seems feasible that a degree of this may be defensiveness of a friend whose intellectual reputation has taken a battering.