A fab early-February for football analytics aficionados
Big media, big announcements, digested
February 3rd & 4th 2021 were big days in the football analytics-sphere.
It started off with a new piece in The Athletic about running stats. We don’t usually get running data in the public domain, but Sportlogiq, via the charts and text of Tom Worville, provided enough to keep us going for many a month.
The other big media piece came the following day, with Ashwin Raman’s feature on the BBC.
Then, outside the media sphere, there were two BIG announcements from what could fairly be described as the two biggest event data providers in the space: StatsBomb and Opta (named in chronological order of their public messaging).
Both were announcing a live, virtual event in the middle of March (what a coincidence). For StatsBomb it’s a product launch; for Opta (proper name, Stats Perform) it’s the annual analytics Forum. Here is a link to the StatsBomb announcement; and here is a link to info on the 2021 Stats Perform (née Opta) Pro Forum.
Both are pretty big; big enough to be worth a newsletter. Here’s why…
Again, I stress, the ordering here is purely based on the chronological order of their public announcements.
In December, StatsBomb teased ‘StatsBomb 360’, and this week’s post revealed what it is. It’s a new data offering, and if you filter out the words ‘ground-breaking’ and ‘industry-changing’, what you get is as conceptually simple as it is exciting.
The way that event data is usually collected is that some human people sit in front of a screen, watching football, and click their mouse and press buttons when things happen. StatsBomb’s original ‘revolution’, upon their launch as a data company, was their shot freezeframes: when a shot was taken, you wouldn’t just get information about the shot, you’d get information on where every player in view was. There’s a good example in a blog post from the DTAI Sports Analytics Lab.
In practice, this might’ve proved most useful from StatsBomb’s point of view as a way of improving an xG model, rather than its application within clubs, but hey ho. It was genuinely quite exciting, and probably quite important.
This info isn’t collected by the human people sitting in front of the screen though. It’s gathered through computer vision software, a phrase which sounds unfamiliar but is surprisingly Ronseal. You ‘show’ an image to a computer and it sees things — in this case blobs of pixels that are football players and pitch markings.
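To make the freeze frame idea a bit more concrete, here’s a minimal Python sketch of what a single event record might look like once the computer vision layer is bolted on: the human-tagged part (who, what, where) plus a list of every visible player’s position. The field names, the 120-unit pitch length and the `defenders_goal_side` helper are all my own hypothetical choices for illustration, not StatsBomb’s actual data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlayerPosition:
    # Pitch coordinates of one visible player (hypothetical 120-long pitch).
    x: float
    y: float
    teammate: bool  # True if on the same team as the player on the ball


@dataclass
class Event:
    # The part a human tagger records: what happened, who did it, where.
    event_type: str
    player: str
    x: float
    y: float
    # The part computer vision adds: everyone in view at that moment.
    freeze_frame: List[PlayerPosition] = field(default_factory=list)


def defenders_goal_side(event: Event, goal_x: float = 120.0) -> int:
    """Count opponents between the event and the goal being attacked.

    Deliberately crude: it compares x coordinates only and ignores y.
    """
    return sum(
        1
        for p in event.freeze_frame
        if not p.teammate and event.x < p.x <= goal_x
    )


shot = Event(
    "Shot", "Player A", 102.0, 38.0,
    [
        PlayerPosition(110.0, 40.0, teammate=False),  # goal-side defender
        PlayerPosition(95.0, 30.0, teammate=False),   # already beaten
        PlayerPosition(105.0, 45.0, teammate=True),   # teammate, not counted
        PlayerPosition(119.0, 40.0, teammate=False),  # goalkeeper
    ],
)
print(defenders_goal_side(shot))  # 2
```

Counting opponents goal-side of the ball is about the simplest thing you could do with a freeze frame; something like a line-breaking pass would need rather more geometry than a single x comparison.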
StatsBomb is now expanding this computer vision-based freezeframe tech to “every event we collect - approximately 3300 events per match”. Readers of this newsletter may already have joined me in guessing this, given that the following paragraph appeared in the “Where will analytics go in 2021” post from New Year’s Eve:
StatsBomb have dropped their own teases, including the below image from an internal hackathon, which looks a little like some kind of passing option snapshot with cover shadow (or something). Or, as Ted Knutson says in the article, an illegal soccer rave.
I was off the mark in a sense: they weren’t just taking freeze frames of pass events, but all of them. Tackles, clearances. All of it.
360 is obviously very exciting for data-minded people. For a long time, data-ites have looked to tracking data (info on where every player is, many times a second, throughout the match) as the panacea for all problems. Then some of them got their hands on tracking data and realised that it wasn’t.
What StatsBomb seem to be offering with 360 is the middle ground between event data and tracking data. (To give others their due, it’s a middle ground that other companies, like Sportlogiq, have already trodden, and that Stats Perform are somewhat on their way towards too.) The StatsBomb announcement post lists a bunch of stats that they can get out of this data, like line-breaking passes and my favourite acronym of the year: Defensive Island Events (DIEs).
Some clubs might find this useful. I’m sure they’ll love line-breaking passes and ball receipts in space. The devil will be in the (pricing and product-packaging) detail.
StatsBomb are pushing this, understandably, but it’s overshadowing something else that’s re-mentioned in the 360 announcement. It’s something that I think might be the more visible influence on the data landscape to those of us outside of the professional team environment. Live data.
If you read this newsletter, you know who Opta are. Part of that is because they’re a longstanding data provider with a reliable and sensible data offering. But a lot of it is because they offer live event data. Why is that important? Because data that arrives the following day is of little use for broadcast and online media.
Sky and BT Sport (the main UK broadcasters) put up stats in-game all the time. I don’t think there’s any real chance that they’d use a provider whose data wasn’t updated live if one with live data was available. And while Football Reference has gained a ton of traction in its short, StatsBomb-powered lifetime, an advantage that the Opta-powered WhoScored has is that you can see the stats of games that are in play or just finished.
StatsBomb won’t suddenly be plastered all over Sky just because they offer live data — league coverage and accuracy/reliability will also be important — but they can operate in that most-visible of spaces now.
This brings us to the lineup of the 2021 Stats Perform (née Opta) Pro Forum. I gotta say — as someone who’s been to several of these and helped present last year — this slate of presentations sounds incredibly good.
One of them, coincidentally, harks back to StatsBomb’s 360 announcement.
“Enriching event data: A semi-supervised augmentation approach using location information” will be presented by Debangan Dey, Rahul Ghosal and Atanu Mitra, and here’s the blurb:
This presentation introduces a method for utilising tracking data to extract more information from an event-only match dataset. Using a merged dataset where both tracking and event data is available, this project will take a semi-supervised approach to creating predictive models that capture hidden patterns from within this dataset with the objective of drawing inferences for tracking data in an event-only dataset.
It’s using tracking data rather than computer vision, but ‘enriching event data’ could be StatsBomb 360’s tagline and it’s a lovely coincidence to see the two announcements share this theme. (Naturally, it also says a lot about where analytics is headed, and maybe I’ll write about why analytics is headed this way some other time).
It’ll be really interesting to see what this group draw out in the presentation, particularly to see whether enriching event data with tracking data can bring out valuable insights which computer vision freezeframes can’t.
Next in the announcement we have Caterina De Bacco with one of the club-led proposals: “Identifying and evaluating the efficiency of each player during the pressing phase against an opponent’s controlled build-up play”. (I won’t give the blurbs for each of these, though I recommend reading them through the link.)
Stats Perform had two ideas for presentations that came from people working within clubs, and the specificity of the phases of play is an indication of that. Again, this’ll be fascinating to watch, because breaking down analytical insights into the different phases of play seems like one of the big ‘next steps’ for using data in football.
Another big ‘next step’? Defending! And that’s what Aditya Kothari is doing with the presentation “A physics based measurement of defensive contributions”. This one is near to my heart for the subject matter, so check out the blurb:
Focusing on pass and carry prevention and shot prevention, Aditya will build on existing pitch control modelling work with the aim of identifying how well defending teams and individual defenders perform in particular situations during a game, identifying weaknesses and lapses in the defensive system and picking up on other unusual occurrences.
Of course, collecting metrics is good, but the next presentation on the list, from Ola Lidmark Eriksson, seeks to look at how volatile key performance indicators are. Titled “Volatility and calculation of risk-adjusted return in football scouting”, it’ll take inspiration from the financial sector* to see which players are more consistent than others.
*Mention of applying financial practices to football is a good chance to plug friend-of-the-newsletter Tiotal Football’s newsletter, Absolute Unit.
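The blurb doesn’t say exactly how Eriksson calculates risk-adjusted return, so treat this as an illustration of the general financial idea rather than his method. The textbook version is the Sharpe ratio: average return above a baseline, divided by how volatile that return is. Swap “return” for some per-match metric (say, expected goals) and you get a crude consistency measure. The function name, the baseline and both players’ numbers below are hypothetical.

```python
from statistics import mean, stdev


def risk_adjusted(values, baseline=0.0):
    """Sharpe-style ratio: mean excess over a baseline per unit of volatility.

    `values` is one number per match for some metric (hypothetical here).
    """
    if len(values) < 2:
        raise ValueError("need at least two matches to measure volatility")
    sd = stdev(values)  # sample standard deviation, match to match
    if sd == 0:
        raise ValueError("no variation between matches; ratio is undefined")
    return (mean(values) - baseline) / sd


# Two made-up players with the same average output per match.
player_a = [0.4, 0.5, 0.4, 0.5, 0.4, 0.5]  # steady
player_b = [0.0, 1.2, 0.1, 0.9, 0.0, 0.5]  # boom or bust

print(round(risk_adjusted(player_a), 2))
print(round(risk_adjusted(player_b), 2))
```

Both players average the same, but player A’s ratio comes out far higher because their output barely moves from match to match. Whether a scout should actually prefer the steady player is exactly the kind of question a talk like this can get into.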
Rounding out the list of five presentations is Vignesh Jayanth, with “Identifying and evaluating strategies for successfully penetrating a high opposition press from short goal kicks, played inside the box, to move the ball into the opposition half”.
This is the second of the club-led proposals. Not only is this Vignesh’s second year in a row presenting, it’s also his second successive year presenting one of the club-led proposals. He’s one of those rare smart people who knows as much about football as about data science.
Finally, there’s the winner of the Dr. Garry Gelade Award, named after the late Garry Gelade, an early leader in the field and a fixture of the Forums. The award recognises an outstanding submission from a university undergraduate, and this year Laurynas Raudonius’ findings, “Recognizing and evaluating opportunities in counterattacks using tracking data”, will be shown as a virtual poster during the event.
The predominant theme of this year’s Forum is tracking data, but another is specificity into phase of play. Add StatsBomb’s announcement into the mix and the themes stay pretty similar: going beyond event data, and specificity into phase of play. One of the metrics that StatsBomb’s 360 post mentioned is “Defensive shape around every event”. It’s not clear what shape that will be in, but it sounds like ‘this pass was against a settled defence, that pass was in transition’.
Hell, add in Tom Worville’s piece and the themes are exactly the same:
Going beyond event data (in-stadium(?) tracking data for Stats Perform, freeze frames for StatsBomb, tracking data from broadcast footage for Worville and Sportlogiq)
Specificity into phase (for Worville’s piece, the types of running information he’s looking at give a nice indication of how different players might be used in different phases)
And then maybe the article on Ashwin Raman adds in a third, which might come into sharper focus amidst this pandemic: it doesn’t matter who or where you are to work in this field.
Raman is a teenager in Bangalore; some of the Stats Perform Forum presenters hail from India, the US, and mainland Europe; Sportlogiq are a Canadian company, and Worville himself started off, much like Raman, as a guy on Twitter; StatsBomb was a blog before it was a data provider.
You do need the skills (skim through the list of Forum presenters to see how many of them work full-time in data science somewhere), and I won’t pretend that acquiring these skills is a perfectly fair meritocracy.
However, there was a time when professional football was too club-like to admit even the predominantly white male analytics nerds. Professional football is now, increasingly, not so club-like anymore, and the analytics sphere isn’t quite so restricted in who’s involved. It’s not perfect, but the realisation of that third theme felt quite inspirational to me and I hope — for those just recently approaching it — football analytics is a welcoming space.
Thanks for reading. If you’ve enjoyed this then feel free to subscribe, although Get Goalside! only goes out irregularly nowadays. If you’d like writing of mine that is on a more frequent release cycle I have another newsletter, Mark’s Notebook, that goes out on Tuesdays and Thursdays and I write a weekly blog — usually out on Thursday or Friday — for work at Twenty3 Sport.