The world of sport has been moving in with statistics for some time, and the success of films like Moneyball is testament to that. That is to say nothing of simply watching sport: commentators are never far from a handy, or at least interesting, stat to illuminate the importance of a match for the onlooker. Post-match analysis shows are never short of one or two statistics that undeniably prove why Team A lost, or why Team B weren’t able to push through.
On this front, football (soccer) is certainly not the worst offender. US sports in particular seem to be obsessed with statistics, and baseball most of all: without a doubt, it is a world ruled by statistics. At any rate, statistics still have a place in football, and as I have been watching Euro 2020, that is what I’m going to discuss here. For future readers, I should say that I am writing this after the semi-finals and before the final, so these observations will be one match out of date. I’ve also used theanalyst.com extensively as a resource, so go and check them out if you want to learn more.
The first thing I want to point out is the rather intriguing trend in precisely when goals are scored. Notably, there is a 68% increase in goals in the second half, and that brings a wry smile from me, since I’ve often felt that the game doesn’t really begin until the 2nd half. There have been a lot of teams who simply exist in the 1st half but all of a sudden come out firing in the 2nd, much more closely resembling the team of experts we expect them to be.
That is all too clear in the graph above, which shows the matching increase in goals in the 2nd half.
The predictive model supplied by theanalyst.com ironically had Italy in 6th place and England in 9th. The irony only increases with Spain sitting in 3rd and Denmark in 8th. The model simulated the matches 40,000 times, playing out likely group stages to determine the overall likelihood of different winners.
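To make the idea of simulating a tournament thousands of times a little more concrete, here is a minimal sketch of the general technique (Monte Carlo simulation). It is not theanalyst.com's model: the team names, the strength ratings, the rating-ratio win probability and the fixed four-team knockout bracket are all assumptions of mine, and the real model also simulated the group stages.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented purely for illustration.
RATINGS = {"Team A": 1.8, "Team B": 1.5, "Team C": 1.2, "Team D": 1.0}


def play_match(team_1, team_2, rng):
    """Pick a winner; each side wins in proportion to its rating (an assumption)."""
    p_team_1 = RATINGS[team_1] / (RATINGS[team_1] + RATINGS[team_2])
    return team_1 if rng.random() < p_team_1 else team_2


def simulate_knockout(teams, rng):
    """Play one single-elimination bracket over a fixed draw; return the champion."""
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [
            play_match(remaining[i], remaining[i + 1], rng)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def title_probabilities(teams, n_sims=40_000, seed=0):
    """Estimate each team's chance of winning as its share of simulated titles."""
    rng = random.Random(seed)
    titles = Counter(simulate_knockout(teams, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in teams}


probs = title_probabilities(list(RATINGS))
for team, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{team}: {p:.1%}")
```

Even in a toy like this, the output is only as good as the ratings and the bracket that go into it, which is worth keeping in mind when reading the rankings above.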
In defence of the model, however, pre-tournament performance is rarely a great indicator of tournament performance; the two seem to have a rocky relationship. The central stage of a large competition seems to encourage upsets, and the strength of a team can be hard to predict from relatively low-stakes qualification games, not to mention friendly matches!
The other aspect to mention here is that the predictive model used a pre-determined path for the teams, one which saw England face tough opponent after tough opponent. That is not how it played out: after seeing off an uncertain Germany, they have faced fairly minor nations, until they meet Italy in a couple of days in the grand final. Here lies an additional query, though: how can you account for home-field advantage in a predictive model? That is to say, how can you quantify it? Is the advantage only for the games you play at home, or does it also count to know that, if you make the final, it will be played with that home advantage?
To be sure, I do not have answers to these questions, but they are certainly worth considering when trusting a predictive sports model, and particularly when building one.
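For what it is worth, one common way rating systems handle the first of those questions is to add a fixed bonus to the home side's rating before working out the expected result. The sketch below uses the standard Elo expectancy formula; the ratings and the 100-point bonus are assumptions for illustration, not values taken from theanalyst.com's model, and it says nothing about the harder question of knowing the final will be at home.

```python
def win_expectancy(rating_a, rating_b, home_bonus=0.0):
    """Elo-style expected result for side A (1 = win, 0.5 = draw, 0 = loss).

    home_bonus is added to A's rating when A plays at home; pass 0 for a
    neutral venue.
    """
    diff = rating_a + home_bonus - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400))


HOME_BONUS = 100  # assumed size of the home advantage, in rating points

neutral = win_expectancy(1500, 1500)              # two evenly matched sides
at_home = win_expectancy(1500, 1500, HOME_BONUS)  # same sides, A at home

print(f"neutral venue: {neutral:.3f}, at home: {at_home:.3f}")
```

The bonus is a single number that has to be estimated from past results, which is exactly where the "how do you quantify it?" question bites.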
Enough said of the general predictive model; let us pore over the individual player statistics.
The first one that piqued my interest was “Total Expected Goals From Sequence Involvement”: out of the top 10, 6 are Spanish players. This should reveal a bias in the statistic straight away, and it is essentially my main critique of all the statistics I’ll be looking at below: they say a lot more about how a team plays than about the individual. They can also be indicative of a couple of “easier” opponents a team may have faced along the way. Luis Enrique’s men have become notorious for slow build-up and possession-heavy play, with many intricate passes involving much of the team that then have no end product (lacking the great striker that this system desperately calls for).
The only other two who hail from the same team are Harry Kane and Raheem Sterling of England, who play a similar style of possession-heavy, slow build-up football, and so there aren’t too many surprises to be gleaned here. All of which is to say that what this statistic really illuminates is how Spain and England play, and how effective they are at it. The biggest problem is that 6 players can be credited with crucial involvement in the same 5 goals, inflating their influence and the power of their side.
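The double counting is easy to see with a toy example. The sketch below assumes the statistic credits every player who touched the ball in a shot-ending sequence with the full expected-goal value of that shot; that is my reading of it rather than a confirmed definition, and the players and numbers are made up.

```python
from collections import defaultdict

# Hypothetical shot-ending sequences: (xG of the shot, players involved).
sequences = [
    (0.30, ["P1", "P2", "P3"]),
    (0.10, ["P2", "P3", "P4", "P5"]),
    (0.45, ["P1", "P3", "P6"]),
]


def xg_by_involvement(seqs):
    """Credit each involved player with the full xG of every sequence they joined."""
    totals = defaultdict(float)
    for xg, players in seqs:
        for player in set(players):  # count a player once per sequence
            totals[player] += xg
    return dict(totals)


per_player = xg_by_involvement(sequences)
team_xg = sum(xg for xg, _ in sequences)
credited = sum(per_player.values())

print(f"team xG: {team_xg:.2f}, total credited to players: {credited:.2f}")
```

Here the team generated 0.85 expected goals, yet the players between them are credited with 2.65, and the player involved in every move collects the whole team total on their own. The more players a side involves in each move, the bigger that gap becomes.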
Another interesting statistic that I want to dive into is “Expected Goals” and how it relates to goals themselves. For example, Patrik Schick of the Czech Republic scored 4 “open play goals” from an expected goal tally of 1.62. Thorgan Hazard, meanwhile, scored 2 from an expectation of 0.67. Where my confusion begins is why Schick, with 15 shots and 9 on target, would have such a low expected goal tally. Even more so for Hazard: with 4 shots, 3 on target, and 2 goals, why would that amount to a tally of 0.67?
This, however, is a case where I need a far more thorough understanding of the methodology behind the statistic before I really dig into it.
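As far as I understand the general idea, an expected goal tally is the sum of the estimated scoring probabilities of each individual shot, so it depends on the quality of the chances rather than on how many shots were taken or ended up on target. The sketch below shows the arithmetic; the per-shot values are invented (I simply picked four that add up to 0.67) and are not the real figures for any player.

```python
def expected_goals(shot_probabilities):
    """Total xG is the sum of each shot's estimated chance of being scored."""
    return sum(shot_probabilities)


# Hypothetical per-shot probabilities for a player who took 4 shots,
# most of them low-quality chances (for example, efforts from distance).
shots = [0.05, 0.07, 0.15, 0.40]
goals_scored = 2

xg = expected_goals(shots)
print(f"xG: {xg:.2f}, average per shot: {xg / len(shots):.2f}")
print(f"goals above expectation: {goals_scored - xg:.2f}")
```

On that reading, a player who scores from difficult positions will always have far more goals than expected goals, however many of their shots were on target.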
There are a couple of other mentions, honourable perhaps: Passes vs. Passing Accuracy, and Goals Prevented vs. Shots Faced.
As for the former, most of the highest pass counts belong to defenders playing safe passes to wing-backs and midfielders, so a high accuracy rate doesn’t mean too much, whereas accurate passes from a forward within the attacking third are truly instrumental in winning a game. And so this statistic can lend itself to what is essentially misinformation.
The latter shows that it is really hard to prevent a lot of goals if you haven’t faced many shots, and it is easy to prevent a lot of goals if you face many, many shots that are not from close in or of great quality. That is to say, facing long-range efforts and looping headers is very different from saving free kicks and low, short-range strikes heading into the corner of the goal.
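A rough sketch of why shot volume matters so much here. I am assuming that goals prevented is worked out as the expected goals of the on-target shots a goalkeeper faced minus the goals actually conceded; that definition, the two goalkeepers and all the numbers are assumptions for illustration.

```python
def goals_prevented(xg_of_shots_faced, goals_conceded):
    """Expected goals from the shots faced minus goals actually conceded.

    The total expected goals faced is also the ceiling: a goalkeeper cannot
    prevent more goals than the shots they faced were expected to produce.
    """
    return sum(xg_of_shots_faced) - goals_conceded


# Hypothetical goalkeepers, both keeping a clean sheet.
quiet_keeper = [0.3, 0.2, 0.4]  # 3 decent chances faced
busy_keeper = [0.05] * 30       # 30 speculative efforts faced

quiet = goals_prevented(quiet_keeper, goals_conceded=0)
busy = goals_prevented(busy_keeper, goals_conceded=0)

print(f"quiet keeper: {quiet:.2f} goals prevented")
print(f"busy keeper:  {busy:.2f} goals prevented")
```

Both goalkeepers saved everything, but the one peppered with low-quality shots comes out well ahead, simply because there was more for them to prevent.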
And so, what can be made of all this? I think the answer is rather straightforward: statistics are good at showing in general how players perform, but they are also biased, making teams who play a certain way look good, and making other teams look weak from one angle but great from another.
Quantifying play into statistics is a noble effort, but one that requires a little more work and interpretation than one might initially expect. There are countless examples that bear this out.