So you want to get started in football analytics.
written by: Jamie Kilday
Collectively as Modern Fitba, and individually on our own blogs before we formed, we have all received many e-mails, tweets and DMs from people wanting to know one thing.
How do I get into analytics?
So, rather than replying to each one individually, and for those people who want to know but have never plucked up the courage to ask, here is a blog covering how we got into it, along with some tips.
Before giving any advice, it’s probably best to throw the question right back at you and ask:
What do you mean?
Analytics is as broad as it is long, so it is worth drilling down on what you want to do and what you want from it. Do you want to be an analyst as a hobby, for your own amusement, but aren’t sure what tools you need? Do you want to write about analytics but don’t know where to start? Do you want to go into analytics at club level but don’t know what you need to be taken seriously? There is scope to go into a lot of areas, so it helps to have a rough idea of what you want. Don’t get me wrong though: don’t be put off getting into analytics because you don’t know what you want from it. I’m merely highlighting that there are a lot of options and each depends on its own skill set, although there are overlaps.
Anyway, here are a few things to consider no matter what you want from it.
Find a question you want answering and investigate.
This is probably the best starting point. Eventually you’ll have a question that nobody has the answer to, and then the fun starts. Inspiration can come in many forms. Sometimes I look over data just to fact-check pundits, as they constantly make wild claims about players that can be dismantled without much digging. I’m looking at you... (insert your own pundit here, there are too many to mention).
My first real foray into analytics was when I wanted to know how xG could be used to calculate the expected winner of a game. That led to me creating an xP (expected points) table for the SPFL last season. The data I compiled went on to form other ideas and the rest is history.
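If you’re wondering what turning xG into an expected winner might look like in practice, here is a minimal sketch in Python. It is not necessarily how the SPFL xP table was built; it is just one common approach, and it assumes every shot is an independent event that goes in with a probability equal to its xG. The function names and the shot values are made up for illustration.

```python
from typing import List, Tuple


def goal_distribution(shot_xg: List[float]) -> List[float]:
    """Return P(team scores k goals) for k = 0..number of shots.

    Assumption: each shot is independent and scores with probability = its xG.
    """
    dist = [1.0]  # before any shot is taken, 0 goals is certain
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1.0 - p)  # this shot is missed
            new[goals + 1] += prob * p      # this shot is scored
        dist = new
    return dist


def match_probabilities(home_xg: List[float],
                        away_xg: List[float]) -> Tuple[float, float, float]:
    """Return (P(home win), P(draw), P(away win)) from per-shot xG values."""
    home = goal_distribution(home_xg)
    away = goal_distribution(away_xg)
    home_win = draw = away_win = 0.0
    # Walk every possible scoreline and add its probability to the right bucket.
    for h, p_home in enumerate(home):
        for a, p_away in enumerate(away):
            p = p_home * p_away
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


def expected_points(home_xg: List[float],
                    away_xg: List[float]) -> Tuple[float, float]:
    """Return (home xP, away xP) using 3 points for a win and 1 for a draw."""
    home_win, draw, away_win = match_probabilities(home_xg, away_xg)
    return 3 * home_win + draw, 3 * away_win + draw


# Made-up xG values for the shots each side took in one imaginary game.
home_shots = [0.45, 0.10, 0.07, 0.30]
away_shots = [0.05, 0.76, 0.03]
home_xp, away_xp = expected_points(home_shots, away_shots)
print(f"Home xP: {home_xp:.2f}, Away xP: {away_xp:.2f}")
```

Add up each team’s xP over a season and you have an xP table. Working shot by shot rather than from total xG means one big chance and ten speculative efforts are treated differently even when they sum to the same number.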
Find your data and make sure it’s trustworthy.
The internet is full of data, so again it depends on what you want. WhoScored is good for data on most of the big leagues, and some leagues put a lot of event data (e.g. passes, shots, possession...) on their official sites. Understat is a good public source for xG and shot location data, but the leagues it covers are limited. Dave at Stratabet has generously offered to give away their data for free, with the expected caveats; you can read more about that here.
Also, our very own Jason of The Rangers Report fame is offering his data for the 2017/18 season for a reasonable fee; a free trial of his database can be found here.
If all else fails and you can’t find the data, you can always track it yourself. Shot data is relatively easy to track, as most highlight packages contain all the shots, so you can get the data you need manually for free. It also means you can spot some glaring errors made by some of the world’s leading data trackers.
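If you do go down the manual route, it helps to decide on a layout for your shot log before you start watching highlights. Below is a rough sketch of one way to do it in Python with a plain CSV. The column names, the outcome labels and the placeholder teams and players are all my own assumptions, so change them to suit whatever question you are trying to answer.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List, TextIO


@dataclass
class Shot:
    # Hypothetical columns; add location, body part, etc. if you track them.
    match: str
    minute: int
    team: str
    player: str
    outcome: str  # e.g. "goal", "saved", "off target", "blocked"


def write_shots(shots: Iterable[Shot], handle: TextIO) -> None:
    """Write shots as CSV rows, with a header row taken from the Shot fields."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Shot)])
    writer.writeheader()
    for shot in shots:
        writer.writerow(asdict(shot))


def read_shots(handle: TextIO) -> List[Shot]:
    """Read the CSV back into Shot objects, converting minute to an int."""
    return [
        Shot(row["match"], int(row["minute"]), row["team"],
             row["player"], row["outcome"])
        for row in csv.DictReader(handle)
    ]


def shots_per_team(shots: Iterable[Shot]) -> Counter:
    """Count how many shots each team took."""
    return Counter(shot.team for shot in shots)


# Placeholder data, written to an in-memory buffer so nothing touches disk.
tracked = [
    Shot("Team A v Team B", 12, "Team A", "Player 1", "saved"),
    Shot("Team A v Team B", 34, "Team B", "Player 2", "goal"),
    Shot("Team A v Team B", 71, "Team A", "Player 3", "off target"),
]
buffer = io.StringIO()
write_shots(tracked, buffer)
buffer.seek(0)
loaded = read_shots(buffer)
print(shots_per_team(loaded))
```

To keep a real log, swap the buffer for a file opened with open("shots.csv", "w", newline=""). A flat file like this also opens fine in Excel.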
Read more than you write.
You will read a lot of crap. This isn’t necessarily a bad thing: the more crap you read, the less crap you’ll write. The good stuff you read will hopefully inspire a new idea and lead to more interesting pieces; the bad can sometimes inspire you to write about the same subject, but better. There’s a reading list at the bottom of the article which will hopefully give you an idea of what good looks like.
Once you have read enough and feel you can write a good piece on a given subject, you can either contact one of the thousands of blogs to see if they accept contributors or, better still, start your own blog. You most likely won’t get paid for contributing to someone else’s blog, so self-publishing isn’t going to hurt your pocket. Contributing to a well-established blog has other advantages though: it may give your piece a wider audience, and having someone look over your piece before it is published isn’t going to hurt either.
The more you write, the more likely someone is to take notice of your work. I’m not saying there is no money in amateur analytics; it’s just worth knowing that you will probably have to invest a lot of time with no short-term rewards before someone offers you some.
I should add that Modern Fitba isn’t accepting contributors at the moment while we get all our shit together before the start of the 2018/19 season, but there is nothing to say this won’t change in the future.
Learn to code (or at least be proficient in Excel).
There is a bit of snobbery around this subject, and most of it centres on Excel. I’d say it depends on what you want out of it, but Excel isn’t a bad starting point if you want to play around with data, and it’s pretty user friendly. You can create tables and graphs and have your data set out nicely, but eventually you may find Excel a bit limiting. That said, there are plenty of good contributors to the analytics community who can’t code, so don’t be put off because you can’t. Tableau is a good middle step between Excel and coding and is well worth exploring. Coding is almost limitless, and although it may take a while to get proficient, you’ll find that once you are, it’s easier to customize things than it is in Excel.
Google everything and look beyond the first page.
Can’t find the data? Google it. Can’t get your data to work? Google it. Not sure if your idea has been done before? Google it.
You might not find the answer to your question but there is no telling what you will find.
I can’t tell you how many bookmarks I have with absolutely no idea how I originally found them, but it’s a lot. You might also start looking for something and find yourself 97 pages deep into a forum that hasn’t been read since 2013, where someone has a complete data set for what you want just sat there.
Don’t forget to quote sources though. Just because you found it doesn’t mean it’s yours.
Other search engines are available, apparently. If you google ‘search engines’ you will find more.
Don’t be afraid to fuck up.
We’ve all done it. We’ve thought we were on to something, checked over everything to make sure it isn’t a fluke, written 1,000 words on how we are a genius for discovering the next new thing and then BOOM, 30 seconds after publishing, someone on Twitter tells us exactly where we fucked up. That’s okay, engagement is good. If someone has taken the time to read what you have written and pick out the flaws, they might also be able to give you advice on how to improve it. They might be a troll though, but that’s the internet for you.
There’s nothing to say you aren’t still on to something, so keep at it.
Don’t give up because you didn’t get the answer you expected.
Never forget that statistics is a cruel mistress and will tempt you into failure at every opportunity. You may get to the point where you have an idea and the data seems to corroborate it, but you only have a small sample size, so you get to work on improving the sample size and looking for repeatability. Six months later your partner has left and taken the children and you’re eating Pot Noodle with your hands, wondering why the world hates you.
Exploring an idea is good, but don’t be afraid to let go of an idea if it isn’t working. You may feel like you’ve wasted your time, but you might find that the skills you learnt get used somewhere else or the data you’ve collated goes towards another new idea, so don’t burn it to the ground. Dismantle and rebuild.
Hopefully that’s answered your question. If not, then WHAT DO YOU WANT!?
To avoid reading too much crap, below you’ll find a reading list of articles that we have found useful so fill your boots and godspeed on your new adventure.