Gameweek 9 Preview
This week's preview comes in two parts. The first is the actual forecast for the week, with a very high-level explanation of what the numbers mean. Lower down you'll find the mechanics behind the forecast, along with some identified weaknesses and some open questions about where we go next with the model.
First then, here is this week's forecast:
xG Shots: Expected goals based on shot data to date for both the highlighted team and their opponent.
xG Shots Regressed: Expected goals based on shot data for both teams, this time regressed using league average shot conversion rates.
The Nuts and Bolts
This is the end result of a project I've been working on for a couple of weeks now, which was discussed at length in an earlier piece. Now that the data has actually come together, I wanted to take the opportunity to run through an example, both to (a) help everyone understand where I'm coming from and (b) catch issues or errors in the logic, since I find that explaining something to someone else helps with that. As I've said before, this is still a work in progress and I'm well aware that there are still issues to resolve, some of which I'll address below. I welcome any suggestions below, on Facebook or on Twitter.
Given my love affair with Arsene Wenger, and the fact that Arsenal generate elite numbers which should illustrate some ideas nicely, let's use Arsenal welcoming QPR to the Emirates as our example:
The first step is to try and forecast the number of shots a given team will have this week, split between those taken inside and outside the box. There are a number of ways one could do this, but where I've started is to look at how a team has done against its opponents to date compared to how the league on average has done against those same opponents. An example of this can be found in table 3 here. Though it's tough to analyse this data in a vacuum, I will include the full tables below so you can see exactly what I'm working with. The first table shows the average rate at which each team is exceeding/trailing the league average in terms of generating shots, while the second table shows the same trends on the defensive side of the ball (note that in the second table a negative number is a good thing):
As we can see, Arsenal's attacking numbers are a bit odd: they've outperformed the league average in every category apart from shots inside the box at home, where they've been slightly below average. I would suggest this is purely a small sample size issue rather than an underlying trend, but it could be problematic were it to continue.
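For anyone who wants to play along at home, here is a minimal sketch of how a premium/discount of this kind could be calculated. It assumes the premium is the average, over a team's games, of how far its shot count sits above or below what each opponent normally concedes. The function name and the sample numbers are made up for illustration and are not taken from the tables.

```python
from statistics import mean


def shot_premium(matches):
    """Average premium/discount versus what each opponent usually concedes.

    `matches` is a list of (shots_taken, opponent_avg_conceded) pairs.
    A result of 0.10 means the team shoots 10% more than opponents
    normally allow; -0.04 means 4% fewer.
    """
    return mean(shots / opp_avg - 1.0 for shots, opp_avg in matches)


# Hypothetical sample, NOT real data: three games for one team.
sample_matches = [(10, 8.0), (6, 8.0), (9, 10.0)]
premium = shot_premium(sample_matches)
print(f"{premium:+.1%}")
```

The same function would work for the defensive table by feeding in shots conceded against each opponent's average shots taken, in which case a negative number is the good outcome.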
We can see from the second table that QPR give up 12 shots inside the box (SiB) away from home, so if we apply Arsenal's premium/discount of -4% we get a forecast of 11. We then look at how many SiB Arsenal are averaging at home (12) and apply their opponent's premium/discount (16%) for how many SiB they are giving up away from home, arriving at a forecast of 14. For now, I then take a simple average of these two amounts to get the expected SiB for Arsenal, though this might need tweaking if we can establish whether a team's attack or their opponent's defense has more impact on this number.
The same logic is then followed for shots outside the box (SoB) until we have an expected number of total shots for Arsenal (19) split between those in (13) and out (6) of the box.
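The shot forecast above can be sketched in a few lines. This is only an illustration of the simple-average approach described here, using the rounded SiB figures quoted in the text (the real tables carry more decimal places, so the result only approximately matches). The function names are my own shorthand.

```python
def adjust(base_shots, premium):
    """Scale a per-game shot average by a premium/discount (as a fraction)."""
    return base_shots * (1.0 + premium)


def expected_shots(opp_conceded, team_premium, team_avg, opp_premium):
    """Simple average of the defence-based and attack-based estimates."""
    from_defence = adjust(opp_conceded, team_premium)  # opponent's shots conceded, adjusted for the attack
    from_attack = adjust(team_avg, opp_premium)        # team's own shots, adjusted for the defence
    return (from_defence + from_attack) / 2.0


# Rounded figures from the Arsenal v QPR example (shots inside the box).
arsenal_sib = expected_shots(opp_conceded=12, team_premium=-0.04,
                             team_avg=12, opp_premium=0.16)
print(round(arsenal_sib, 2))  # 12.72, i.e. roughly 13 SiB
```

If it turns out that attack or defence matters more, the plain average could be swapped for a weighted one without changing anything else.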
The next part of the calculation is where the divergence between the two graphs stems from. The first graph is derived using Arsenal's shot conversion rates, i.e. the rate at which they convert shots into goals, again split between those inside and outside the box. Once again, these numbers are a bit tough to read out of context, but I'll include them below for reference:
So now we simply take Arsenal's expected SiB (13) and apply the above conversion rate (12%) to get 1.5 expected goals. We then perform the same calculation for SoB (6 x 6%) to get 0.4 goals giving us a total expectation of 1.9 goals for the week.
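As a quick check on the arithmetic, the expected goals step is just shots multiplied by conversion rate, summed over the two shot locations. A minimal sketch using the rounded figures from the text (the function name is illustrative):

```python
def expected_goals(sib, sob, sib_rate, sob_rate):
    """Expected goals = shots inside the box x rate + shots outside x rate."""
    return sib * sib_rate + sob * sob_rate


# Arsenal v QPR: 13 SiB at 12% and 6 SoB at 6%.
arsenal_xg = expected_goals(sib=13, sob=6, sib_rate=0.12, sob_rate=0.06)
print(round(arsenal_xg, 2))  # 1.92, which rounds to the 1.9 quoted above
```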
The point of the second graph is to eliminate the potential volatility of these conversion rates by using league averages. You can make the argument that some, or indeed most, of these rates make sense, with top teams like Man Utd and Chelsea ranking well while teams with poor or inconsistent strikers come in towards the low end of the range. However, after just eight weeks there's clearly going to be some significant movement to come, and ignoring it completely leads to the somewhat perverse ranking this week, which sees Chelsea come out as the best play of the week despite facing a good opponent, largely because of their incredible 24% conversion rate of SiB.
The second graph therefore uses the league average rate of 15% for SiB and 4% for SoB, which in Arsenal's case will actually improve their forecast (remember, despite the terminology being a bit confusing compared to how we usually use the word, regression to the mean is better thought of as convergence rather than regression in a negative sense). In truth I think the best answer probably lies somewhere in the middle of the two graphs, so I'd be minded to check both before making any key decisions. The extent to which teams' conversion rates need to be regressed, and when they stabilise, is a project I hope to undertake soon.
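To make the "somewhere in the middle" idea concrete, here is a rough sketch that blends a team's own conversion rates with the league averages. The `weight` parameter is purely an assumption for illustration: I haven't yet worked out how much regression is appropriate, so 0.5 below is a placeholder, not a recommendation.

```python
LEAGUE_SIB_RATE = 0.15  # league average conversion, shots inside the box
LEAGUE_SOB_RATE = 0.04  # league average conversion, shots outside the box


def blended_rate(team_rate, league_rate, weight):
    """weight = 0 keeps the team's own rate; weight = 1 uses the league average."""
    return (1.0 - weight) * team_rate + weight * league_rate


def regressed_xg(sib, sob, team_sib_rate, team_sob_rate, weight):
    """Expected goals using conversion rates pulled towards the league average."""
    sib_rate = blended_rate(team_sib_rate, LEAGUE_SIB_RATE, weight)
    sob_rate = blended_rate(team_sob_rate, LEAGUE_SOB_RATE, weight)
    return sib * sib_rate + sob * sob_rate


# Arsenal v QPR with 13 SiB and 6 SoB, own rates of 12% and 6%.
first_graph = regressed_xg(13, 6, 0.12, 0.06, weight=0.0)   # team rates only
second_graph = regressed_xg(13, 6, 0.12, 0.06, weight=1.0)  # league rates only
halfway = regressed_xg(13, 6, 0.12, 0.06, weight=0.5)       # placeholder blend
print(round(first_graph, 2), round(second_graph, 2), round(halfway, 3))
```

With these inputs the league-average version comes out higher than the team-rate version, which is the improvement for Arsenal mentioned above.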
So there it is. As I say, I welcome any suggestions, and the next step is to get involved in Shots On Targets' analytics project to help take these models to the next level. I am almost finished with the individual player model as well, so hopefully we'll have some captain data by this time next week. I'm currently in the final skirmishes of that battle, deciding how to calculate an individual player's historic conversion rate (for example, how do we treat Demba Ba's data from his time at West Ham? How about his seasons at Hoffenheim?). Input and suggestions on that model are also welcome, though they may be best saved for next week.
As always, thanks for reading and for your patience in helping me put together these next-generation models. I've had a busy few weeks so the standard fantasy pieces like reader questions have fallen by the wayside a bit, but I hope to resume that service very soon.
Comments
Good article. Essentially says Stoke and Arsenal are most likely to get a clean sheet, right?
I have a feeling that this analysis is going to be important in how the FF fans make decisions in the future.
My main suggestion is for us to have a chance to recap. Over the last 5/6 Game Weeks you have been able to present predictions, but I have yet to see how accurate they have been, and this, to me, is crucial in the creation of your model going forward. Maybe another member of this community could offer to do this (me included).
However, and most importantly, we should all remember it is possible that shots and shots on target, in or out of the box are not the only things that translate to FF points. Surely points are what we are interested in.
The fact goals in games skew the points so much is almost a shame.
Again, I think this is a great bit of work and I hope I add rather than discourage.
Looking at the regression issue, it seems to me that the elite sides are too heavily discounted by using the league averages. Is it better/possible to use last season's goal conversion rate? This way there would be a large sample size with realistic goal conversion rates.
Should it be a factor, or just an anomaly of the small data set?
I suspect the latter (especially as their rate last year was well within bounds), but it might be worth noting if we see a strong conversion rate from Everton this weekend.