For our purposes, GAR is an acronym that stands for Goals Above Replacement. It measures the total number of goals a player adds to his team relative to a replacement-level player, and it tries to do so by taking everything a player does into account and boiling it down into one number. So theoretically, if a team with a break-even goal differential gained a +20 GAR player, their goal differential would increase to +20.
If you're only looking for a specific section of the stat, here's a quick table of contents. If not, let's dive in.
1) The Value of GAR
2) Replacement Level
3) Even Strength Offence
4) Even Strength Defense
5) Power-play Offence
6) Penalty Differential
9) Testing and Results
1) The Value of GAR
Before diving into how it's calculated, let's first address the question: why is this important or necessary? Well, you may have seen great resources around Twitter that look something like this. (Data viz from Bill Comeau.)
These charts are a great way to consume tons of information in a small amount of time. You can see tons of statistics in one glance, from a player's point production to his ability to drive play, and even his context. It's all there. The problem is when you have to choose between two players with similar production, like Hall and MacKinnon for example. Above you can see MacKinnon had more points, but Hall had better shot and expected goal metrics. MacKinnon played tougher competition, but with much better teammates. What about the fact MacKinnon had a better penalty differential? Or the additional face-off and power-play data that isn't even included here? With all of that, who was better overall?
You can try to weigh everything in your head, but the odds of coming to consistent conclusions without being bogged down by the same personal affinities and biases that we turn to numbers to get away from in the first place are very low. This is why we can turn to GAR as a framework for how to weight things. Also, just as a general tool, having player output in one number can be incredibly useful. It's not going to be perfect, but it's a great starting point for player talent.
2) Replacement Level
The next question around GAR models is why they are "above replacement level". For a simple explanation, let's assume individual goals are all a player brings to the table. Then imagine a theoretical hockey team called the Computer Boys. During the 2017 off-season the Computer Boys lose Eric, their top line left winger, to free agency. Over a full season, Eric always scores 22 goals. With Eric gone, most people think that the Computer Boys have lost 22 goals from their lineup. However, that's not the case. If the team were to just leave a void on first line left wing, then I guess they would lose 22 goals. But of course, they won't do that. The second line left wing will step into Eric's minutes, the third will take the second's, and so on. This shifting of the line-up will replace some, but not all, of Eric's 22 goals. This is why it's worth comparing player contributions to replacement level rather than zero.
So what is replacement level? That's a question I don't have a perfect answer to. However, for the purpose of this model it has to be something. In his book Stat Shot, Rob Vollman cites the 75th percentile as replacement level. He also notes that it will drop after an expansion draft. Since the Vegas Golden Knights made the NHL about 3% bigger, I'm using the 78th percentile as replacement level. This means you have to be better than 22% of the league at a given skill to provide above-replacement-level value at said skill. With replacement level defined, let's dive into the inputs that give us goals.
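To make the percentile idea concrete, here is a minimal sketch of how a replacement-level rate could be pulled from a list of per-minute rates. It is not the model's actual code: the function names, the interpolation method, and the tiny made-up "league" are all assumptions for illustration. The only number taken from the text is the 22% of the league that sits below replacement level.

```python
def percentile(values, pct):
    """Linearly interpolated percentile (pct from 0 to 100) of a list of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (pct / 100) * (len(ordered) - 1)
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    frac = rank - low
    return ordered[low] + (ordered[high] - ordered[low]) * frac


def replacement_level(rates, share_below=22):
    """Rate that `share_below` percent of the league falls under (22% per the text)."""
    return percentile(rates, share_below)


# Hypothetical per-minute rates for an 11-player "league" (made-up numbers).
league_rates = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035,
                0.040, 0.045, 0.050, 0.055, 0.060]
rl_rate = replacement_level(league_rates)
print(f"Replacement-level rate: {rl_rate:.4f} per minute")
```

Any player whose rate sits above `rl_rate` would add positive value at that skill, and anyone below it would subtract value.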
3) Even Strength Offence
The next part of GAR (Goals) is to try to encapsulate everything a hockey player does to help his team win games. To cover this, my GAR has 5 main inputs. Since about 70% of NHL goals are scored at even strength, most of the goals come from even strength play. Let's start with offence.
Ask the average hockey fan what players do to contribute offensively, and they will immediately cite points, which leads into the first half of the offensive equation. To account for players' point production there's C-BPM, which is built around Dawson Sprigings' BPM. Here are his inputs, weighted based on their ability to predict future goals.
This basically represents the weights for C-BPM. The longer bars show the more important inputs for each position. Some things of note here. For forwards, primary points are king. For defencemen all points are relatively equal, with slightly more focus on assists. And for either position, shot quality (ixFSh%) is slightly more important than shot quantity (iFF/60). I did make some small modifications (C-BPM likes goals slightly more relative to primary assists than BPM does), but this is a good breakdown of the weights. Since the goal here is to encompass point production, I compared C-BPM to points per hour.
There are a few outliers, which are mainly guys with high shooting percentages or low TOI. Other than that, C-BPM lines up closely with point production and does a good job encompassing players' score-sheet stats. For a quick look at who excels in this metric, here are the best C-BPM players from the 2017-18 season (minimum 1000 minutes played).
I mentioned above that point production (C-BPM) is only half of the equation. That's because there's another key input to players' offensive output: their ability to drive play. Some players (especially defencemen) can be elite offensive players without massive point totals. They achieve this by driving shots and scoring chances towards the other team's net, making sure their team generates goals even though they may not be directly picking up points. To account for players' ability to drive play offensively I use C-XPM (again based on Dawson Sprigings' WAR, and his XPM metric). This uses 2 key metrics: relative to teammate Corsi for, and relative to teammate expected goals for. If you're already familiar with those metrics, the next 2 paragraphs aren't for you. If you're not, here's a quick synopsis.
First is relative to teammate Corsi for (RelT CF). This starts with the basic idea of Corsi, which is a fancy way of saying a shot attempt. Each shot attempt a player's team takes while he is on the ice counts as a Corsi for. Once we have a player's Corsi for, we can adjust for his quality of teammates. For an example, let's bring back Eric. Imagine Eric generates 65 shots per hour of ice time, while his line-mates generate an average of 60 shots per hour. In this scenario, take Eric's 65 shots, minus his line-mates' average of 60 shots (65-60), and Eric would have a +5 relative to teammate Corsi for per hour. (The equation is more complex than this, and if you're interested you can read more in depth here, but that's the general idea.)
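The Eric example works out like this in code. This is only the simplified version described above, not the full relative to teammate equation, and the individual line-mate rates are made up so that they average 60.

```python
def relative_to_teammates(player_rate, teammate_rates):
    """Player's rate minus the plain average of his teammates' rates (simplified)."""
    teammate_average = sum(teammate_rates) / len(teammate_rates)
    return player_rate - teammate_average


# Eric generates 65 shot attempts per hour; his line-mates average 60.
# The individual line-mate values below are hypothetical.
eric_rel_cf = relative_to_teammates(65, [58, 60, 62])
print(f"Eric's RelT CF per hour: {eric_rel_cf:+.1f}")
```

A fuller version would presumably weight each teammate by shared ice time, which this sketch ignores.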
After reading about Corsi, everyone thinks the same thing: "but not all shots are created equal". Of course a shot from the blue-line is far less valuable than one from the slot, which is where Corsica's expected goals come into play. Expected goals adjust each shot for quality by taking into account the shot's type, angle and distance. This way, if two players generate the same number of shots, the one who generates higher quality shots will be recognized as the superior play driver (the same relative to teammate formula is applied here too).
With the 2 main metrics defined, C-XPM is simple. Since it's a combination of the two metrics above, the first step is to get them on the same scale. To get Corsi on the same scale as (expected) goals, think about goals as a function of shots. Every shot attempt over the past three years has had a 4.08% chance of going in, so just multiply the Corsi events by 0.0408055, and suddenly they're on the same scale as expected goals. (This number will change every year, but you get the idea.) Once they are on the same scale, this quick equation combines them together.
(((RelT CF/Min - RL CF/Min)*TOI) + ((RelT xGF/Min - RL xGF/Min)*TOI))
This gives us the framework for C-XPM. Originally I had Corsi and expected goals weighted equally. However, with some testing I found xG to be way more volatile (especially for defencemen). As a result, the ratings are skewed to include both but favor Corsi.
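Here is a rough sketch of the C-XPM framework in code, under a few assumptions. The 0.0408055 conversion comes from the text, but the function and parameter names are invented, and the two weights default to 1.0 (the unweighted framework) because the actual Corsi-favoring weights aren't given. The sample inputs are made up.

```python
GOALS_PER_CORSI = 0.0408055  # share of shot attempts that became goals (from the text)


def c_xpm(rel_cf_per_min, rl_cf_per_min, rel_xgf_per_min, rl_xgf_per_min,
          toi, corsi_weight=1.0, xg_weight=1.0):
    """Goals above replacement from play driving, following the framework equation.

    Corsi rates are converted to a goal scale first. The weights are
    placeholders; the real model skews them toward Corsi.
    """
    corsi_goals = (rel_cf_per_min - rl_cf_per_min) * GOALS_PER_CORSI * toi
    xg_goals = (rel_xgf_per_min - rl_xgf_per_min) * toi
    return corsi_weight * corsi_goals + xg_weight * xg_goals


# Hypothetical skater: +0.05 RelT CF/min and +0.002 RelT xGF/min over 1000 minutes,
# against made-up replacement levels of -0.03 CF/min and -0.001 xGF/min.
value = c_xpm(0.05, -0.03, 0.002, -0.001, toi=1000)
print(f"C-XPM: {value:.2f} goals")
```

Passing something like `corsi_weight=1.2, xg_weight=0.8` would tilt the result toward Corsi, though those particular numbers are purely illustrative.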
The next addition to C-XPM is a quality of competition adjustment. There are 2 different ways to look at quality of competition. The first is by looking at the time on ice percentage of the competition, and the second is by their shot and expected goal metrics. Since Corsi and xG can be split into offensive and defensive sides, I went with those for the adjustments here. The idea is that the better a player's opponents are at suppressing shots and expected goals, the more value there is in that player's shot and chance generation. Here's the formula I used to adjust.
(((QoC CA/Min - Average CA/Min)*TOI) + ((QoC xGA/Min - Average xGA/Min)*TOI))
Again, this is the framework, with the weights skewed slightly to favor Corsi. This adjustment is nothing major. In extreme cases it swings players' results by about 1.5 goals or so. Quality of competition is probably the most controversial debate in hockey analytics today, so I'm especially open to feedback on improving this part of the model. In the meantime, the context adjustments did make the total more repeatable, which is a good sign. Again, to get an idea of who excels in this metric, here are the top 5 forwards and defencemen at C-XPM from the 2017-18 season.
4) Even Strength Defense
Even strength defense uses 2 of the metrics mentioned above, Corsi and expected goals. This time it's relative to teammate Corsi against (RelT CA) and expected goals against (RelT xGA). It's the same thing, just the ability to suppress shots and chances relative to your teammates rather than generate them. So the basic equation is just the inverse of everything mentioned above.
(((RelT CA/Min - RL CA/Min)*-TOI) + ((RelT xGA/Min - RL xGA/Min)*-TOI))
Again, I found Corsi to be more consistent than expected goals. Furthermore, players don't have the ability to influence their goalie's save percentage, so preventing quantity against (Corsi) is more important than a player's ability to suppress quality (xG). For the context adjustment, it's more of the same.
(((QoC CF/Min - Average CF/Min)*-TOI) + ((QoC xGF/Min - Average xGF/Min)*-TOI))
This adjustment is applied the exact same way as the offensive context adjustment. Once again, here are the leaders in this metric from the 2017-18 season.
5) Power Play Offence
The next components of the model deal with special teams. About 30% of goals in the NHL are scored on special teams, so roughly 30% of the value in this model is distributed between these upcoming sections. First up on special teams is power-play offence. Just like C-BPM, power-play value comes from point production. And to weight all of the stats, I again turned to Dawson Sprigings' BPM.
Power-play production has some weirder results. People generally put a premium on primary points. However, based on the BPM weights being used, either type of assist provides significantly more value than a goal. This sounds counterintuitive. And while Dawson used machine learning techniques beyond my pay grade to find these weights, I decided to look at the repeatability of goals, primary assists and secondary assists per hour on the power-play, and it started to make more sense. (Players with a minimum of 50 minutes played on the power-play in each season.)
So that's a lot of information, but there is one big takeaway. Many are likely puzzled as to why goals for forwards have so little value, and while I can't tell you how Dawson got the weights, it's worth noting that goals per hour among forwards in recent history has been incredibly noisy. It's noisier than any of the other point-based metrics. Furthermore, Ryan Stimson's work illustrates that passing data is more predictive of future goals than shot data at even strength, and since power-play offence is especially reliant on passing to open up the opposing penalty kill, the gap might be even larger on the power-play. These two things help explain why assists are king on the power-play, not primary points. And once again for fun, here are the top power-play performers from the 2017-18 season.
6) Penalty Differential
If an individual player's rating looks out of whack to you, this section is likely the reason why. We love to rave about a great power-play or complain about a team's terrible penalty kill. However, there is more to special teams than that. A significant part of special teams is deciding how much time is spent playing each one. For example, the 2017-18 Maple Leafs had an amazing power-play, generating the second most goals per hour in the NHL. Sadly, they didn't get the full benefit from that power-play because the team struggled to draw penalties. As a result, the league's second most efficient power-play only scored the 9th most goals last season.
Of course the flip side is on the penalty kill. Take the Carolina Hurricanes. They gave up the 4th most goals per hour on the P.K., an absolutely terrible result. But in spite of their bad penalty kill, they tied for the 9th fewest goals surrendered while short-handed. How? They masked their P.K. woes by being the least penalized team in the NHL.
On top of its importance at a team level, penalty differential has proven to be repeatable at the player level, so it's an important input into the goals above replacement formula. For the model, it's split into two separate categories. First is a player's ability to draw penalties, and second is a player's ability to stay out of the box. To calculate the value of each, the formulas are similar.
((RL PIM Taken/Minute*TOI) - PIM Taken) *0.1207
(PIM Drawn - (RL PIM Drawn/Minute*TOI)) *0.06885
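The two formulas above translate directly into code. This is a sketch with made-up function names, parameter names and sample inputs. Only the 0.1207 and 0.06885 weights come from the formulas themselves.

```python
TAKEN_WEIGHT = 0.1207   # goals credited per PIM avoided, from the first formula
DRAWN_WEIGHT = 0.06885  # goals credited per PIM drawn, from the second formula


def penalties_taken_value(pim_taken, rl_pim_taken_per_min, toi):
    """Goals gained by taking fewer penalty minutes than a replacement player would."""
    return (rl_pim_taken_per_min * toi - pim_taken) * TAKEN_WEIGHT


def penalties_drawn_value(pim_drawn, rl_pim_drawn_per_min, toi):
    """Goals gained by drawing more penalty minutes than a replacement player would."""
    return (pim_drawn - rl_pim_drawn_per_min * toi) * DRAWN_WEIGHT


# Hypothetical skater over 1200 minutes (replacement rates are made-up numbers).
taken = penalties_taken_value(pim_taken=20, rl_pim_taken_per_min=0.03, toi=1200)
drawn = penalties_drawn_value(pim_drawn=40, rl_pim_drawn_per_min=0.02, toi=1200)
print(f"Staying out of the box: {taken:.2f} goals")
print(f"Drawing penalties:      {drawn:.2f} goals")
```

Note the sign flip between the two functions: taking fewer minutes than replacement level is good, while drawing more than replacement level is good.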
Conventional wisdom from @EvolvingWild states that a penalty is worth 0.17 goals. However, my weights are different. The first reason for that is using 0.17 goals results in a sub-optimal distribution of goals. Using 0.17 resulted in greater than 40% of the goals being attributed to special teams, which is far too high, so the numbers were scaled down. It's important to note here that the raw values don't matter much. What matters is the values relative to each other. I could have easily used 0.17 goals, and there would just be more goals to go around in the other categories too.
The second reason, and the reason they are different relative to each other, is to make up for a shortcoming of the model. I have no way of accounting for penalty kill prowess, so instead I shifted more credit to a player's ability to stay out of the box. It's not perfect and I'm always open to suggestions (especially about P.K. data), but since I was unable to quantify penalty killing at the skater level, staying out of the box is the best proxy for helping your team short-handed. With the equation defined, here are last season's leaders in the NHL's most underrated skill.
For the finishing touches of the model, we have face-offs. The mainstream media tends to treat face-offs as some sort of defensive WAR, and push-back from people generally revolves around "face-offs don't matter". Neither of those is true. Face-offs are a valuable input into a centreman's output. Michael Schuckers did a study and found these to be the values for each type of face-off.
In a perfect world each face-off would be weighted based on all the information listed above, but for simplicity's sake focus on the top number. Altogether, it takes 76.5 face-off wins to be worth one goal. Take 1 divided by that 76.5 and you get 0.013071895425 goals per win (or about 1.307 goals per 100 wins). From there, it's really easy to derive the formula for face-off value. (F = Face-offs)
(F Wins - (F Taken*0.45))*0.013071895425
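As a quick sanity check on the formula, here's a small sketch with invented names and a made-up centre. The 76.5 wins per goal and the 45% replacement level are the only values taken from the text.

```python
WINS_PER_GOAL = 76.5         # face-off wins needed to be worth one goal (from the text)
REPLACEMENT_WIN_RATE = 0.45  # 45% in the circle is replacement level (from the text)


def faceoff_value(wins, taken):
    """Goals above replacement from face-offs: wins above 45%, divided by 76.5."""
    wins_above_replacement = wins - taken * REPLACEMENT_WIN_RATE
    return wins_above_replacement / WINS_PER_GOAL


# Hypothetical centre: 550 wins on 1000 draws (55%).
centre_value = faceoff_value(wins=550, taken=1000)
print(f"Face-off value: {centre_value:.2f} goals")
```

Dividing by 76.5 is the same as multiplying by the 0.013071895425 in the formula. A centre who wins exactly 45% of his draws comes out at zero.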
This way, 45% in the circle is replacement level. And the amount of value that comes from face-offs is likely more than people expect. Since there are no defencemen here, let's look at the top 5 and bottom 5 face-off value players from the 2017-18 season instead.
Turns out there is actually something Connor McDavid isn't good at. The final extra in the model is the addition of a prior to both even strength offence and defense, meaning any single-season value uses some of the previous season's data too. This makes the model slightly worse for talking about, say, awards voting, but gives a better idea of a player's true talent. Altogether, the inputs I've described are added together to make GAR!
As I've mentioned before, any GAR model, including mine, is going to have weaknesses. And it's important to understand those weaknesses when working with the data. First is the inability to quantify short-handed impact. Ian Tulloch's zone start adjustment is a step in the right direction, but I was still unable to come up with a penalty kill metric that adds anything but noise. So for this model, the best penalty killers just stay out of the box.
The second one is about the prior. When a player has no prior history (rookies), his history is considered to be average. This by definition means that the value of a rookie's season will be underrated if he's above average, and overrated if he's below average. Also with the prior, if a player dramatically outperforms his past performance, he will be scaled back towards his career norm. This helps find "true talent" because generally, the more extreme a player's results, the more likely luck played a significant role. On top of those, there are a few other major weaknesses of all models, which have been explained well already by others.
The first is the Sedin problem. This issue arises when two players play almost entirely together, and models struggle to distribute credit between the two. You can read Matt Cane's thread about it here. Then there is the problem where elite players can do weird things to their teammates' results, explained by Ian Tulloch here. And finally there is one weakness in many public models which has yet to be addressed.
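To show how a prior produces the effects described above, here is one simple way it could work: a weighted blend of the current season with the previous one. This is an assumption-heavy sketch, not the model's actual method. The 30% prior weight, the function name and the sample values are all hypothetical, since the real weighting isn't stated.

```python
def blend_with_prior(current, previous=None, league_average=0.0, prior_weight=0.3):
    """Blend a single-season value with last season's value.

    Players with no history (rookies) get the league average as their prior.
    `prior_weight` is a made-up placeholder, not the model's real weight.
    """
    prior = league_average if previous is None else previous
    return (1 - prior_weight) * current + prior_weight * prior


# Veteran who jumped from 4 to 10 goals of value: pulled back toward his history.
veteran = blend_with_prior(current=10.0, previous=4.0)

# Rookie with the same 10-goal season: pulled toward league average (0 here).
rookie = blend_with_prior(current=10.0)

print(f"Veteran: {veteran:.1f}, Rookie: {rookie:.1f}")
```

Under these assumptions, both the above-average rookie and the breakout veteran end up with a rating lower than their raw season, which is exactly the trade-off between "true talent" and single-season credit.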
8A) The more important QoC
One of the biggest puzzles in hockey analytics today is adjusting for context, and specifically QoC, which generally refers to quality of competition. I have a small adjustment for that in the model. However, I don't adjust for the more important QoC, quality of coach. This is something Dawson Sprigings noted on the Hockey Graphs podcast back when he was in the public sphere. He spoke about how, in his WAR model, he found adjusting for quality of coach to be more important than quality of teammate.
Many people (including myself at first) are likely highly skeptical of this, but let me present a case study. When working with the first model I made (very similar to Gamescore), I noticed something really weird when creating the age curve. From the 2014-15 season to the 2015-16 season, a cluster of players all saw improvements in their results, and it caught my eye. All of these players had one thing in common: they all played for the Toronto Maple Leafs. Using Dom's Gamescore from Corsica, you can see the results for yourself.
Gamescore is primarily an offensive metric, and the Leafs' shooting percentage dropped from 7.51% to 6.31%, their goals for cratered, and yet every single core piece of the team saw their results improve. So what changed? They transitioned from some of the worst coaching in the NHL to among the best, and benefited from the Babcock effect. And to show this wasn't just a weird year, I included the two years before Babcock in blue, and his first two seasons behind the bench in orange. Over the 4 year sample, every single player had their best two seasons under Babcock, and their worst two before he took over.
This is probably one of the most extreme examples I could cite (Bruce Boudreau and Mike Sullivan appeared to have massive effects too), but it shows the huge effect quality of coach can have. That being said, I don't have a method to account for this, and this is a topic not often discussed in the public sphere, so it will likely be a while before adjustments in this field are made again. So for now, the inability to adjust for quality of coach will remain a weakness.
9) Testing and Results
With the calculations out of the way, we can finally get to the fun stuff, the results! Throughout the summer I plan to be adding to the data set as far back as I can go, but for now we have the past 5 years of data for forwards, and 4 for defencemen. Let's start by looking at the best GAR seasons at each position in recent history.
In the end the sniff test is meaningless, but it is nice to see the model recognizing the best seasons as coming from superstars. Peak Crosby is the king, and McDavid is going to truly be something special. For defencemen, Karlsson has led the way in literally every season I have data for (and has somehow only won the Norris once in that time). Then Hedman, Doughty, OEL, Burns and Subban also make appearances in the top 10. The full list of results is available on this Google doc. It's only the past 2 seasons for now, but in the upcoming week or so I'll be adding more seasons, and different filters like age, draft position, salary and anything I find interesting for people to play around with too (thanks to Rob Vollman's data). And just to show that all of the metrics I've presented you with aren't just noise, here is a chart showing the year to year repeatability of each metric.
If you've made it this far, thanks for reading! That's goals above replacement, a quick snapshot of player talent boiled down into one number. It's not as complex as some others, but it does combine all of the numbers people are most likely to cite anyway when discussing why player X is better than player Y. If you have any questions, comments or concerns, you're welcome to comment or reach out to me on Twitter @CMhockey66. I'm always open to discussing why numbers are the way they are, improvements going forward, or anything really. And finally, thanks to Manny and his website Corsica, which is where all of the data in this post came from!
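For anyone who wants to check repeatability on their own data, one common approach is to correlate each player's value in one season with his value in the next. The sketch below uses a plain Pearson correlation. That choice is an assumption, as are the function name and the made-up player values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list is constant (no spread to correlate).
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical values for the same five players in back-to-back seasons.
season_one = [8.0, 3.5, -1.0, 5.0, 0.5]
season_two = [6.5, 4.0, 0.0, 3.0, 1.5]
print(f"Year to year correlation: {pearson_r(season_one, season_two):.2f}")
```

A value near 1 means players tend to keep their ranking from year to year, while a value near 0 suggests the metric is mostly noise.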