Nerds like me tend to fetishize “market inefficiencies”. That hidden pattern showing the conventional wisdom is wrong, suggesting there is an edge to be gained with some trick. In hockey, these market inefficiencies are likely most valuable at the NHL draft, because the NHL’s entry-level and RFA contract systems allow teams to underpay top talent they drafted for at least 7 seasons. This post explores a relatively old topic that has been a market inefficiency at the NHL draft in the past few seasons: drafting players with later birthdays (within a given year, so December 15th as opposed to January 15th).
It is easy to see how players with later birthdays might become undervalued by scouts. Humans usually treat age like an integer (18, 19, 20 etc.) when in reality it is a continuous variable (you can be 17.69 years old). Imagine two 17-year-old prospects, one born in mid January and the other born in mid December. At age 17, the January-born prospect is about 5% older than the December-born prospect. That is 5% more time to train with skills coaches, play, work out etc. (Obviously the real edge is even more than 5%, because babies are not working out.) All else being equal, this ~5% edge means the January-born player should be better at hockey when both prospects are "17". How much of the gap is due to the older player being a better prospect, and how much is due to his extra months of age, can be difficult to discern. If scouts are not careful, the December-born player ends up undervalued because the gap between him and his peer is a result of age, not of being an inferior prospect.
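To make the "about 5%" figure concrete, here is a minimal sketch that treats age as a continuous variable. The two birthdates and the evaluation date are hypothetical, picked only so that both prospects are "17" on the same day, and the helper name age_in_years is my own.

```python
from datetime import date


def age_in_years(birthdate: date, on: date) -> float:
    """Age as a continuous value, using an average of 365.25 days per year."""
    return (on - birthdate).days / 365.25


# Hypothetical prospects and evaluation date, chosen only for illustration.
january_born = date(2002, 1, 15)
december_born = date(2002, 12, 15)
evaluation_day = date(2019, 12, 31)

older = age_in_years(january_born, evaluation_day)
younger = age_in_years(december_born, evaluation_day)

# How much more "life so far" the January-born prospect has had.
relative_edge = older / younger - 1

print(f"January-born:  {older:.2f} years")
print(f"December-born: {younger:.2f} years")
print(f"Relative age edge: {relative_edge:.1%}")
```

Both players show up as 17 on a roster sheet, but the continuous ages are almost a full year apart, which is where the roughly 5% gap comes from.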
This potential bias likely explains (at least in part) some peculiar patterns in recent NHL draft history. In the drafts from 2007-2013, drafted players with later birthdates have outperformed picks with relatively early birthdays in the 7 seasons following their drafts. If you take the drafted players who have made the NHL and plot their output over that time against their day of birth within the year (so 1 is born on January 1st), there is an obvious upward trend in output as birthdate increases.
(Note this post is using Standing Points Above Replacement (SPAR) data from Evolving Hockey to define “output”. I call the metric being used SPAR Index because it takes the average of each player's Standing Points Above Replacement and Expected Standing Points Above Replacement.)
The upward trend shows that, among draft picks who have made the NHL, those born later in the year have outperformed prospects born earlier in the year in the 7 seasons following their NHL drafts. An NHL player who was born in mid December has, on average, outperformed a player born in mid January by about 2.6 standing points above replacement index in the 7 seasons following the draft. This may sound like a small margin, but at two standing points per win it is more than one win above replacement. Given the obscene amounts of money teams pay for single wins on July 1st, it is far from insignificant. This relationship is also not likely a result of draft position, and therefore perceived prospect quality, because day of year born is not related to draft position.
Age Bias By Position
The relationship between birthdate and output is not equal among all positional groups either. I expected defensive prospects to be the most misvalued due to the relative age effect. Defence is generally considered to be the more physical position, so I figured having less time to develop physically would disproportionately hurt defenders. When looking at the data, the opposite is true.
There is an upward trend where player output increases with birthdate among forwards, but not defencemen. (The upward trend exists for goalies too, but there is not enough data for it to be meaningful.) So, it turns out only forwards with relatively late birthdays have been undervalued. The relationship only holds among the entire sample because over 55% of NHL draft picks from 2007-2013 were forwards.
While we have only looked at cumulative output in the 7-year period following the players' NHL drafts so far, we can see the relationships from above have not been a result of changing opportunity either. This is clear because when we look at per-game and per-minute output, we still see the same pattern. Forwards with late birthdays outperform forwards with early birthdays on a per-game and per-minute basis, while defenders do not.
(The SPAR_Index_82_Regressed_100 variable represents SPAR Index per 82 games played, while SPAR_Index_1000_Regressed_1500 represents SPAR Index per 1000 minutes of time on ice. To control for outliers who produce extremely well or poorly in small samples, these numbers have been regressed towards replacement level. The per-82-game metric was regressed towards replacement level up to each player's 100-game mark, while the per-1000-minute metric was regressed towards replacement level up to 1500 minutes. So, if a player produced 1 SPAR Index in 14 games and 140 minutes, rather than using a raw per-82-game output of about 5.86, this analysis assumes the player was a replacement-level producer for the remaining 86 games. This pulls their per-82-game output down from an inflated 5.86 to 0.82. The same process was done up to 1500 minutes for the per-1000-minute data. The endpoints matter here and were selected by me, so it is worth noting that everything I am about to mention also applies when regressing to replacement level up to 50 games played and 750 minutes played.)
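The regression step described above can be sketched in a few lines. This is my reading of the description rather than the exact code behind the charts: replacement level is treated as zero output, and players past the threshold are assumed to keep their raw rate. The function name regressed_rate and its parameters are made up for illustration.

```python
def regressed_rate(total_output: float, sample: float, per: float, threshold: float) -> float:
    """Output per `per` units of playing time, regressed toward replacement level.

    Players below `threshold` are treated as if they produced replacement-level
    (zero) output for the missing games or minutes. Players at or above the
    threshold keep their raw rate (an assumption of this sketch).
    """
    if sample <= 0:
        return 0.0  # no playing time: treat as replacement level
    padded_sample = max(sample, threshold)
    return total_output / padded_sample * per


# The worked example from the text: 1 SPAR Index in 14 games and 140 minutes.
raw_per_82 = 1 / 14 * 82
per_82_regressed = regressed_rate(1, 14, per=82, threshold=100)
per_1000_regressed = regressed_rate(1, 140, per=1000, threshold=1500)

print(f"Raw per 82 games:        {raw_per_82:.2f}")
print(f"Regressed per 82 games:  {per_82_regressed:.2f}")
print(f"Regressed per 1000 min:  {per_1000_regressed:.2f}")
```

Swapping in threshold=50 and threshold=750 reproduces the alternative endpoints mentioned above.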
The next graph shows the same metrics, but no trend for defenders.
No matter what metric is used, per-minute output, per-game output, or cumulative output in the players' 7 seasons following their draft, forwards born later in the year have been better than those born earlier in the year, and defenders have not. At least not by enough that the difference is meaningful. Even when running a regression to hold draft position constant, being born later in the year is associated with a statistically significant increase in player output. (The regression uses the idea from Dawson Sprigings' draft pick value article of a logarithmic trendline to account for the nonlinearity in pick value.)
However, the same relationship does not exist for defenders.
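For readers who want to try this kind of regression themselves, here is a rough sketch of the setup: output regressed on day of year born and the natural log of draft pick. It is not the code behind the results in this post. The function names are hypothetical, the demo numbers are made up (not real draft data), and it uses only the standard library, so it returns coefficients but no standard errors or p-values. A statistics package would give you those.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_birthday_effect(day_of_year, pick, output):
    """Ordinary least squares of output on [1, day_of_year, ln(pick)]."""
    rows = [[1.0, float(d), math.log(p)] for d, p in zip(day_of_year, pick)]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, output)) for i in range(k)]
    intercept, per_day, per_log_pick = solve_linear_system(xtx, xty)
    return {"intercept": intercept, "per_day": per_day, "per_log_pick": per_log_pick}


# Made-up, noise-free numbers built from known coefficients, only to check
# that the fit recovers them. These are NOT real draft results.
days = [15, 80, 140, 200, 260, 349]
picks = [1, 5, 20, 60, 120, 200]
outputs = [2.0 + 0.001 * d - 0.3 * math.log(p) for d, p in zip(days, picks)]

fit = fit_birthday_effect(days, picks, outputs)
print(fit)
```

The per_day coefficient is the part of interest: it is the change in output for each day later in the year a player was born, with draft position held constant. Running the fit separately on forwards and defenders is how you would compare the two groups.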
If teams were drafting without biases over a decent sample, no meaningful correlation would exist between player output and any other variable when draft position is held constant, because a player's draft position should include all valuable information available at the time. And yet something as simple as birthdate is significantly related to forwards' output even after accounting for draft position. The effect is so dramatic that, independent of draft position, a 330-day increase in birthdate within a year has corresponded to an additional 0.41 SPAR Index per 82 games, plus or minus 0.117 SPAR Index. This relationship tells us teams have been systematically undervaluing the effect of forwards' birthdates at the NHL draft. They have been too harsh on younger forwards and too high on older forwards.
Adjusting for relative age can be a very difficult thing to do. The most recent run of NHL drafts with players we can analyze shows this. As a result, NHL teams may be able to draft better simply by adjusting for birthdates differently than they have in the past. By selecting players with late birthdates earlier than the market has previously suggested they should, your favorite team may have an edge when the 2020 NHL draft rolls around.