NormanDale wrote: I know this post is from a long time ago, but can I ask why you're such a big fan? I really like the system in theory, but there are a few things about it that hurt its usefulness in my opinion:
1) Aggressive "regression to the mean" assumptions. It seems like every year, the best players are significantly better than where CARM-ELO predicts them to be. This also leads to a roughly linear downward projection for every top player over the age of 25 or so, and to flat projections for top players under 25. Neither seems realistic. Are LeBron, Harden, Westbrook, Curry, Durant, etc. all going to get progressively worse each year until they retire? Probably not. Will Giannis, Towns, Jokic, etc. all peak at where they are now (or lower), then remain steady? No. It seems to systematically undervalue star and up-and-coming players, mainly because some other similar players have gotten injured in the past.
2) Black Box-level opacity. This speaks to your lack of success replicating it. As a Celtics fan, I remain baffled by the All-Star level projections it has had for Marcus Smart year after year, for example. It seems like the variables they choose are perhaps not the most predictive ones.
3) Failure to differentiate regular season from playoffs. It seems like, if this system had existed in the 60s, the Celtics would have been considered underdogs (at least against the field) each year heading into the playoffs. If it had existed in the late 80s, it would have undervalued the Lakers' playoff chances. Same in the early 2000s. It seems like a model that incorporates previous post-season results as well as current-year regular season results to project the playoffs would make more sense.
Wondering what others think. I'm really not interested in "that thing is so stoopid, lulz" type takes, which is why I'm posting on this forum. Hope the thread doesn't get lost.
Not the same guy, but maybe I can discuss some of these topics.
1) There are quite a few things that contribute to this effect.
Low minutes - This mainly applies to the WAR projection, but you may have noticed that anyone who stayed healthy is projected to play fewer minutes next season. The reason is that the injury "penalty" (for lack of a better term) gets distributed evenly throughout the league. Say you have 10 players who each played 2500 minutes. What happens next season? Let's say that on average 7 play the same amount of minutes, 1 plays even more, and 2 play significantly less due to injury or some other reason. The system is going to project fewer minutes out of those 10 guys than last season, but who is it going to take them from? In this case, everyone receives the penalty, because the model can't predict that this or that guy will take the lump sum of missed minutes.
So your point about it systematically undervaluing players because some get injured is right. The ones who stay healthy will be undervalued. On the other hand, the ones who do get injured are being overrated. It's giving an average of the healthy and injured outcomes, but that's not really how it works in the real world. This dilemma is worth discussing and comes up often in predictive models. I agree with you here.
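To make that minutes example concrete, here's a toy sketch in Python (my own illustration, not 538's actual code, and the numbers are made up):

```python
# Toy illustration of the "penalty spread evenly" idea: if the model can't
# predict *which* players will miss time, the expected lost minutes get
# smeared across everyone who looks identical on paper.

last_season = [2500] * 10  # ten players who each played 2500 minutes

# Assumed historical pattern for players like this:
# 7 repeat their minutes, 1 plays more, 2 lose most of the season.
possible_outcomes = [2500] * 7 + [2800] + [800, 800]

projected_minutes = sum(possible_outcomes) / len(possible_outcomes)
print(projected_minutes)  # 2190.0 -- all ten get projected ~2190 minutes,
                          # even though most will actually play ~2500 again
```

The healthy seven end up underprojected by roughly 300 minutes and the two injured guys overprojected by roughly 1400, which is exactly the average-of-healthy-and-injured problem you described.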
Upside vs Downside - To sorta tie into that last point, when it comes to the superstars of this league it's much easier to fall than to rise. For example, it's more likely that a player will go from 6.0 to 3.0 than from 6.0 to 9.0. The opposite is true for the worst players in the league, e.g. -4.0 to -2.0 is more likely than -4.0 to -6.0.
Just like with minutes, that penalty is applied across the board. In reality, most will stay similar, while some will fall. In aggregate, the model knows that there will be more fallen value than risen value; it just doesn't know who to assign it to, so it splits it across them all.
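Same idea in miniature for the impact numbers (again, made-up probabilities just to show the mechanism, not the model's actual distribution):

```python
# Hypothetical outcome distribution for a +6.0 superstar next season.
# Drops are assumed to be more likely than leaps, per the point above.
outcomes = {
    6.0: 0.6,  # stays roughly the same
    3.0: 0.3,  # falls off (injury, decline, role change)
    9.0: 0.1,  # takes another leap
}

expected_value = sum(value * prob for value, prob in outcomes.items())
print(expected_value)  # 5.4 -- below 6.0, and that haircut gets applied to
                       # every +6.0 player, because the model can't name the faller
```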
Baseline - Another thing you might have noticed is that players who have had a breakout season, even if they're young, are projected to be worse the next season. Giannis's +/- was 5.3 last season, but 538 projects him at 4.5 this year. Now why is 538 saying he is going to be worse? The real reason is that they are skeptical that he was really that good last year. He went from 1.5 to 5.3 in one season, which is very atypical. The model gives him credit for that 5.3, but not full credit. Instead it sets his baseline somewhere between 1.5 and 5.3. Let's just say it thinks he's at 4.0. From there it applies the aging curve and thinks he'll improve to 4.5 based on his age.
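A rough sketch of that partial-credit step (the real weights and aging curve are 538's and I don't know them; these numbers are just picked to land near the values above):

```python
prior = 1.5      # Giannis's established level coming into last season
observed = 5.3   # the breakout year
weight = 0.66    # assumed: how much credit the model gives the new season

# Shrink the breakout back toward his prior level...
baseline = weight * observed + (1 - weight) * prior   # ~4.0

# ...then apply the aging curve (assumed bump for a player his age)
age_bump = 0.5
projection = baseline + age_bump
print(round(projection, 1))  # 4.5
```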
Reality - The progression/regression of players is based on historical comparisons, and those drop-offs are more common than most would like to think. For example, take a look at LeBron's most similar players - the players the system has determined to be most similar to him. They're all dropping, so the model can't say "LeBron is different, it doesn't matter." The model is predicting a drop-off because the players it determined were most similar dropped off.
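If you wanted to mimic that similarity-comp step yourself, it might look something like this (a big simplification with hypothetical comps and scores, not the real CARMELO pipeline):

```python
# Each comp: (similarity score, change in +/- that player saw the next season)
comps = [
    (0.92, -1.1),
    (0.88, -0.6),
    (0.85, -1.8),
    (0.81, +0.2),
]

# Similarity-weighted average of what the comps did year over year
total_sim = sum(sim for sim, _ in comps)
expected_change = sum(sim * change for sim, change in comps) / total_sim

current_value = 6.5  # made-up current +/-
projection = current_value + expected_change
print(round(projection, 2))  # ~5.66 -- dragged down because the comps declined
```

If the comps mostly fell off, the projection falls off, no matter how good the player looks right now.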
2) There isn't much black box to this model. Their +/- is a blend of RPM and BPM - I believe it is 2/3 RPM and 1/3 BPM this year. BPM is not a black box. RAPM is not a black box either, and while it gets combined with a somewhat black-box prior, it's not hard to get an idea of where that number is coming from. Marcus Smart is considered an All-Star because of how he rates out in BPM and RPM relative to his age and experience. That's really it. I will say, however, that their bar for "future All-Star" is wonky. It's really just for display purposes (the numbers are what really matter), but I do agree that they should have done a better job here.
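The blend itself is just a weighted average (weights per the 2/3 and 1/3 figure above; the player values are made up):

```python
def blended_plus_minus(rpm: float, bpm: float) -> float:
    """Blend of the two metrics: 2/3 RPM, 1/3 BPM."""
    return (2 / 3) * rpm + (1 / 3) * bpm

print(round(blended_plus_minus(rpm=3.6, bpm=2.4), 2))  # 3.2
```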
3) CARMELO is a regular season projection system based on regular season data. I'm not sure adding playoff data would make a regular season projection more accurate. A lot of times, things that make sense in our heads don't hold up in the real world - for example, the idea that second-half performance correlates with playoff or next-season performance. I wonder if there has been any work on this.
With that said, I'm almost positive that RPM updates during the playoffs, and it's 2/3 of the +/- blend. So in that sense, it is incorporating the playoffs.