Player similarity CARMELO 538 system - can someone reproduce it?

Moderator: Doctor MJ

deckoff
Ballboy
Posts: 2
And1: 0
Joined: Mar 15, 2017
   

Player similarity CARMELO 538 system - can someone reproduce it? 

Post#1 » by deckoff » Tue Mar 28, 2017 7:48 pm

Hello everyone.
This is my first post.
I am a great fan of 538's CARMELO system for evaluating players and teams.
I want to reproduce the part that finds similar players, but even though I follow the explanations given, the numbers I get are way off. Can someone help in any way, for example by pointing me to a similar model (better described, or with examples)? I have all the data needed from basketball-reference already downloaded.
Regards
CptCrunch
Assistant Coach
Posts: 4,255
And1: 4,323
Joined: Jun 30, 2016
   

Re: Player similarity CARMELO 538 system - can someone reproduce it? 

Post#2 » by CptCrunch » Wed Mar 29, 2017 5:00 pm

The concept behind this seems pretty simple tbh. Without knowing the technical details, here are my best guesses or rather how I would implement this if someone gave me the time and $ to do so.

1. You first segment the players into clusters or groups based on their quantilized or scaled measurements and boxscores using some sort of unsupervised classification framework. This is probably used to deal with issues with residuals within each cluster for step 2.

2. Then you run some sort of WAR projection regression within each group using something from the regression framework. This gives you the WAR forecasts. You either link WAR forecasts to the other forecasted metrics via indexing or run separate regressions for all those metrics. I would just index with WAR.
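
As a rough illustration of the two-step idea above (cluster first, then fit a regression inside each cluster), here is a minimal Python sketch. Everything in it is an assumption for illustration: the two features (height and assists), the synthetic player pool, the plain k-means routine and the ordinary least squares fit are stand-ins, not anything 538 has published.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=50):
    """Plain k-means. Starting centers are evenly spaced rows (an arbitrary choice)."""
    start = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[start].copy()
    for _ in range(n_iter):
        # Assign every player to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fit_per_cluster(X, y, labels):
    """Least-squares fit of y on X (with intercept), one model per cluster."""
    models = {}
    for j in np.unique(labels):
        mask = labels == j
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[int(j)] = coef
    return models

# Synthetic pool: 20 "guards" and 20 "bigs" described by height (cm) and assists.
rng = np.random.default_rng(0)
guards = rng.normal([188.0, 7.0], [3.0, 1.0], size=(20, 2))
bigs = rng.normal([212.0, 2.0], [3.0, 1.0], size=(20, 2))
X = standardize(np.vstack([guards, bigs]))

# Made-up, noise-free "WAR" target so the fitted coefficients are easy to check.
y = 2.0 + X @ np.array([1.0, 0.5])

labels = kmeans(X, k=2)
models = fit_per_cluster(X, y, labels)
for cluster, coef in models.items():
    print(cluster, np.round(coef, 3))
```

With real data the target would be next-season WAR rather than a made-up linear function, and the number of clusters would have to be chosen by validation.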
deckoff
Ballboy
Posts: 2
And1: 0
Joined: Mar 15, 2017
   

Re: Player similarity CARMELO 538 system - can someone reproduce it? 

Post#3 » by deckoff » Thu Mar 30, 2017 6:43 am

Hey. Thanks for the answer. They actually go into much more detail explaining it; my problem is that I follow the steps but don't get the same results. Is there a paid service that can actually do the maths for me? That would also be of help.
Thank you
NormanDale
Rookie
Posts: 1,200
And1: 629
Joined: Mar 30, 2005

Re: Player similarity CARMELO 538 system - can someone reproduce it? 

Post#4 » by NormanDale » Mon Jul 31, 2017 2:57 pm

deckoff wrote:Hello everyone.
This is my first post.
I am a great fan of 538's CARMELO system for evaluating players and teams.
I want to reproduce the part that finds similar players, but even though I follow the explanations given, the numbers I get are way off. Can someone help in any way, for example by pointing me to a similar model (better described, or with examples)? I have all the data needed from basketball-reference already downloaded.
Regards


I know this post is from a long time ago, but can I ask why you're such a big fan? I really like the system in theory, but there are a few things about it that hurt its usefulness in my opinion:

1) Aggressive "regression to the mean" assumptions. It seems like every year, the best players are significantly better than where CARM-ELO predicts them to be. This also leads to a roughly linear downward projection for every top player over the age of 25 or so, and for flat projections for top players under 25. Neither seems realistic. Are LeBron, Harden, Westbrook, Curry, Durant, etc. all going to get progressively worse each year until they retire? Probably not. Will Giannis, Towns, Jokic, etc. all peak at where they are now (or lower), then remain steady? No. It seems to systematically undervalue star and up-and-coming players, mainly because some other similar players have gotten injured in the past.

2) Black Box-level opacity. This speaks to your lack of success replicating it. As a Celtics fan, I remain baffled by the All-Star level projections it has had for Marcus Smart year after year, for example. It seems like the variables they choose are perhaps not the most predictive ones.

3) Failure to differentiate regular season from playoffs. It seems like, if this system had existed in the 60s, the Celtics would have been considered underdogs (at least against the field) each year heading into the playoffs. If it had existed in the late 80s, it would have undervalued the Lakers' playoff chances. Same in the early 2000s. It seems like a model that incorporates previous post-season results as well as current-year regular season results to project the playoffs would make more sense.


Wondering what others think. I'm really not interested in "that thing is so stoopid, lulz" type takes, which is why I'm posting on this forum. Hope the thread doesn't get lost.
Hear you tell it, man I'm fallin/Well somebody must have caught him/Cause every fourth quarter/I like to Mike Jordan 'em.

"I think you'll find that these are the exact same dimensions as our gym back at Hickory."
KqWIN
RealGM
Posts: 15,520
And1: 6,360
Joined: May 15, 2014
 

Re: Player similarity CARMELO 538 system - can someone reproduce it? 

Post#5 » by KqWIN » Fri Aug 25, 2017 4:48 am

NormanDale wrote:I know this post is from a long time ago, but can I ask why you're such a big fan? I really like the system in theory, but there are a few things about it that hurt its usefulness in my opinion:

1) Aggressive "regression to the mean" assumptions. It seems like every year, the best players are significantly better than where CARM-ELO predicts them to be. This also leads to a roughly linear downward projection for every top player over the age of 25 or so, and for flat projections for top players under 25. Neither seems realistic. Are LeBron, Harden, Westbrook, Curry, Durant, etc. all going to get progressively worse each year until they retire? Probably not. Will Giannis, Towns, Jokic, etc. all peak at where they are now (or lower), then remain steady? No. It seems to systematically undervalue star and up-and-coming players, mainly because some other similar players have gotten injured in the past.

2) Black Box-level opacity. This speaks to your lack of success replicating it. As a Celtics fan, I remain baffled by the All-Star level projections it has had for Marcus Smart year after year, for example. It seems like the variables they choose are perhaps not the most predictive ones.

3) Failure to differentiate regular season from playoffs. It seems like, if this system had existed in the 60s, the Celtics would have been considered underdogs (at least against the field) each year heading into the playoffs. If it had existed in the late 80s, it would have undervalued the Lakers' playoff chances. Same in the early 2000s. It seems like a model that incorporates previous post-season results as well as current-year regular season results to project the playoffs would make more sense.


Wondering what others think. I'm really not interested in "that thing is so stoopid, lulz" type takes, which is why I'm posting on this forum. Hope the thread doesn't get lost.


Not the same guy, but maybe I can discuss some of these topics.

1) There are quite a few things that contribute to this effect.

Low minutes - This mainly applies to the WAR projection, but you may have noticed that anyone who stayed healthy is projected to play fewer minutes next season. The reason is that the injury "penalty" (for lack of a better term) gets distributed evenly throughout the league. Say you have 10 players who played 2500 minutes. What happens next season? Let's say that on average 7 play the same number of minutes, 1 plays even more, and 2 play significantly fewer due to injury or some other reason. The system is going to project fewer minutes out of those 10 guys than last season, but who is it going to take them from? In this case, everyone receives the penalty because the model can't predict which guy will take the lump sum of missed minutes.

So your point about it systematically undervaluing players because some get injured is right. The ones who stay healthy will be undervalued. On the other hand, the ones who do get injured are being overrated. It's giving an average of the healthy and injured outcomes, but that's not really how it works in the real world. This dilemma is worth discussing and comes up often in predictive models. I agree with you here.
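
The averaging effect in the ten-player example can be put into numbers. A minimal Python sketch, where the next-season totals (2,700 for the player who plays more, 1,200 for the two who miss time) are made-up values chosen only for illustration:

```python
# Ten players who each logged 2,500 minutes last season.
last_season = 2500

# Hypothetical next-season outcomes: 7 repeat, 1 plays more, 2 miss time.
next_season = [2500] * 7 + [2700] + [1200] * 2

# With no way to tell who gets hurt, the best single guess for each player
# is the group average.
expected = sum(next_season) / len(next_season)

healthy_gap = last_season - expected   # shortfall for a player who repeats 2,500
injured_gap = expected - 1200          # overshoot for a player who gets hurt

print(expected)     # 2260.0
print(healthy_gap)  # 240.0
print(injured_gap)  # 1060.0
```

Under these assumed numbers every player is projected at 2,260 minutes: 240 too few for the seven who stay healthy and 1,060 too many for the two who get hurt.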

Upside vs Downside - To sorta tie into that last point, when it comes to the superstars of this league it's much easier to fall than rise. For example, it's more likely that a player will go from 6.0 to 3.0 than from 6.0 to 9.0. The opposite is true for the worst players in the league, i.e. -4.0 to -2.0 is more likely than -4.0 to -6.0.

Just like minutes, that penalty is applied across the board. In reality, most will stay similar, while some will fall. In aggregate, the model knows that there will be more fallen value than risen value. It just doesn't know who to assign it to, so it splits it across them all.

Baseline - Another thing you might have noticed is that players who have had a breakout season, even if they're young, are projected to be worse the next season. Giannis's +/- was 5.3 last season, but 538 projects him at 4.5 this year. Now why is 538 saying he is going to be worse? The real reason is that they are skeptical that he was that good last year. He went from 1.5 to 5.3 in one season, which is very atypical. The model gives him credit for that 5.3, but not full credit. Instead it gives him a baseline between 1.5 and 5.3. Let's just say it thinks he's at 4.0. From there it applies the aging curve and thinks he'll improve to 4.5 based on his age.
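
To make the baseline arithmetic concrete, here is a small Python sketch. It assumes the baseline is a simple weighted average of the two seasons, which is only a guess at the form; the 4.0 baseline is the illustrative figure from the paragraph above, not a published 538 number, and the function name `baseline` is hypothetical.

```python
def baseline(previous, latest, weight_on_latest):
    """Weighted average of two seasons (assumed form, not 538's formula)."""
    return weight_on_latest * latest + (1 - weight_on_latest) * previous

prev_pm, latest_pm = 1.5, 5.3   # the two seasons quoted above

# Weight on the latest season that would reproduce the illustrative 4.0 baseline.
w = (4.0 - prev_pm) / (latest_pm - prev_pm)

base = baseline(prev_pm, latest_pm, w)
age_bump = 4.5 - 4.0            # improvement implied by the aging curve in the example
projection = base + age_bump

print(round(w, 3), round(base, 2), round(projection, 2))
```

Under that assumption the breakout season gets roughly two-thirds of the weight, and the projection lands below 5.3 even though the aging adjustment is positive.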

Reality - The progression/regression of players is based on historical comparisons, and those drop-offs are more common than most would like to think. For example, take a look at LeBron's most similar players. Those are the players who have been determined to be most similar to LeBron by the system. They're all dropping, so the model can't say "LeBron is different, so it doesn't matter". The model is predicting a drop-off because the players that it determined were most similar dropped off.
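
A comparison-based projection of this kind can be sketched as a nearest-neighbour search. The Python below is a hedged illustration, not 538's actual similarity score: the three features (age, +/-, minutes), the tiny player pool, the z-score scaling and the inverse-distance weights are all assumptions, and the helper names `most_similar` and `projected_change` are hypothetical.

```python
import numpy as np

def most_similar(target, pool, k):
    """Indices and distances of the k pool rows closest to target.
    Features are z-scored with the pool's mean and std so no stat dominates."""
    mu, sigma = pool.mean(axis=0), pool.std(axis=0)
    z_pool = (pool - mu) / sigma
    z_target = (target - mu) / sigma
    dists = np.linalg.norm(z_pool - z_target, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

def projected_change(next_change, idx, dists):
    """Average the comps' next-season change, weighting closer comps more."""
    weights = 1.0 / (1.0 + dists)
    return float(np.sum(weights * next_change[idx]) / np.sum(weights))

# Made-up historical seasons: age, +/-, minutes played.
pool = np.array([
    [32.0, 6.1, 2700.0],
    [31.0, 5.8, 2600.0],
    [33.0, 6.4, 2800.0],
    [22.0, 1.0, 1500.0],
    [24.0, 2.0, 2000.0],
    [27.0, 0.0, 1800.0],
])
# Made-up change in +/- that each of those players showed the following season.
next_change = np.array([-0.8, -0.5, -1.0, 0.9, 0.6, 0.1])

target = np.array([32.0, 6.0, 2750.0])  # an older star, also made up
idx, dists = most_similar(target, pool, k=3)
change = projected_change(next_change, idx, dists)
print(sorted(idx.tolist()), round(change, 2))
```

Because the three closest comparables all declined, the projected change is negative no matter how well the target himself has aged so far.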

2) There isn't much black box to this model. Their +/- is a blend of RPM and BPM. I believe it is 2/3 RPM and 1/3 BPM this year. BPM is not a black box. RAPM is not a black box either, and while RPM combines it with a black-box prior, it's not hard to get an idea of where that's coming from. Marcus Smart is considered an all-star because of how he rates out in BPM and RPM relative to his age and experience. That's really it. I will say, however, that their bar for "future all star" is wonky. It's really just for display reasons (the numbers are what really matter), but I do agree that they should have done a better job here.
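
The blend described above is a one-line weighted average. A minimal Python sketch, taking the 2/3 and 1/3 weights from the post (which itself hedges them with "I believe") and using made-up RPM and BPM values; the function name `blended_plus_minus` is hypothetical:

```python
def blended_plus_minus(rpm, bpm, rpm_weight=2 / 3):
    """Weighted average of RPM and BPM; the default weight follows the post."""
    return rpm_weight * rpm + (1 - rpm_weight) * bpm

# Made-up inputs for a player whose RPM is much better than his BPM.
blend = blended_plus_minus(rpm=3.0, bpm=1.5)
print(round(blend, 2))  # 2.5
```

A player who rates far better in RPM than in box-score stats keeps most of that edge in the blend, which is one way an otherwise puzzling projection can come about.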

3) CARMELO is a regular season projection system based on regular season data. I'm not sure adding playoff data would make a regular season projection more accurate. A lot of times, things that make sense in our heads don't apply to the real world. For example, second half performance correlating with playoff/next season performance. I wonder if there has been any work on this.

With that said, I'm almost positive that RPM updates during the playoffs, and it's 2/3 of the +/- blend. So in that sense, it is incorporating the playoffs.
NormanDale
Rookie
Posts: 1,200
And1: 629
Joined: Mar 30, 2005

Re: Player similarity CARMELO 538 system - can someone reproduce it? 

Post#6 » by NormanDale » Sat Aug 26, 2017 5:58 pm

Thank you for that. I have a few follow up questions, but I'll ask later when I'm using my computer and not my phone.
