Penalized Regression of WOWY data

Post #21 » by **Moonbeam** » Sun Jul 30, 2023 10:20 am

OhayoKD wrote:Looking at these results...
Moonbeam wrote:
OhayoKD wrote:I’m happy to take any feedback you may have or ideas for modifications I haven’t considered.

would it be greedy of me to ask for a graph charting magic, bird, hakeem, and jordan specifically?

No problem! Happy to add more graphs if you (or anyone else) would like.

Possible takeaways (not sure how much weight I should give this, but it seems like a promising approach):
-> Magic is potentially the true "impact king"
-> MJ's era-relative impact peak might actually be during the second three-peat (expansion goes brr)
-> Is the delta between '80s Hakeem and "peak" Hakeem overplayed?

I imagine there is still some box-bias in these results but I'm guessing it's made up for by more stable adjustments...

Moonbeam wrote:
lessthanjake wrote:Interesting stuff, and I think is very similar to Thinking Basketball’s WOWYR.

It feels to me like there’s really just not a lot of data for a lot of this though, particularly in these past eras where players missed very few games. For instance, Doc mentioned being surprised about Karl Malone being below Charles Barkley, so I’ll use Karl Malone as an example. Karl Malone is well below Barkley in the 1989-1993 timeframe. But what is the data for Karl Malone’s 1989-1993 timeline based on? Well, here’s the list of number of missed games by players who played over 18 minutes a game in a season for the Jazz in that time period (in years they actually averaged 18+ minutes per game):

Karl Malone: 3
John Stockton: 4
Mark Eaton: 3
Thurl Bailey: 0
Darrell Griffith: 0
Bob Hansen: 37
Blue Edwards: 22
Jeff Malone: 17
Tyrone Corbin: 13
Mike Brown: 0
Jay Humphries: 4
David Benoit: 0
Larry Krystkowiak: 11

So, at least as I understand it (and I’ve admittedly not read through the actual paper, so sorry if I’m misinterpreting anything!), the model is basically trying to figure out Karl Malone’s impact essentially based on regressing what occurred in those missed games. There’s some missed games there, but nothing particularly substantial for anyone and the most substantial number of missed games are from relatively minor players. I don’t really see how a WOWY-based model can in any way accurately assess Karl Malone’s impact without almost any missed games from Malone, Stockton, or Eaton, and with several other relevant players having 0 missed games at all. It seems like the results would inevitably just be based on statistical noise centered largely around what randomly happened to occur when pretty inconsequential players like Bob Hansen and Blue Edwards were out.

Another example of this is the 1989-1993 timeframe for Jordan. He does fairly well in this timeframe, but what is the data based on? Here are the total missed games of players on the Bulls who played 18+ MPG in a given season in that timeframe:

Michael Jordan: 7
Scottie Pippen: 10
Horace Grant: 14
BJ Armstrong: 0
Bill Cartwright: 55
Scott Williams: 11
John Paxson: 7
Stacey King: 0
Craig Hodges: 33
Sam Vincent: 12
Brad Sellers: 2
Dave Corzine: 1

There’s virtually zero missed-game data there, except for what happened in a bunch of missed games from Bill Cartwright and Craig Hodges. Players like that don’t *really* affect games that much, but when they make up a huge portion of the team’s missed games, what randomly happens to occur in missed games by players like that can really skew a model like this. For instance, we see above that Craig Hodges missed 33 games in years he played 18+ MPG. This was all in the 1989 season. And, based on the charts provided, we actually see Jordan’s rating in this measure tank from the 1984-1988 time period to the 1985-1989 time period, and he didn’t get super high until 1989 was out of the time period, so it seems reasonably obvious that something happened in 1989 that tanked his rating. The only person who missed a lot of games that season was Craig Hodges. The Bulls happened to go 32-17 in the games Craig Hodges played and 15-18 in the games Hodges missed (and I’m sure the difference in average margin of victory is pretty significant too). So my guess is that the model thinks Craig Hodges was really impactful (and his missed games make up a significant portion of the entire set of missed games that’s being regressed), so what happened in those games has a significant impact on Jordan’s perceived impact in time periods that contain that year (and note that Pippen dropped that same year too, though a bit less, probably because he missed several games that Hodges missed too).
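To make the Hodges point concrete, here is the naive with-or-without split for the 1988-89 record quoted above. This is a minimal Python sketch of a raw WOWY comparison only (win percentage rather than scoring margin, and no adjustment for teammates); the function names are illustrative and are not taken from Moonbeam's code.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage for a W-L record."""
    return wins / (wins + losses)


def raw_wowy(with_record: tuple[int, int], without_record: tuple[int, int]) -> float:
    """Naive with-or-without split: win% with the player minus win% without.

    Ignores who else was in or out, which is exactly what the penalized
    regression is meant to sort out.
    """
    return win_pct(*with_record) - win_pct(*without_record)


# 1988-89 Bulls: 32-17 with Hodges, 15-18 without (records quoted above).
hodges_split = raw_wowy((32, 17), (15, 18))
print(f"win% swing: {hodges_split:+.3f}")              # +0.199
print(f"per 82 games: {hodges_split * 82:+.1f} wins")  # +16.3
```

A swing of roughly 16 wins per 82 games, drawn from a 33-game "without" sample, is the kind of noise lessthanjake is worried about leaking into everyone else's estimates.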

Thanks for the comment and for getting into the detail. You're right that the utility of these metrics is limited by the nature of the data we have available --- I can't and wouldn't shy away from that. I wouldn't feel like I was being responsible if I just pushed out these numbers without some important caveats like that as well as others in the document I shared. I do think there's some value in this, though.

Speaking of your specific examples, there is a bit more to it than that. These models are still making use of data with all of the players healthy to form a sort of baseline, so it's not like the "With" data doesn't matter --- it still does. Moreover, when players leave a team, they would be considered "missing" for those seasons. In your Bulls example, for instance, Craig Hodges would be listed as missing for the entirety of the 1992-93 season with respect to being Jordan's teammate. Stacey King would be considered missing for all but the 1989-90 season due to the minutes threshold. Sam Vincent would be considered missing for all but the 1988-89 season. And so on and so forth. This extra missingness allows for transactions between seasons to inform the estimates a bit more than merely looking at missed games. I'll note that for these players, their time away from Chicago would inform their baseline impact. Sam Vincent's impact with Orlando for the 1989-90, 1990-91, and 1991-92 seasons will inform his baseline impact and therefore inform his contribution to the 1988-89 Bulls' scoring margins.
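A rough sketch of the "missingness" coding described above: each row is a team-game, each column a player, and a player counts as present only if he played in that game and met the 18 MPG threshold for that team-season. The input format, the player labels A to E, and all numbers are hypothetical; the actual model's data layout may differ.

```python
import numpy as np

MPG_THRESHOLD = 18.0  # minutes cutoff discussed in the thread


def build_design(games, mpg, threshold=MPG_THRESHOLD):
    """Return (X, y, players) for a WOWY-style regression.

    X[i, j] = 1 if player j counts as present in game i: he played in that
    game AND averaged >= threshold minutes for that team-season. Injuries,
    trades, and low-minute seasons all show up as 0 ("missing").
    """
    players = sorted({p for g in games for p in g["played"]})
    col = {p: j for j, p in enumerate(players)}
    X = np.zeros((len(games), len(players)))
    y = np.array([g["margin"] for g in games], dtype=float)
    for i, g in enumerate(games):
        for p in g["played"]:
            if mpg.get((p, g["team"], g["season"]), 0.0) >= threshold:
                X[i, col[p]] = 1.0
    return X, y, players


# Hypothetical toy data: C is hurt for one game, then moves from T1 to T2;
# D joins T1 but plays under the threshold.
games = [
    {"team": "T1", "season": 1989, "margin": 7.0, "played": {"A", "B", "C"}},
    {"team": "T1", "season": 1989, "margin": -4.0, "played": {"A", "B"}},
    {"team": "T1", "season": 1990, "margin": 3.0, "played": {"A", "B", "D"}},
    {"team": "T2", "season": 1990, "margin": -6.0, "played": {"C", "E"}},
]
mpg = {
    ("A", "T1", 1989): 38.0, ("B", "T1", 1989): 34.0, ("C", "T1", 1989): 24.0,
    ("A", "T1", 1990): 37.0, ("B", "T1", 1990): 33.0, ("D", "T1", 1990): 12.0,
    ("C", "T2", 1990): 30.0, ("E", "T2", 1990): 35.0,
}
X, y, players = build_design(games, mpg)
print(players)
print(X.astype(int))
```

C's column is 0 in T1's 1990 game (he had left) and 1 in T2's game, so his time elsewhere feeds the same coefficient, much like the Sam Vincent example; D never clears the threshold, so his column stays empty.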

Hopefully this helps clear it up a bit.

I'm happy for you or anyone else to ask more questions or offer more critiques. They may help me improve these metrics going forward!

Correct me if I'm wrong, but aren't you also using the internal box-scaling of teammates to "stabilize" the samples, similar to the way LEBRON, EPM, or RPM would?

Honestly, all considered, I wouldn't be shocked if this graded out as "industry standard" for pre-data-ball RAPM approximation. Someone should definitely share this with Ben.

Nope, this is completely devoid of any individual box score statistics. I understand why some hybrid metrics have been put together, and these results could possibly be extended in a similar way, but I thought it would be interesting to look at “pure” WOWY data here. Yes, we’ll get the odd Ed Nealys as I mentioned in the document, but I think those occurrences are interesting in and of themselves, and can possibly be explained in the context of how these models are built. Others, like Paul Pressey, may very well emerge as bona fide impact powerhouses that would be otherwise ignored (or have their signals diluted) by combining with box score stuff. Ultimately, I think it’s important to have box score data and impact data separately to inform our evaluations. I’m not opposed to hybrid measures at all, but I think they should only be developed as a bridge between the two, and communicated as such.

I’m not familiar with all of the details of the different implementations of RAPM and other metrics, as much of that detail is kept under the hood from what I’ve seen, but I’ve felt pretty strongly here about making the code and the detail as transparent as possible. I’ve felt somewhat frustrated at times at a lack of transparency of other methods. Taking RAPM as an example, I think it is usually estimated with Bayesian methods, which require what are known as prior distributions to produce their estimates, but those details are usually kept under wraps from what I’ve seen. Maybe I just haven’t looked in the right places for those details. The thorough breakdown by Squared2020 of RAPM was such a breath of fresh air for that reason.
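On the priors point: the standard ridge form of RAPM can be read as a Bayesian estimate. Under the usual textbook assumptions (Gaussian noise with variance $\sigma^2$, and independent Gaussian priors on each player coefficient centred at 0 with variance $\tau^2$), the posterior mode is exactly the ridge solution, with the penalty set by the ratio of the two variances. The notation below is assumed, not taken from any particular RAPM write-up.

$$
\begin{aligned}
y \mid \beta &\sim \mathcal{N}(X\beta,\ \sigma^2 I), \qquad \beta \sim \mathcal{N}(0,\ \tau^2 I) \\
\hat\beta_{\text{MAP}} &= \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \frac{\sigma^2}{\tau^2}\,\lVert \beta \rVert_2^2 \\
&= (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda = \sigma^2 / \tau^2
\end{aligned}
$$

A tighter prior (smaller $\tau^2$) means a larger $\lambda$ and more shrinkage toward 0; centring the prior somewhere other than 0 is what "prior-informed" variants change.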

Post #22 » by **OhayoKD** » Sun Jul 30, 2023 10:37 am

Moonbeam wrote:
OhayoKD wrote:Nope, this is completely devoid of any individual box score statistics. I understand why some hybrid metrics have been put together, and these results could possibly be extended in a similar way, but I thought it would be interesting to look at “pure” WOWY data here. Yes, we’ll get the odd Ed Nealys as I mentioned in the document, but I think those occurrences are interesting in and of themselves, and can possibly be explained in the context of how these models are built. Others, like Paul Pressey, may very well emerge as bona fide impact powerhouses that would be otherwise ignored (or have their signals diluted) by combining with box score stuff. Ultimately, I think it’s important to have box score data and impact data separately to inform our evaluations. I’m not opposed to hybrid measures at all, but I think they should only be developed as a bridge between the two, and communicated as such.

Ah. Misinterpreted, my bad. Am I correct to take this as "curving down outliers"?

However, the coefficients for the three methods which make use of shrinkage techniques are, by their nature, biased estimates of the impact on scoring margin as they shrink the coefficients toward 0 in order to reduce the variation in the estimation process.


Also, FWIW, Cheema is actually pretty transparent about their RAPM set, which I believe is intended to be "pure":
https://www.thespax.com/nba/quantifying-the-nbas-greatest-five-year-peaks-since-1997/

Post #23 » by **Moonbeam** » Sun Jul 30, 2023 11:00 am

OhayoKD wrote:
Ah. Misinterpreted, my bad. Am I correct to take this as "curving down outliers"?
However, the coefficients for the three methods which make use of shrinkage techniques are, by their nature, biased estimates of the impact on scoring margin as they shrink the coefficients toward 0 in order to reduce the variation in the estimation process.

Also, FWIW, Cheema is actually pretty transparent about their RAPM set, which I believe is intended to be "pure":
https://www.thespax.com/nba/quantifying-the-nbas-greatest-five-year-peaks-since-1997/

Regarding shrinkage, it's not necessarily about curving down outliers, but curving down everything as a way to reduce the variance. The coefficients for all players will be closer to zero. As such, the relative distance between outliers and the rest of the pack may still be the same. This is *especially* the case with adaptive lasso and adaptive elastic net, where the penalty is harsher on the players in the middle of the pack to prioritize estimation of the outliers.
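A toy illustration of that point, assuming plain ridge and an idealised design in which every player's column is orthogonal to every other's (each hypothetical player is the only one "in" for two games). In that special case ridge multiplies every least-squares coefficient by the same factor, so the ordering and relative gaps are untouched; with real, overlapping lineups the shrinkage is not uniform, and the adaptive penalties change it further.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Orthogonal toy design: 3 hypothetical players, each the only one "in" for 2 games.
X = np.vstack([np.eye(3), np.eye(3)])
y = np.array([10.0, 1.0, -6.0, 8.0, 3.0, -4.0])  # hypothetical margins

ols = ridge(X, y, 0.0)      # lam = 0 is ordinary least squares
shrunk = ridge(X, y, 2.0)

print(ols)            # coefficients 9, 2, -5
print(shrunk / ols)   # every entry equals 2 / (2 + 2) = 0.5 here
```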

You can see this in the regularization paths in the document. This is the one for the Ridge variant:

If you look at the outliers (say the blue one on the top and the black one on the bottom), they are still standing out as outliers as we move from the left to the right until we get to a log lambda of about 2. I believe the model chose a log lambda around 1.3 or so, so those points still stand out as outliers.
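For anyone wanting to reproduce a picture like that, here is a sketch of a ridge regularization path over a grid of log(lambda) values, plus one common way to pick lambda: k-fold cross-validation on prediction error. Whether the chosen log lambda of about 1.3 came from cross-validation is an assumption here, and the simulated presence matrix and player effects are made up.

```python
import numpy as np


def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_path(X, y, log_lams):
    """Rows = log(lambda) values, columns = player coefficients."""
    return np.array([ridge(X, y, np.exp(l)) for l in log_lams])


def cv_log_lambda(X, y, log_lams, k=5, seed=0):
    """Return the log(lambda) with the lowest k-fold cross-validated MSE."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    folds = np.array_split(rng.permutation(idx), k)
    errors = []
    for l in log_lams:
        sse = 0.0
        for test in folds:
            train = np.setdiff1d(idx, test)
            b = ridge(X[train], y[train], np.exp(l))
            sse += np.sum((y[test] - X[test] @ b) ** 2)
        errors.append(sse / len(y))
    return log_lams[int(np.argmin(errors))]


# Simulated data: 10 hypothetical players present in ~85% of 300 games.
rng = np.random.default_rng(1)
X = rng.binomial(1, 0.85, size=(300, 10)).astype(float)
true_beta = np.array([6.0, -4.0, 2.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.5, 3.0])
y = X @ true_beta + rng.normal(0.0, 12.0, size=300)

log_lams = np.linspace(-3.0, 6.0, 37)
path = ridge_path(X, y, log_lams)
best = cv_log_lambda(X, y, log_lams)
print("chosen log lambda:", best)
print("coefficient norm, left to right:", np.round(np.linalg.norm(path, axis=1)[::9], 2))
```

Plotting each column of `path` against `log_lams` gives the familiar fan of lines that collapses toward zero on the right.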

Here's the adaptive lasso path:

As the log lambda gets bigger, the outliers become even more pronounced than they are with no penalty at the left of the graph. The coefficients that emerge as outliers do change as we go from left to right, which is interesting, but this suggests that some of those original outliers may have had especially high variance, meaning there was more uncertainty in estimating them than other coefficients.
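A compact sketch of the adaptive lasso, to show where the "harsher penalty on the middle of the pack" comes from: each player gets a weight $w_j = 1/|\tilde\beta_j|^\gamma$ from an initial estimate, so small initial coefficients are penalized heavily and large ones lightly. Using ridge for the initial estimate, solving the weighted problem by rescaling columns and running plain coordinate-descent lasso, and the simulated data are all choices made here for illustration, not necessarily what the document's code does.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes X and y are already centred, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                       # residual y - Xb with b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]        # residual with player j removed
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def adaptive_lasso(X, y, lam, gamma=1.0, ridge_lam=1.0, eps=1e-6):
    """Adaptive lasso: penalty lam * sum_j w_j |b_j| with w_j = 1/|init_j|^gamma."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    p = X.shape[1]
    init = np.linalg.solve(Xc.T @ Xc + ridge_lam * np.eye(p), Xc.T @ yc)
    w = 1.0 / (np.abs(init) + eps) ** gamma   # small initial estimate -> big weight
    b_scaled = lasso_cd(Xc / w, yc, lam)      # rescaled columns turn the weights into plain lasso
    return b_scaled / w, w


# Simulated data: two hypothetical "outliers" (8 and -6) among eight players.
rng = np.random.default_rng(3)
X = rng.binomial(1, 0.85, size=(400, 8)).astype(float)
true_beta = np.array([8.0, -6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(0.0, 5.0, size=400)

coef, weights = adaptive_lasso(X, y, lam=0.5)
print(np.round(coef, 2))
print(np.round(weights, 2))
```

The two large effects keep large coefficients, while the null players face weights several times bigger and are pushed toward (often exactly to) zero, which is why the outliers can look more pronounced as lambda grows.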

Also, thank you for that link from Cheema! I hadn't seen that and it's fantastic to see discussion of the priors there and the reasoning behind them.

The stuff I've put together here would be analogous to the NPI (non-prior-informed) RAPM. Cheema discusses using box score statistics to inform the prior distributions as they tend to improve the estimation for RAPM. Perhaps a similar thing could be achieved with WOWY data, but I haven't done that yet.
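To show what informing the prior changes mechanically: with a Gaussian prior centred at $\mu$ instead of 0, the ridge solution becomes $(X^\top X + \lambda I)^{-1}(X^\top y + \lambda\mu)$, so each coefficient is pulled toward its prior value rather than toward zero. The prior means below are hypothetical numbers standing in for something like a box-score estimate; nothing here reproduces Cheema's actual priors.

```python
import numpy as np


def ridge_with_prior(X, y, lam, prior_mean):
    """Minimize ||y - Xb||^2 + lam * ||b - prior_mean||^2.

    prior_mean = 0 is ordinary ridge (the "non-prior-informed" case).
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * prior_mean)


# Hypothetical data: player 3 never appears, so the games carry no information on him.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
y = np.array([4.0, 6.0, -1.0, 5.0])
prior = np.array([2.0, -1.0, 3.0])   # hypothetical box-score-style prior means

npi = ridge_with_prior(X, y, 5.0, np.zeros(3))
informed = ridge_with_prior(X, y, 5.0, prior)
print(np.round(npi, 3))        # player 3 stays at 0
print(np.round(informed, 3))   # player 3 falls back to his prior, 3.0
```

Where the WOWY data is thin, the estimate leans on the prior; where it is rich, the data dominates.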

Post #24 » by **WestGOAT** » Sun Jul 30, 2023 11:17 am

Could be a fun exercise to look at offensive and defensive WOWY. So instead of setting the point margin as y, you can set points scored or points allowed (relative to the average for the specific time window) as y. It will probably be noisy, but it would be interesting to see whether the usual suspects considered defensive juggernauts pop up, and especially whether bigs being more valuable than smalls is reproduced.
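One convenient property for that split: the ridge estimate is linear in the response, so with the same design matrix and the same lambda, an offensive model (points scored relative to average) and a defensive model (points allowed relative to average, sign-flipped so positive is good) add back up exactly to the margin model. The sketch below checks that on simulated data; the league average and all effects are hypothetical.

```python
import numpy as np


def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(7)
n, p = 120, 6
X = rng.binomial(1, 0.85, size=(n, p)).astype(float)   # hypothetical presence matrix
league_avg = 105.0                                      # hypothetical average points per game
scored = league_avg + X @ rng.normal(0.0, 2.0, p) + rng.normal(0.0, 10.0, n)
allowed = league_avg + X @ rng.normal(0.0, 2.0, p) + rng.normal(0.0, 10.0, n)

lam = 3.0
off = ridge(X, scored - league_avg, lam)      # y = points scored vs. average
dfn = -ridge(X, allowed - league_avg, lam)    # flipped: positive = fewer points allowed
net = ridge(X, scored - allowed, lam)         # the usual scoring-margin model

print(np.allclose(off + dfn, net))            # True: the halves sum to the margin model
```

So the split stays consistent with the margin results; the open question is only how noisy each half is on its own.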

Post #25 » by **Doctor MJ** » Sun Jul 30, 2023 4:54 pm

Moonbeam wrote:Yeah, Magic and Bird look outstanding here, particularly Magic. With some older box score data available now, there are a couple others who also jump out:

You're also right that these aren't per-possession metrics, but per-game, so Bobby Jones standing out looks quite impressive.

I think this is important data for us to see - while being cautious about taking noisy data too far.

I completely get being skeptical of Magic & Bird relative to other superstars. Easy to have an interpretation of "Transcendent offensive players sure, but limited on defense, can they really be more impactful than 2-way stars? Maybe we've just got a narrative full of winning bias.", but I think there's good reason to think that these guys were every bit the extreme impact outliers they're made out to be.

Cool seeing the '60s stars there too, and yeah, Russell fits along with Magic & Bird as guys who seem like they should be being held back by half the game but the impact data may tell us otherwise.

Post #26 » by **OhayoKD** » Sun Jul 30, 2023 5:02 pm

Doctor MJ wrote:
Moonbeam wrote:Yeah, Magic and Bird look outstanding here, particularly Magic. With some older box score data available now, there are a couple others who also jump out:

You're also right that these aren't per-possession metrics, but per-game, so Bobby Jones standing out looks quite impressive.

I think this is important data for us to see - while being cautious about taking noisy data too far.

I completely get being skeptical of Magic & Bird relative to other superstars. Easy to have an interpretation of "Transcendent offensive players sure, but limited on defense, can they really be more impactful than 2-way stars? Maybe we've just got a narrative full of winning bias.", but I think there's good reason to think that these guys were every bit the extreme impact outliers they're made out to be.

Cool seeing the '60s stars there too, and yeah, Russell fits along with Magic & Bird as guys who seem like they should be being held back by half the game but the impact data may tell us otherwise.

Uh:

I don't see how anyone but Magic looks like an impact outlier here. Magic's career splits are also outright matched by Duncan, who looks better than Jordan where this graph would suggest his value peaked in various metrics like AUPM, WOWY, and RAPM (Cheema, JE), despite playing way more minutes than any of his teammates (and, as a result, having to play with substandard teammates).

And then we have data ball, where it is two two-way bigs and a guy who combines GOAT-tier offense with the ability to carry -5.5 defenses.

I think what impact data bears out is that scoring and one-way offense is overrated if anything.

Post #27 » by **lessthanjake** » Sun Jul 30, 2023 5:07 pm

Moonbeam wrote:
Thanks for the comment and for getting into the detail. You're right that the utility of these metrics is limited by the nature of the data we have available --- I can't and wouldn't shy away from that. I wouldn't feel like I was being responsible if I just pushed out these numbers without some important caveats like that as well as others in the document I shared. I do think there's some value in this, though.

Speaking of your specific examples, there is a bit more to it than that. These models are still making use of data with all of the players healthy to form a sort of baseline, so it's not like the "With" data doesn't matter --- it still does. Moreover, when players leave a team, they would be considered "missing" for those seasons. In your Bulls example, for instance, Craig Hodges would be listed as missing for the entirety of the 1992-93 season with respect to being Jordan's teammate. Stacey King would be considered missing for all but the 1989-90 season due to the minutes threshold. Sam Vincent would be considered missing for all but the 1988-89 season. And so on and so forth. This extra missingness allows for transactions between seasons to inform the estimates a bit more than merely looking at missed games. I'll note that for these players, their time away from Chicago would inform their baseline impact. Sam Vincent's impact with Orlando for the 1989-90, 1990-91, and 1991-92 seasons will inform his baseline impact and therefore inform his contribution to the 1988-89 Bulls' scoring margins.

Hopefully this helps clear it up a bit.

I'm happy for you or anyone else to ask more questions or offer more critiques. They may help me improve these metrics going forward!

Thanks for the clarification! Particularly the part about players being considered “missing” when they went to a different team or didn’t meet the minutes threshold. That’s relevant info about the model, of course. And it should provide the model with a bit more data on some players that changed teams. Of course, considering a player as there when they play 18 minutes and missing if they then played 17 minutes a game another season might lead to some odd/noisy outcomes in certain scenarios (i.e. the model might think a guy’s absence made a difference between two seasons when they actually were there both seasons and only played a minute less per game in one of them), but there’s no perfect way to model things.
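A tiny illustration of the threshold issue: under a binary 18 MPG cutoff, a one-minute change in average playing time flips a player from fully present to fully missing. The fractional "minutes share" coding shown next to it is a hypothetical comparison only; nothing in the thread says the model uses it.

```python
def presence_binary(mpg: float, threshold: float = 18.0) -> float:
    """Thread's coding: present only at or above the minutes cutoff."""
    return 1.0 if mpg >= threshold else 0.0


def presence_share(mpg: float, game_minutes: float = 48.0) -> float:
    """Hypothetical alternative: fraction of a regulation game played."""
    return min(mpg / game_minutes, 1.0)


for mpg in (18.0, 17.0):
    print(mpg, presence_binary(mpg), round(presence_share(mpg), 3))
# 18.0 1.0 0.375
# 17.0 0.0 0.354
```

The binary coding jumps by a full unit between the two seasons, while the share coding moves by about 0.02, which is the trade-off between simplicity and this kind of edge-case noise.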

The bottom line, though, is that I do think this is an inherently very noisy exercise. For instance, to take that Karl Malone example, I imagine the model is able to get a bit more information than I mentioned, because some of those role players missed games on other teams in the timeframe and had years below the minute threshold and whatnot. So that can potentially help improve what the model knows about some role players’ value. But when Karl Malone and John Stockton barely missed any games, I’m not sure how the model can form an accurate estimate of how the impact is parsed between those two guys (even if we assumed for argument’s sake that the model *could* accurately estimate the impact of everyone else on the team).
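To put a rough number on the Malone/Stockton point, here is a sketch under deliberately simple assumptions: five hypothetical 82-game seasons (410 games), two stars who miss 3 and 4 games respectively with no overlap (mirroring the counts listed earlier), no other roster changes, and game-to-game noise in scoring margin of 12 points (an assumed figure). With ordinary least squares, each star's coefficient is identified only from his handful of absences.

```python
import numpy as np

n_games = 410                      # five hypothetical 82-game seasons
star_a = np.ones(n_games)
star_b = np.ones(n_games)
star_a[:3] = 0.0                   # A misses 3 games
star_b[3:7] = 0.0                  # B misses 4 different games
X = np.column_stack([np.ones(n_games), star_a, star_b])   # intercept + two stars

sigma = 12.0                       # assumed SD of single-game margin noise
cov = sigma ** 2 * np.linalg.inv(X.T @ X)   # OLS covariance: sigma^2 (X'X)^-1
se_a, se_b = np.sqrt(np.diag(cov))[1:]
se_diff = np.sqrt(cov[1, 1] + cov[2, 2] - 2 * cov[1, 2])  # how credit is split: A minus B
print(f"SE(A) = {se_a:.2f}, SE(B) = {se_b:.2f}, SE(A - B) = {se_diff:.2f} pts")
```

Under these assumptions the standard errors come out near 7 and 6 points of margin, and around 9 for the difference between the two, which is large relative to most player effects. Penalized regression lowers that variance by shrinking, trading it for bias rather than creating new information.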

Anyways, I want to be clear that this is not a criticism of you at all. This is incredible work! I’m more just pointing out the inherent limitations in this sort of endeavor (i.e. limited data, which there’s nothing you can do about), so that people don’t take this as being more precise than it really is (or than I think you regard it to be).

Post #28 » by **Doctor MJ** » Sun Jul 30, 2023 5:14 pm

OhayoKD wrote:
Uh:

I don't see how anyone but Magic looks like an impact outlier here. Magic's career splits are also outright matched by Duncan, who looks better than Jordan where this graph would suggest his value peaked in various metrics like AUPM, WOWY, and RAPM (Cheema, JE), despite playing way more minutes than any of his teammates (and, as a result, having to play with substandard teammates).

And then we have data ball, where it is two two-way bigs and a guy who combines GOAT-tier offense with the ability to carry -5.5 defenses.

I think what impact data bears out is that scoring and one-way offense is overrated if anything.

Really not sure how you can say "Magic looks like an impact outlier" and "data bears out one-way offense is overrated". We can have conversations about individuals certainly, but the idea that, say - among contemporaries, Olajuwon should rank ahead of Magic, is not helped by this data.

And while Bird doesn't look as strong as Magic, he looks pretty damn good too.

I think it would be cool to see more graphs along these lines for guys in more recent eras. That can obviously include the more fine-detailed stuff we get with legit +/- data, but apples-to-apples analyses always provide their own insight. I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.

Post #29 » by **OhayoKD** » Sun Jul 30, 2023 5:30 pm

Doctor MJ wrote:
Really not sure how you can say "Magic looks like an impact outlier" and "data bears out one-way offense is overrated". We can have conversations about individuals certainly, but the idea that, say - among contemporaries, Olajuwon should rank ahead of Magic, is not helped by this data.

I do not know if anyone who voted Olajuwon above Magic has Hakeem as a stronger regular-season player during their primes. The case was:
-> Hakeem's teams see the biggest playoff elevation of anyone from the era: team-wide, box-wide, "expected championship differential", "SRS upsets", etc.
-> By box and tape, Hakeem profiles similarly to Tim Duncan, who arguably has a better impact portfolio than anyone from the '80s/'90s
-> Hakeem was in a **** situation
-> Hakeem looks great in most concentrated samples (Ben and I apply something like a 10-games-per-season filter)
-> By larger samples, Hakeem looks MJ-ish and has better longevity in the regular season

Combine all that and that's where the Hakeem #5 and #6 votes were coming from. I'm pretty sure he also benefitted from coming across as a more consistent version of a (sometimes) impact outlier in Wilt.

I think it would be cool to see more graphs along these lines for guys in more recent eras. That can obviously include the more fine-detailed stuff we get with legit +/- data, but apples-to-apples analyses always provide their own insight. I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.

I mean, "apples to apples" has played a big role in the shift toward "two-way rules". That's where the idea of Russell, LeBron, and Kareem as a tier unto themselves, and the elevation of Duncan, Hakeem, and KG, largely comes from.

Post #30 » by **Doctor MJ** » Sun Jul 30, 2023 5:43 pm

OhayoKD wrote:
I do not know if anyone who voted Olajuwon above Magic has Hakeem as a stronger regular-season player during their primes. The case was:
-> Hakeem's teams see the biggest playoff elevation of anyone from the era: team-wide, box-wide, "expected championship differential", "SRS upsets", etc.
-> By box and tape, Hakeem profiles similarly to Tim Duncan, who arguably has a better impact portfolio than anyone from the '80s/'90s
-> Hakeem was in a **** situation
-> Hakeem looks great in most concentrated samples (Ben and I apply something like a 10-games-per-season filter)
-> By larger samples, Hakeem looks MJ-ish and has better longevity in the regular season

Combine all that and that's where the Hakeem #5 and #6 votes were coming from. I'm pretty sure he also benefitted from coming across as a more consistent version of a (sometimes) impact outlier in Wilt.

I think it would be cool to see more graphs along these lines for guys in more recent eras. That can obviously include the more fine-detailed stuff we get with legit +/- data, but apples-to-apples analyses always provide their own insight. I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.

I mean, "apples to apples" has played a big role in the shift toward "two-way rules". That's where the idea of Russell, LeBron, and Kareem as a tier unto themselves, and the elevation of Duncan, Hakeem, and KG, largely comes from.

Fair point about Hakeem's candidacy being so linked to the playoffs, but of course Magic has a similar edge compared to any of the other bigs of his era.

Re: apples-to-apples played a big role already. We're not talking about something that's implemented once and that's it. Moonbeam is giving us a new lens. We have not yet seen it across all eras; hence, our capacity for apples-to-apples comparisons with it is still not where it can be.

Re: shifted toward two-way rules, where the idea of Russell. Here you just have me confused. Russell isn't the story of two-way rules. "Two-way rules" thinking has always been used against Russell with respect to Wilt and others.

Post #31 » by **OhayoKD** » Sun Jul 30, 2023 6:09 pm

Doctor MJ wrote:
OhayoKD wrote:
Doctor MJ wrote:Really not sure how you can say "Magic looks like an impact outlier" and "data bears out one-way offense is overrated". We can have conversations about individuals certainly, but the idea that, say - among contemporaries, Olajuwon should rank ahead of Magic, is not helped by this data.

I do not know if anyone who voted Olajuwon above Magic has Hakeem as a stronger regular season player during their primes. The case was:
-> Hakeem's teams see the biggest playoff elevation of anyone from the era (team-wide, box-wide, "expected championship differential", "SRS upsets", etc.)
-> Hakeem by box and tape profiles similarly to Tim Duncan, who arguably has a better impact portfolio than anyone from the '80s/'90s
-> Hakeem was in a **** situation
-> Hakeem looks great in most concentrated samples (Ben and I apply something like a 10-games-per-season filter)
-> Hakeem by larger samples looks MJ-ish and has better longevity in the regular season

Combine all that and that's where the Hakeem #5 and #6 votes were coming from. I'm pretty sure he also benefited from coming across as a more consistent version of a (sometimes) impact outlier in Wilt.

I think it would be cool to see more graphs along these lines for guys in more recent eras. That can obviously include the more fine-detailed stuff we get with legit +/- data, but apples-to-apples analyses always provide their own insight. I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.

I mean, "apples-to-apples" comparisons have played a big role in the shift towards "two-way rules". That's where the idea of Russell, LeBron, and Kareem as a tier unto themselves, and the elevation of Duncan, Hakeem, and KG, largely comes from.

Fair point about Hakeem's candidacy being so linked to the playoffs, but of course Magic has a similar edge compared to any of the other bigs of his era.

Re: apples-to-apples played a big role already. We're not talking about something that's implemented once and that's it. Moonbeam is giving us a new lens. We have not yet seen it across all eras; hence, our capacity for apples-to-apples comparison is still not where it can be.

Sure. I'm just saying right now "Magic Johnson the impact king" is not really "anything new". The idea is that Magic Johnson is a weaker impact king than the aforementioned big three and is matched by a bunch of other two-way forces. It would be remiss not to mention that David Robinson (who, by other measures, looks like he would be Magic's closest rival here) isn't included. Obviously the playoffs hurt that.

Re: shifted toward two-way rules, where the idea of Russell. Here you just have me confused. Russell isn't the story of two-way rules. "Two-way rules" thinking has always been used against Russell with respect to Wilt and others.

Well, more specifically, it is a story of "defense rules", which is the direct opposite of "offense rules". If you were to take it at face value, the story would go something like "huh, defense>offense", well then, "defense+offense>>offense!". Of course, that isn't really how one should interpret it, but it is still a fair warning for people who make assumptions about the ceiling of the split. As it is, Dikembe, Walton, Thurmond, and Draymond can be argued to echo that warning. Even an AD or Giannis may show a more lopsided distribution towards defense depending on what you look at.

Post #32 » by **lessthanjake** » Sun Jul 30, 2023 10:16 pm

Moonbeam wrote:.

Quick question on this: Is there a specific reason you’ve used the Ridge variant for the charts? Was that just a random choice? Or do you think that that’s the best version? And, if so, why? Or perhaps you’re just using that one since, at least as I understand it, RAPM is typically done using that method?

Post #33 » by **Doctor MJ** » Sun Jul 30, 2023 10:28 pm

OhayoKD wrote:
Doctor MJ wrote:
OhayoKD wrote:
I do not know if anyone who voted Olajuwon above Magic has Hakeem as a stronger regular season player during their primes. The case was:
-> Hakeem's teams see the biggest playoff elevation of anyone from the era (team-wide, box-wide, "expected championship differential", "SRS upsets", etc.)
-> Hakeem by box and tape profiles similarly to Tim Duncan, who arguably has a better impact portfolio than anyone from the '80s/'90s
-> Hakeem was in a **** situation
-> Hakeem looks great in most concentrated samples (Ben and I apply something like a 10-games-per-season filter)
-> Hakeem by larger samples looks MJ-ish and has better longevity in the regular season

Combine all that and that's where the Hakeem #5 and #6 votes were coming from. I'm pretty sure he also benefited from coming across as a more consistent version of a (sometimes) impact outlier in Wilt.

I mean, "apples-to-apples" comparisons have played a big role in the shift towards "two-way rules". That's where the idea of Russell, LeBron, and Kareem as a tier unto themselves, and the elevation of Duncan, Hakeem, and KG, largely comes from.

Fair point about Hakeem's candidacy being so linked to the playoffs, but of course Magic has a similar edge compared to any of the other bigs of his era.

Re: apples-to-apples played a big role already. We're not talking about something that's implemented once and that's it. Moonbeam is giving us a new lens. We have not yet seen it across all eras; hence, our capacity for apples-to-apples comparison is still not where it can be.

Sure. I'm just saying right now "Magic Johnson the impact king" is not really "anything new". The idea is that Magic Johnson is a weaker impact king than the aforementioned big three and is matched by a bunch of other two-way forces. It would be remiss not to mention that David Robinson (who, by other measures, looks like he would be Magic's closest rival here) isn't included. Obviously the playoffs hurt that.

Re: shifted toward two-way rules, where the idea of Russell. Here you just have me confused. Russell isn't the story of two-way rules. "Two-way rules" thinking has always been used against Russell with respect to Wilt and others.

Well, more specifically, it is a story of "defense rules", which is the direct opposite of "offense rules". If you were to take it at face value, the story would go something like "huh, defense>offense", well then, "defense+offense>>offense!". Of course, that isn't really how one should interpret it, but it is still a fair warning for people who make assumptions about the ceiling of the split. As it is, Dikembe, Walton, Thurmond, and Draymond can be argued to echo that warning. Even an AD or Giannis may show a more lopsided distribution towards defense depending on what you look at.

I feel like we're talking past each other.

We just got access to new data courtesy of Moonbeam.
Magic looks fantastic by this new data.
So this is something new.

It's also certainly not anything that's a part of a "defense rules" realization, so with respect to that, it's new.

Re: Robinson isn't included. Robinson's included on another graph. The direct overlap is not a lot, but he's clearly less of a percentile standout than Magic from what I see here.

Post #34 » by **Moonbeam** » Mon Jul 31, 2023 12:31 am

Doctor MJ wrote:
Moonbeam wrote:Yeah, Magic and Bird look outstanding here, particularly Magic. With some older box score data available now, there are a couple others who also jump out:

You're also right that these aren't per-possession metrics, but per-game, so Bobby Jones standing out looks quite impressive.

I think this is important data for us to see - while being cautious about taking noisy data too far.

I completely get being skeptical of Magic & Bird relative to other superstars. Easy to have an interpretation of "Transcendent offensive players sure, but limited on defense, can they really be more impactful than 2-way stars? Maybe we've just got a narrative full of winning bias.", but I think there's good reason to think that these guys were every bit the extreme impact outliers they're made out to be.

Cool seeing the '60s stars there too, and yeah, Russell fits along with Magic & Bird as guys who seem like they should be held back by half the game, but the impact data may tell us otherwise.

I fully agree that we need to interpret these results with some caution given the limitations of the data. And you are right that there could be some winning bias at play here, as the models are soaking up lots of good margins for players on good teams (and lots of bad ones for players on bad teams). But Magic and Bird stand out relative to their Laker and Celtic teammates as well. So does Russell in comparison to Cousy, Sharman, Jones, Heinsohn, etc.:

Ultimately, I think these metrics do have some value, but I absolutely would encourage everyone to consider them with appropriate caution in light of the limitations of the data.

Post #35 » by **Moonbeam** » Mon Jul 31, 2023 12:41 am

lessthanjake wrote:
Moonbeam wrote:
lessthanjake wrote:Interesting stuff, and I think is very similar to Thinking Basketball’s WOWYR.

It feels to me like there’s really just not a lot of data for a lot of this, though, particularly in these past eras where players missed very few games. For instance, Doc mentioned being surprised about Karl Malone being below Charles Barkley, so I’ll use Karl Malone as an example. Karl Malone is well below Barkley in the 1989-1993 timeframe. But what is the data for Karl Malone’s 1989-1993 timeline based on? Well, here’s the number of missed games by players who played over 18 minutes a game in a season for the Jazz in that time period (in years they actually averaged 18+ minutes per game):

Karl Malone: 3
John Stockton: 4
Mark Eaton: 3
Thurl Bailey: 0
Darrell Griffith: 0
Bob Hansen: 37
Blue Edwards: 22
Jeff Malone: 17
Tyrone Corbin: 13
Mike Brown: 0
Jay Humphries: 4
David Benoit: 0
Larry Krystkowiak: 11

So, at least as I understand it (and I’ve admittedly not read through the actual paper, so sorry if I’m misinterpreting anything!), the model is essentially trying to figure out Karl Malone’s impact by regressing what occurred in those missed games. There are some missed games there, but nothing particularly substantial for anyone, and the largest numbers of missed games come from relatively minor players. I don’t really see how a WOWY-based model can accurately assess Karl Malone’s impact with almost no missed games from Malone, Stockton, or Eaton, and with several other relevant players having no missed games at all. It seems like the results would inevitably be driven by statistical noise, centered largely on what randomly happened to occur when fairly inconsequential players like Bob Hansen and Blue Edwards were out.

Another example of this is the 1989-1993 timeframe for Jordan. He does fairly well in this timeframe, but what is the data based on? Here’s the total missed games of people on the Bulls who played 18 MPG in a given season in that timeframe:

Michael Jordan: 7
Scottie Pippen: 10
Horace Grant: 14
BJ Armstrong: 0
Bill Cartwright: 55
Scott Williams: 11
John Paxson: 7
Stacey King: 0
Craig Hodges: 33
Sam Vincent: 12
Brad Sellers: 2
Dave Corzine: 1

There’s virtually no missed-game data there, except for what happened in a bunch of missed games from Bill Cartwright and Craig Hodges. Players like that don’t *really* affect games that much, but when they make up a huge portion of the team’s missed games, what randomly happens to occur in their absences can really skew a model like this. For instance, we see above that Craig Hodges missed 33 games in years he played 18+ MPG. This was all in the 1989 season. And, based on the charts provided, we actually see Jordan’s rating in this measure tank from the 1984-1988 time period to the 1985-1989 time period, and he didn’t get super high until 1989 was out of the time period, so it seems reasonably obvious that something happened in 1989 that tanked his rating. The only person who missed a lot of games that season was Craig Hodges. The Bulls happened to go 32-17 in the games Hodges played and 15-18 in the games Hodges missed (and I’m sure the difference in average margin of victory is pretty significant too). So my guess is that the model thinks Craig Hodges was really impactful (and his missed games make up a significant portion of the entire set of missed games being regressed), so what happened in those games has a significant impact on Jordan’s perceived impact in time periods that contain that year (and note that Pippen dropped that same year too, though a bit less, probably because he missed several of the games that Hodges missed).
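A back-of-the-envelope check on how much signal a 33-game "without" sample carries, using only the 32-17 and 15-18 splits quoted above. It looks at win percentage rather than scoring margin and treats each game as an independent coin flip, both simplifying assumptions that are not how the WOWY models work:

```python
import math

# Win-loss splits for the 1988-89 Bulls as quoted above.
with_w, with_l = 32, 17        # games Craig Hodges played
without_w, without_l = 15, 18  # games Hodges missed

n_with = with_w + with_l
n_without = without_w + without_l
p_with = with_w / n_with
p_without = without_w / n_without
diff = p_with - p_without

# Standard error of a difference between two independent proportions
# (assumes games are independent Bernoulli trials).
se = math.sqrt(p_with * (1 - p_with) / n_with
               + p_without * (1 - p_without) / n_without)

print(f"with: {p_with:.3f}  without: {p_without:.3f}  diff: {diff:.3f}  SE: {se:.3f}")
```

The gap is about 0.20 in win percentage with a standard error of roughly 0.11, so under this crude model it sits less than two standard errors from zero, which is consistent with the worry that a split like this could be largely noise.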

Thanks for the comment and for getting into the detail. You're right that the utility of these metrics is limited by the nature of the data we have available, and I can't and wouldn't shy away from that. I wouldn't feel I was being responsible if I just pushed out these numbers without important caveats like that one, as well as the others in the document I shared. I do think there's some value in this, though.

Speaking of your specific examples, there is a bit more to it than that. These models still use data with all of the players healthy to form a sort of baseline, so it's not as if the "With" data doesn't matter; it still does. Moreover, when players leave a team, they are considered "missing" for those seasons. In your Bulls example, for instance, Craig Hodges would be listed as missing for the entirety of the 1992-93 season with respect to being Jordan's teammate. Stacey King would be considered missing for all but the 1989-90 season due to the minutes threshold. Sam Vincent would be considered missing for all but the 1988-89 season. And so on and so forth. This extra missingness allows transactions between seasons to inform the estimates a bit more than merely looking at missed games. I'll note that for these players, their time away from Chicago would inform their baseline impact. Sam Vincent's impact with Orlando for the 1989-90, 1990-91, and 1991-92 seasons will inform his baseline impact and therefore inform his contribution to the 1988-89 Bulls' scoring margins.
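A minimal sketch of the kind of presence encoding described above, assuming one row per team-game and 0/1 indicators. The function `wowy_row`, its arguments, and the toy players, teams, and minutes are hypothetical illustrations, not the actual models:

```python
from typing import Dict, List, Set

MPG_THRESHOLD = 18.0  # assumption: mirrors the 18 MPG cut discussed in the thread


def wowy_row(
    players: List[str],           # column order of the design matrix
    season_team: Dict[str, str],  # player -> team he played for this season
    season_mpg: Dict[str, float], # player -> minutes per game this season
    played_in_game: Set[str],     # players who appeared in this particular game
    team: str,                    # team whose game this row describes
    threshold: float = MPG_THRESHOLD,
) -> List[float]:
    """Hypothetical 0/1 row: 1.0 only if the player is on this team this season,
    cleared the MPG threshold, and actually played; otherwise he counts as missing."""
    row = []
    for p in players:
        present = (
            season_team.get(p) == team
            and season_mpg.get(p, 0.0) >= threshold
            and p in played_in_game
        )
        row.append(1.0 if present else 0.0)
    return row


# Toy example: C played but is under the minutes cut; D is on another team.
row = wowy_row(
    players=["A", "B", "C", "D"],
    season_team={"A": "CHI", "B": "CHI", "C": "CHI", "D": "ORL"},
    season_mpg={"A": 37.0, "B": 33.0, "C": 17.0, "D": 25.0},
    played_in_game={"A", "B", "C"},
    team="CHI",
)
print(row)
```

Under this encoding a player who changes teams or drops below the threshold contributes "without" rows for his old team, which is how transactions can add information beyond missed games.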

Hopefully this helps clear it up a bit.

I'm happy for you or anyone else to ask more questions or offer more critiques. They may help me improve these metrics going forward!

Thanks for the clarification! Particularly the part about players being considered “missing” when they went to a different team or didn’t meet the minutes threshold. That’s relevant info about the model, of course. And it should provide the model with a bit more data on some players that changed teams. Of course, considering a player as there when they play 18 minutes and missing if they then played 17 minutes a game another season might lead to some odd/noisy outcomes in certain scenarios (i.e. the model might think a guy’s absence made a difference between two seasons when they actually were there both seasons and only played a minute less per game in one of them), but there’s no perfect way to model things.

The bottom line, though, is that I do think this is an inherently very noisy exercise. For instance, to take that Karl Malone example, I imagine the model is able to get a bit more information than I mentioned, because some of those role players missed games on other teams in the timeframe and had years below the minutes threshold and whatnot. So that can potentially help improve what the model knows about some role players’ value. But when Karl Malone and John Stockton barely missed any games, I’m not sure how the model can form an accurate estimate of how the impact is split between those two guys (even if we assumed, for argument’s sake, that the model *could* accurately estimate the impact of everyone else on the team).

Anyways, I want to be clear that this is not a criticism of you at all. This is incredible work! I’m more just pointing out the inherent limitations in this sort of endeavor (i.e. limited data, which there’s nothing you can do about), so that people don’t take this as being more precise than it really is (or than I think you regard it to be).

You're spot on that there are limits to how well these models can parse individual impact when there is extreme collinearity between teammates, as is the case with Stockton and Malone in many years. And you can see their profiles are very similar as a result:
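A toy sketch of why the split is so hard to recover: if two hypothetical teammates miss exactly the same games, their presence columns are identical, the data only identify their combined effect, and the ridge penalty then divides that effect evenly. The data are synthetic with an assumed +6/+2 split, and this is plain closed-form ridge, not the actual models:

```python
import numpy as np


def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge on centred data: (Xc'Xc + lam*I)^-1 Xc'yc."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


rng = np.random.default_rng(0)
n_games = 100

# Two hypothetical teammates who are absent in exactly the same 10 games,
# so their presence columns are identical (perfect collinearity).
present = np.ones(n_games)
present[:10] = 0.0
X = np.column_stack([present, present])

# Assumed "true" impacts of +6 and +2 points, plus game-to-game noise.
y = 6.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 10.0, n_games)

b = ridge(X, y, lam=1.0)
print(b)  # the two coefficients are equal up to floating-point error
```

Whatever split is assumed, the two estimates come out the same; only games in which one player appears without the other can break the tie.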

There could potentially be some value added by using some function of minutes played in the WOWY matrix (though I'm not sure Stockton and Malone would diverge that much in their minute profiles across games, either).
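One hypothetical version of "some function of minutes played", sketched under the assumption that each entry is the share of the game's minutes a player logged. It also shows the 17-versus-18 MPG cliff raised earlier in the thread; the function names and numbers are illustrative only:

```python
from typing import Dict, List, Set


def binary_row(players: List[str], season_mpg: Dict[str, float], played: Set[str],
               threshold: float = 18.0) -> List[float]:
    """Hypothetical 0/1 presence with a hard season-MPG cut."""
    return [1.0 if p in played and season_mpg.get(p, 0.0) >= threshold else 0.0
            for p in players]


def minutes_row(players: List[str], game_minutes: Dict[str, float],
                game_length: float = 48.0) -> List[float]:
    """Hypothetical alternative: share of the game's minutes each player logged
    (0.0 if he did not play). Pass a longer game_length for overtime games."""
    return [min(game_minutes.get(p, 0.0) / game_length, 1.0) for p in players]


# A 17.9 vs 18.1 MPG season flips the binary indicator completely...
print(binary_row(["X"], {"X": 17.9}, {"X"}), binary_row(["X"], {"X": 18.1}, {"X"}))
# ...whereas the minutes share changes smoothly (24 of 48 minutes gives 0.5).
print(minutes_row(["X"], {"X": 24.0}))
```

As noted above, this would only help separate two stars if their per-game minutes actually diverge; two players who log similar minutes in the same games stay collinear.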

Finally, I'm not viewing criticism of the models or the methodology in a bad light. Taking a critical view of what these metrics can provide and what their limitations are is vital for their responsible usage.

Edit: One thing I forgot to mention about MJ. I think the dip in 1985-89 is mostly due to the sample no longer including the Bulls' improvement from 1984 to 1985 with rookie MJ in 1985.
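To see which rolling 5-year windows a given season feeds into, a small hypothetical helper is enough. It assumes windows are labelled by the year each season ends (1989 for 1988-89):

```python
from typing import List, Tuple

Window = Tuple[int, int]


def rolling_windows(first: int, last: int, width: int = 5) -> List[Window]:
    """All consecutive width-season windows between first and last, inclusive."""
    return [(start, start + width - 1) for start in range(first, last - width + 2)]


def windows_containing(season: int, windows: List[Window]) -> List[Window]:
    """Windows whose span includes the given season."""
    return [w for w in windows if w[0] <= season <= w[1]]


wins = rolling_windows(1984, 1993)
print(windows_containing(1984, wins))
print(windows_containing(1989, wins))
```

Within 1984 to 1993, the 1984 season appears only in the 1984-1988 window, while 1989 appears in five consecutive windows. That is why both readings above are possible: dropping 1984 changes the 1985-1989 estimate, and a single unusual season like 1989 can move several adjacent points on the chart.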

Post #36 » by **Moonbeam** » Mon Jul 31, 2023 12:45 am

WestGOAT wrote:Could be a fun exercise to look at offensive and defensive WOWY. So instead of setting the point margin as y, you can set points scored or allowed (relative to the average for the specific time window) as y. It will probably be noisy, but it would be interesting to see if the usual suspects considered defensive juggernauts pop up, and especially whether bigs being more valuable than smalls is reproduced.

Indeed, that would be a fun extension! I'm not sure whether I might be limited a bit here by the availability of box score data. I think splitting the impact into offense and defense would only make sense with some sort of pace/possession adjustment; otherwise low-pace eras will have their offensive value depressed and their defensive value enhanced, and vice versa for high-paced eras. Maybe it's not such a big deal within 5-year windows, though? In any case, I'll have to see if there is enough data in the box scores to approximate possessions for each game. I think there often is, but perhaps not always if we go back to 1952.
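A rough sketch of what the pace adjustment could look like, assuming the commonly used box-score possession estimate POSS ≈ FGA - ORB + TOV + 0.44 × FTA and the offense/defense targets WestGOAT suggests. The function names and the sample box-score numbers are placeholders. Note that the estimate needs offensive rebounds and turnovers, which is exactly what older box scores may lack:

```python
from typing import List


def estimate_possessions(fga: float, orb: float, tov: float, fta: float,
                         ft_weight: float = 0.44) -> float:
    """Common box-score possession estimate: FGA - ORB + TOV + ft_weight * FTA."""
    return fga - orb + tov + ft_weight * fta


def per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions."""
    return 100.0 * points / possessions


def relative_to_window(ratings: List[float], window_avg: float) -> List[float]:
    """Centre per-100 ratings on the 5-year window average (the regression target)."""
    return [r - window_avg for r in ratings]


# Placeholder box score for one team in one game.
poss = estimate_possessions(fga=85, orb=12, tov=14, fta=25)
off_rtg = per_100(points=105, possessions=poss)  # offensive target before centring
def_rtg = per_100(points=100, possessions=poss)  # assumes both teams had about the same possessions
print(round(poss, 1), round(off_rtg, 1), round(def_rtg, 1))
```

For a defensive model, a lower rating is better, so one might flip the sign of the centred target so that positive coefficients still mean "helps the team".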

Post #37 » by **Moonbeam** » Mon Jul 31, 2023 12:51 am

lessthanjake wrote:
Moonbeam wrote:.

Quick question on this: Is there a specific reason you’ve used the Ridge variant for the charts? Was that just a random choice? Or do you think that that’s the best version? And, if so, why? Or perhaps you’re just using that one since, at least as I understand it, RAPM is typically done using that method?

Great question! I was posting the Ridge variants as they are more directly analogous to RAPM. I'm happy to post the graphs using Lasso and Elastic Net as well. In general, these methods will zap the middle 30-70% of coefficients to 0. We can see that this happens to Bird for a few of the 5-year windows of the mid-80s and the final one for Magic:
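A toy illustration on synthetic data (not Moonbeam's data or code) of why the Lasso and Elastic Net variants produce exact zeros while Ridge only shrinks. The solvers are minimal NumPy sketches; the elastic-net objective matches the one documented for scikit-learn's `ElasticNet`, but nothing here calls that library:

```python
import numpy as np


def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge: minimises ||y - Xb||^2 + lam * ||b||^2 (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards zero by t, returning exactly 0.0 inside [-t, t]."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X: np.ndarray, y: np.ndarray, lam: float,
                l1_ratio: float = 1.0, n_iter: int = 200) -> np.ndarray:
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (l1_ratio * ||b||_1 + (1 - l1_ratio)/2 * ||b||^2).
    l1_ratio = 1.0 gives the lasso."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]  # residual with feature j left out
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam * l1_ratio) / (col_sq[j] + lam * (1 - l1_ratio))
    return b


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_b = np.array([3.0, -2.0] + [0.0] * 8)  # only the first two columns matter
y = X @ true_b + rng.normal(size=200)

b_ridge = ridge(X, y, lam=10.0)
b_lasso = elastic_net(X, y, lam=0.5, l1_ratio=1.0)
b_enet = elastic_net(X, y, lam=0.5, l1_ratio=0.5)

for name, b in [("ridge", b_ridge), ("lasso", b_lasso), ("enet", b_enet)]:
    print(name, np.count_nonzero(b), "non-zero of", b.size)
```

The soft-thresholding step is what sends small coefficients to exactly zero; the squared (ridge) penalty has no such step, so its coefficients shrink towards zero without reaching it.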

Post #38 » by **Moonbeam** » Mon Jul 31, 2023 12:58 am

OhayoKD wrote:
Doctor MJ wrote:
OhayoKD wrote:
I do not know if anyone who voted Olajuwon above Magic has Hakeem as a stronger regular season player during their primes. The case was:
-> Hakeem's teams see the biggest playoff elevation of anyone from the era (team-wide, box-wide, "expected championship differential", "SRS upsets", etc.)
-> Hakeem by box and tape profiles similarly to Tim Duncan, who arguably has a better impact portfolio than anyone from the '80s/'90s
-> Hakeem was in a **** situation
-> Hakeem looks great in most concentrated samples (Ben and I apply something like a 10-games-per-season filter)
-> Hakeem by larger samples looks MJ-ish and has better longevity in the regular season

Combine all that and that's where the Hakeem #5 and #6 votes were coming from. I'm pretty sure he also benefited from coming across as a more consistent version of a (sometimes) impact outlier in Wilt.

I mean, "apples-to-apples" comparisons have played a big role in the shift towards "two-way rules". That's where the idea of Russell, LeBron, and Kareem as a tier unto themselves, and the elevation of Duncan, Hakeem, and KG, largely comes from.

Fair point about Hakeem's candidacy being so linked to the playoffs, but of course Magic has a similar edge compared to any of the other bigs of his era.

Re: apples-to-apples played a big role already. We're not talking about something that's implemented once and that's it. Moonbeam is giving us a new lens. We have not yet seen it across all eras; hence, our capacity for apples-to-apples comparison is still not where it can be.

Sure. I'm just saying right now "Magic Johnson the impact king" is not really "anything new". The idea is that Magic Johnson is a weaker impact king than the aforementioned big three and is matched by a bunch of other two-way forces. It would be remiss not to mention that David Robinson (who, by other measures, looks like he would be Magic's closest rival here) isn't included. Obviously the playoffs hurt that.

Re: shifted toward two-way rules, where the idea of Russell. Here you just have me confused. Russell isn't the story of two-way rules. "Two-way rules" thinking has always been used against Russell with respect to Wilt and others.

Well, more specifically, it is a story of "defense rules", which is the direct opposite of "offense rules". If you were to take it at face value, the story would go something like "huh, defense>offense", well then, "defense+offense>>offense!". Of course, that isn't really how one should interpret it, but it is still a fair warning for people who make assumptions about the ceiling of the split. As it is, Dikembe, Walton, Thurmond, and Draymond can be argued to echo that warning. Even an AD or Giannis may show a more lopsided distribution towards defense depending on what you look at.

It's interesting that you mention this, because one thing I was thinking of doing was looking at the metrics for those who made All-Defensive teams (not a great measure of defensive value, but something). I wanted to compare those who made All-Defensive teams but didn't make the All-Star team (or All-League, I'll have to think about which is better) vs. those who did make the All-Star/All-League team but not the All-Defensive team. It's going to take me some time to put that together, but it's one of a few "big picture" things I'm thinking about with these metrics.
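A minimal sketch of that comparison, assuming a table of player-windows with a rating and two award flags. The field names `rating`, `all_defense`, and `all_star` are hypothetical, and the example rows only exercise the function; they are not results:

```python
from statistics import mean
from typing import Dict, List


def group_means(rows: List[Dict]) -> Dict[str, float]:
    """Mean rating for two hypothetical groups of player-windows:
    'defense_only' = All-Defensive but not All-Star,
    'star_only'    = All-Star but not All-Defensive.
    Rows in both or neither group are ignored."""
    groups: Dict[str, List[float]] = {"defense_only": [], "star_only": []}
    for r in rows:
        if r["all_defense"] and not r["all_star"]:
            groups["defense_only"].append(r["rating"])
        elif r["all_star"] and not r["all_defense"]:
            groups["star_only"].append(r["rating"])
    return {name: mean(vals) for name, vals in groups.items() if vals}


# Placeholder rows purely to exercise the function; not real ratings.
example = [
    {"rating": 2.0, "all_defense": True, "all_star": False},
    {"rating": 4.0, "all_defense": True, "all_star": False},
    {"rating": 5.0, "all_defense": False, "all_star": True},
    {"rating": 9.0, "all_defense": True, "all_star": True},
]
print(group_means(example))
```

Swapping the `all_star` flag for an All-League flag would cover the other option mentioned above without changing the function.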

Some others I'm thinking of are ABA vs NBA league quality and typical player age range peaks through history. If you or anyone else have any other big picture stuff you think would be interesting based on this data, let me know!

Also, just for the sake of clarity, these metrics include both regular season and postseason games in their samples.

Post #39 » by **homecourtloss** » Mon Jul 31, 2023 1:06 am

Moonbeam wrote:
Doctor MJ wrote:
Moonbeam wrote:Yeah, Magic and Bird look outstanding here, particularly Magic. With some older box score data available now, there are a couple others who also jump out:

You're also right that these aren't per-possession metrics, but per-game, so Bobby Jones standing out looks quite impressive.

I think this is important data for us to see - while being cautious about taking noisy data too far.

I completely get being skeptical of Magic & Bird relative to other superstars. Easy to have an interpretation of "Transcendent offensive players sure, but limited on defense, can they really be more impactful than 2-way stars? Maybe we've just got a narrative full of winning bias.", but I think there's good reason to think that these guys were every bit the extreme impact outliers they're made out to be.

Cool seeing the '60s stars there too, and yeah, Russell fits along with Magic & Bird as guys who seem like they should be held back by half the game, but the impact data may tell us otherwise.

I fully agree that we need to interpret these results with some caution given the limitations of the data. And you are right that there could be some winning bias at play here, as the models are soaking up lots of good margins for players on good teams (and lots of bad ones for players on bad teams). But Magic and Bird stand out relative to their Laker and Celtic teammates as well. So does Russell in comparison to Cousy, Sharman, Jones, Heinsohn, etc.:

Ultimately, I think these metrics do have some value, but I absolutely would encourage everyone to consider them with appropriate caution in light of the limitations of the data.

Thank you again, MB. From the beginning, the limitations have been in the back of most people’s minds when seeing these, but seeing new data, or data that hasn’t been presented this way, is exciting when thinking about an era that is well outside the bounds of the data-ball era.

Also, I am pretty sure that we will see this graph again the next time we see a top 100 project. Bill Russell’s case is getting stronger and stronger.

Post #40 » by **rk2023** » Mon Jul 31, 2023 1:25 am

Simply phenomenal stuff here. The share is very much appreciated (I can only speak for myself, but it seems to be the consensus in this context)!