Penalized Regression of WOWY data

lessthanjake · Post #61 » by **lessthanjake** » Mon Jul 31, 2023 7:18 am

A somewhat specific/random request from me:

Gilbert Arenas
Caron Butler
Antawn Jamison
Larry Hughes
Brendan Haywood

I’m kind of bizarrely convinced from having watched them for years that Brendan Haywood was actually the most impactful player on the early/mid 2000s Wizards, and I’m just curious if I can add some fuel to my fire on that take.

Post #62 » by **Moonbeam** » Mon Jul 31, 2023 7:24 am

Doctor MJ wrote:
homecourtloss wrote:
Doctor MJ wrote:
Oh man this is fun. If we're annoying you and you want to encourage us to do something that takes the work from you let me know.

So, naive interpretation of this graph:

By peak we're looking at Nash > Stockton > Price >>> Kidd, with Price having by far the worst longevity. Make sense, though I have to acknowledge I wasn't expecting Kidd to be so far behind the rest.

It's interesting that Stockton peaks in the mid-to-late '90s.

From the available numbers that we had with Pollack’s data, and Stockton’s late career ON/OFF and RAPM, if he peaked before THAT it would be absolutely wild.

Kidd and Nash look closer in RAPM models, though it is interesting to see that Nash basically lived in the 90th+ percentile all the way up until the 2009–2013 period.

And Magic never left the 95th–99th except for when he came back 5 years later and out of shape, and STILL was an impact player.

I agree given Stockton's later numbers it would be super, super impressive for him to do even better than that earlier, but earlier was also when he was putting up his biggest box score and biggest minutes, so we'd kind of think that's when he should look best here. I'm not surprised, and I'm not really unsurprised. This is the sort of thing I like to try to come at as fresh as I can when I get new data.

Re: Kidd & Nash closer by RAPM. And of course this brings us to a key question:

Is the Moonbeam dispersion due to noise, or is there something about the data that plays differently than more granular +/- fed into regression?

At my kid’s swimming lesson now so no code for the moment, but I think I’d rank the sources of disparity as follows:

1. Noise: there is a lot lost by using game data alone, so that clearly will be a big factor.
2. (Maybe a close 2): most RAPM sources I have seen are prior-informed, so they make use of some other data to set benchmarks for the shrinkage that would be different for each player, not 0 for everyone as would be the case here.
3. (Probably a distant 3): Granular differences that are meaningful. Even with +/- data, I think a player can actually impact the performance of the team in minutes off the court. Good foul drawers, for instance, may put the opponent in the bonus but sit while teammates earn free throws from non-shooting fouls. They may also get opponents into foul trouble and thereby limit their minutes. There’s also coaching. How a coach responds to a player being out for a game vs out for a few minutes each quarter could be something that would be captured by this.
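
To make point 2 concrete, here is a minimal sketch (not the code behind the graphs in this thread) of ridge regression that shrinks each player toward a prior value instead of toward 0. The `ridge` helper, the toy data, the `lam` value, and the `informed_prior` vector are all made up for illustration.

```python
import numpy as np

def ridge(X, y, lam, prior=None):
    """Ridge estimate that shrinks coefficients toward `prior` (default: all zeros).

    Minimizes ||y - X b||^2 + lam * ||b - prior||^2.
    """
    n_players = X.shape[1]
    if prior is None:
        prior = np.zeros(n_players)
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y + lam * prior)

# Toy game-level data: 1 if the player appeared in the game, 0 otherwise.
rng = np.random.default_rng(0)
n_games, n_players = 200, 6
X = rng.integers(0, 2, size=(n_games, n_players)).astype(float)
true_impact = np.array([6.0, 3.0, 1.0, 0.0, -1.0, -3.0])  # hypothetical values
y = X @ true_impact + rng.normal(0, 12, size=n_games)     # noisy final margins

# Same penalty strength, two different shrinkage targets.
zero_prior_fit = ridge(X, y, lam=50.0)
informed_prior = true_impact + rng.normal(0, 1.0, size=n_players)  # hypothetical prior
informed_fit = ridge(X, y, lam=50.0, prior=informed_prior)
```

With noisy data and a heavy penalty, the two fits can disagree mostly because they are being pulled toward different targets, which is one way two impact metrics can diverge even if the underlying games were identical.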

More graphs soon!

Owly · Post #63 » by **Owly** » Mon Jul 31, 2023 8:14 am

Moonbeam wrote:
Doctor MJ wrote:
homecourtloss wrote:
From the available numbers that we had with Pollack’s data, and Stockton’s late career ON/OFF and RAPM, if he peaked before THAT it would be absolutely wild.

Kidd and Nash look closer in RAPM models, though it is interesting to see that Nash basically lived in the 90th+ percentile all the way up until the 2009–2013 period.

And Magic never left the 95th–99th except for when he came back 5 years later and out of shape, and STILL was an impact player.

I agree given Stockton's later numbers it would be super, super impressive for him to do even better than that earlier, but earlier was also when he was putting up his biggest box score and biggest minutes, so we'd kind of think that's when he should look best here. I'm not surprised, and I'm not really unsurprised. This is the sort of thing I like to try to come at as fresh as I can when I get new data.

Re: Kidd & Nash closer by RAPM. And of course this brings us to a key question:

Is the Moonbeam dispersion due to noise, or is there something about the data that plays differently than more granular +/- fed into regression?

At my kid’s swimming lesson now so no code for the moment, but I think I’d rank the sources of disparity as follows:

1. Noise: there is a lot lost by using game data alone, so that clearly will be a big factor.
2. (Maybe a close 2): most RAPM sources I have seen are prior-informed, so they make use of some other data to set benchmarks for the shrinkage that would be different for each player, not 0 for everyone as would be the case here.
3. (Probably a distant 3): Granular differences that are meaningful. Even with +/- data, I think a player can actually impact the performance of the team in minutes off the court. Good foul drawers, for instance, may put the opponent in the bonus but sit while teammates earn free throws from non-shooting fouls. They may also get opponents into foul trouble and thereby limit their minutes. There’s also coaching. How a coach responds to a player being out for a game vs out for a few minutes each quarter could be something that would be captured by this.

More graphs soon!

On Stockton: Just a reminder that in the "versus 76ers" data (obviously a tiny sample) he looked really good and was very much the standout of the Jazz pair (rather than the two having similar numbers, which would make the driver, if not luck, difficult to parse out).

More broadly.
Haven't had a chance to read, don't know if I'd be able to grasp it but it looks interesting, serious-minded stuff so thanks for your efforts.

homecourtloss wrote:
Moonbeam wrote:By the way, I think the modern guys can give us some nice "sanity checks" on this methodology. The Cheema 5-year RAPM scores I compared RWOWY-Ridge to in the document are prior-informed, whereas these are not prior-informed, so that may explain some discrepancies, but it's still good to benchmark against what is available through more precise +/- data.

Less so because of less data, no pbp, but could the Pollack data, especially for long-term 76ers players (i.e. beyond 94-96 and more useful than "versus 76ers" [maybe division rivals??]) be used (... maybe it is already, as I say haven't really looked yet)?

Post #64 » by **Moonbeam** » Mon Jul 31, 2023 9:00 am

Doctor MJ wrote:So, the next query on my mind is about current players that aren't already listed but who could end up in the Top 100. List is going to be longer, so I'll just make groups of 4 to make them as contemporaneous as possible. Please feel ZERO pressure to do any of this. I just have enough interest that I can probably ask questions at an annoying rate.

Paul George
Kawhi Leonard
Jimmy Butler
Klay Thompson

Anthony Davis
Damian Lillard
Giannis Antetokounmpo
Nikola Jokic

Devin Booker
Joel Embiid
Jayson Tatum
Luka Doncic

Here you go. Some of these to me suggest some winning bias, potentially? Not sure. Also, my console is not fond of accented characters, so my apologies for the '?' symbols that it converts everything it doesn't recognize to. :lol:

Post #65 » by **Moonbeam** » Mon Jul 31, 2023 9:03 am

eminence wrote:An Eminence favorites request:
Andrei Kirilenko
Baron Davis
Ricky Rubio
Derrick Favors
Joe Ingles

Here you go. Rubio and Jingles looking pretty comparable here!

Post #66 » by **Moonbeam** » Mon Jul 31, 2023 9:43 am

70sFan wrote:I think comparing the 1980s high scoring wings would be cool to see as well:

Adrian Dantley
Alex English
Bernard King
Mark Aguirre
Marques Johnson
Dominique Wilkins

Some of my favorite players ever in this group. I had some graphs including these guys in the document, but now I've got their earlier seasons too.

Post #67 » by **Moonbeam** » Mon Jul 31, 2023 9:46 am

homecourtloss wrote:
Moonbeam wrote:By the way, I think the modern guys can give us some nice "sanity checks" on this methodology. The Cheema 5-year RAPM scores I compared RWOWY-Ridge to in the document are prior-informed, whereas these are not prior-informed, so that may explain some discrepancies, but it's still good to benchmark against what is available through more precise +/- data.

How does it look overall? I know you talked about it a little bit in your paper. Tagging some of the active users in this thread:

eminence wrote:

Doctor MJ wrote:

OhayoKD wrote:

AEnigma wrote:

WestGOAT wrote:

The correlation with Cheema's prior-informed RAPM for the 21 5-year periods in common was about 0.41 for players with at least 5000 possessions. It gets a bit higher when you look at those who won awards or discard some weird edge cases around the minutes per game threshold I've used. The biggest factors explaining the difference (IMO) are that Cheema's estimates are prior-informed and the WOWY data misses a lot of nuance of +/-.
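
For anyone wanting to run the same kind of check on their own numbers, a minimal sketch of a thresholded correlation follows. It is not the code used for the 0.41 figure above; the `correlation_above_threshold` helper and the five toy players are made up, and only the 5000-possession cutoff comes from the post.

```python
import numpy as np

def correlation_above_threshold(a, b, possessions, min_possessions=5000):
    """Pearson correlation between two metrics, restricted to players
    whose possession count meets the threshold."""
    keep = np.asarray(possessions) >= min_possessions
    if keep.sum() < 2:
        raise ValueError("need at least two players above the threshold")
    return float(np.corrcoef(np.asarray(a)[keep], np.asarray(b)[keep])[0, 1])

# Toy values (hypothetical): the last player has very few possessions
# and an outlying estimate, as low-minute players often do.
rwowy = np.array([1.0, 2.0, 3.0, 4.0, -5.0])
rapm = np.array([2.0, 4.0, 6.0, 8.0, 9.0])
poss = np.array([6000, 7000, 8000, 9000, 100])

r_filtered = correlation_above_threshold(rwowy, rapm, poss)
r_all = correlation_above_threshold(rwowy, rapm, poss, min_possessions=0)
```

In this toy case the low-possession player drags the unfiltered correlation down, which is the kind of edge case the threshold is meant to screen out.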

Post #68 » by **Moonbeam** » Mon Jul 31, 2023 10:33 am

lessthanjake wrote:A somewhat specific/random request from me:

Gilbert Arenas
Caron Butler
Antawn Jamison
Larry Hughes
Brendan Haywood

I’m kind of bizarrely convinced from having watched them for years that Brendan Haywood was actually the most impactful player on the early/mid 2000s Wizards, and I’m just curious if I can add some fuel to my fire on that take.

Lots of overlap among these guys.

Post #69 » by **Moonbeam** » Mon Jul 31, 2023 10:58 am

Owly wrote:
Moonbeam wrote:
Doctor MJ wrote:
I agree given Stockton's later numbers it would be super, super impressive for him to do even better than that earlier, but earlier was also when he was putting up his biggest box score and biggest minutes, so we'd kind of think that's when he should look best here. I'm not surprised, and I'm not really unsurprised. This is the sort of thing I like to try to come at as fresh as I can when I get new data.

Re: Kidd & Nash closer by RAPM. And of course this brings us to a key question:

Is the Moonbeam dispersion due to noise, or is there something about the data that plays differently than more granular +/- fed into regression?

At my kid’s swimming lesson now so no code for the moment, but I think I’d rank the sources of disparity as follows:

1. Noise: there is a lot lost by using game data alone, so that clearly will be a big factor.
2. (Maybe a close 2): most RAPM sources I have seen are prior-informed, so they make use of some other data to set benchmarks for the shrinkage that would be different for each player, not 0 for everyone as would be the case here.
3. (Probably a distant 3): Granular differences that are meaningful. Even with +/- data, I think a player can actually impact the performance of the team in minutes off the court. Good foul drawers, for instance, may put the opponent in the bonus but sit while teammates earn free throws from non-shooting fouls. They may also get opponents into foul trouble and thereby limit their minutes. There’s also coaching. How a coach responds to a player being out for a game vs out for a few minutes each quarter could be something that would be captured by this.

More graphs soon!

On Stockton: Just a reminder that the "versus 76ers" (in - obviously - a tiny sample) he looked really good and very much the standout of the Jazz pair (rather than similar numbers that would make the driver - if not luck - difficult to parse out).

More broadly.
Haven't had a chance to read, don't know if I'd be able to grasp it but it looks interesting, serious-minded stuff so thanks for your efforts.

homecourtloss wrote:
Moonbeam wrote:By the way, I think the modern guys can give us some nice "sanity checks" on this methodology. The Cheema 5-year RAPM scores I compared RWOWY-Ridge to in the document are prior-informed, whereas these are not prior-informed, so that may explain some discrepancies, but it's still good to benchmark against what is available through more precise +/- data.

Less so because of less data, no pbp, but could the Pollack data, especially for long-term 76ers players (i.e. beyond 94-96 and more useful than "versus 76ers" [maybe division rivals??]) be used (... maybe it is already, as I say haven't really looked yet)?

That's interesting about Stockton with the Pollack data. It could be used to inform things if I look to extend this with some sort of external information, but it would have to be a supplement to other information as it only goes back to the mid-80s IIRC, no?

Owly · Post #70 » by **Owly** » Mon Jul 31, 2023 11:39 am

Moonbeam wrote:
Owly wrote:
Moonbeam wrote:
At my kid’s swimming lesson now so no code for the moment, but I think I’d rank the sources of disparity as follows:

1. Noise: there is a lot lost by using game data alone, so that clearly will be a big factor.
2. (Maybe a close 2): most RAPM sources I have seen are prior-informed, so they make use of some other data to set benchmarks for the shrinkage that would be different for each player, not 0 for everyone as would be the case here.
3. (Probably a distant 3): Granular differences that are meaningful. Even with +/- data, I think a player can actually impact the performance of the team in minutes off the court. Good foul drawers, for instance, may put the opponent in the bonus but sit while teammates earn free throws from non-shooting fouls. They may also get opponents into foul trouble and thereby limit their minutes. There’s also coaching. How a coach responds to a player being out for a game vs out for a few minutes each quarter could be something that would be captured by this.

More graphs soon!

On Stockton: Just a reminder that the "versus 76ers" (in - obviously - a tiny sample) he looked really good and very much the standout of the Jazz pair (rather than similar numbers that would make the driver - if not luck - difficult to parse out).

More broadly.
Haven't had a chance to read, don't know if I'd be able to grasp it but it looks interesting, serious-minded stuff so thanks for your efforts.

homecourtloss wrote:

Less so because of less data, no pbp, but could the Pollack data, especially for long-term 76ers players (i.e. beyond 94-96 and more useful than "versus 76ers" [maybe division rivals??]) be used (... maybe it is already, as I say haven't really looked yet)?

That's interesting about Stockton with the Pollack data. It could be used to inform things if I look to extend this with some sort of external information, but it would have to be a supplement to other information as it only goes back to the mid-80s IIRC, no?

Very much not the expert here and I think you'd always want to supplement stuff with more info if it's useful. I believe the 76ers +/- goes from the merger year on (so 76-77 season onwards).

AEnigma · Post #71 » by **AEnigma** » Mon Jul 31, 2023 4:27 pm

Moonbeam wrote:

What about Reggie Miller, Tim Hardaway, Penny Hardaway, Grant Hill, Allen Iverson, and Chauncey Billups?

Thanks again. Fascinating stuff.

lessthanjake · Post #72 » by **lessthanjake** » Mon Jul 31, 2023 4:47 pm

Moonbeam wrote:
lessthanjake wrote:A somewhat specific/random request from me:

Gilbert Arenas
Caron Butler
Antawn Jamison
Larry Hughes
Brendan Haywood

I’m kind of bizarrely convinced from having watched them for years that Brendan Haywood was actually the most impactful player on the early/mid 2000s Wizards, and I’m just curious if I can add some fuel to my fire on that take.

Lots of overlap among these guys.

Am honestly kind of devastated that my boy Haywood doesn’t look good here :lol:

homecourtloss · Post #73 » by **homecourtloss** » Mon Jul 31, 2023 5:16 pm

Moonbeam wrote:
Doctor MJ wrote:So, the next query on my mind is about current players that aren't already listed but who could end up in the Top 100. List is going to be longer, so I'll just make groups of 4 to make them as contemporaneous as possible. Please feel ZERO pressure to do any of this. I just have enough interest that I can probably ask questions at an annoying rate.

Paul George
Kawhi Leonard
Jimmy Butler
Klay Thompson

Anthony Davis
Damian Lillard
Giannis Antetokounmpo
Nikola Jokic

Devin Booker
Joel Embiid
Jayson Tatum
Luka Doncic

Here you go. Some of these to me suggest some winning bias, potentially? Not sure. Also, my console is not fond of accented characters, so my apologies for the '?' symbols that it converts everything it doesn't recognize to.

Thank you! Klay looking pretty strong. AD’s matching much of the other data with some underwhelming performances, especially in the regular season.

DraymondGold · Post #74 » by **DraymondGold** » Mon Jul 31, 2023 8:02 pm

So it looks like the only standard ~Top 25 candidates we're missing (i.e. not in this thread or in the paper in OP) would be,

-Young 70s Kareem (we only have 77+ Kareem. e.g. could compare him vs other NBA 70s stars you're interested in, such as Walton or Cowens or McAdoo or Frazier or Havlicek)
-The 90s–2010s Big men: Shaq vs Duncan vs KG vs Dirk
-the 00s/10s guards: Chris Paul vs Wade

If you have time (no rush, and no worries if not!), I'd love to see these comparisons

rk2023 · Post #75 » by **rk2023** » Mon Jul 31, 2023 8:22 pm

DraymondGold wrote:So it looks like the only standard ~Top 25 candidates we're missing (i.e. not in this thread or in the paper in OP) would be,

-Young 70s Kareem (we only have 77+ Kareem. e.g. could compare him vs other NBA 70s stars you're interested in, such as Walton or Cowens or McAdoo or Frazier or Havlicek)
-The 90s–2010s Big men: Shaq vs Duncan vs KG vs Dirk
-the 00s/10s guards: Chris Paul vs Wade

If you have time (no rush, and no worries if not!), I'd love to see these comparisons

I linked in the general thread, but there's an outline of a study done by Blackmill in 2017 which has WOWY values for at least some of these players (all I remember vividly is Kareem grading rather highly).

homecourtloss · Post #76 » by **homecourtloss** » Mon Jul 31, 2023 11:20 pm

I feel like we’re getting an exclusive service not available anywhere else for free. Is it possible to run one for Drexler and Terry Porter?

And then one for Ewing, Pippen, Barkley, Reggie Miller, and Payton?

Post #77 » by **Moonbeam** » Tue Aug 1, 2023 12:15 am

STOP THE PRESSES!

I've discovered an error in box scores in the play-by-play era. Essentially, my extraction of box scores was eliminating the home team players from 1997 onward. This meant that the models which include data from 1997 onward are only reflecting the final margin and the players on the away team, without taking into account the players on the home team at all. This obviously will impact the results in a major way.

I am very sorry about this! I would ignore any results posted in this thread that include any seasons from 1997 onward. The results for any 5-year period that doesn't include seasons from 1997 onward should be unaffected.

This also means the analysis comparing these results to the 5-year RAPM is compromised. What an epic fail on my part! :lol:

I will be re-running the analysis once I rectify the issue with the box scores, but that may take several days.

My apologies to everyone in this thread who has taken some information from post-97 samples and drawn any conclusions from it. I feel pretty awful about this.
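
For readers trying to picture the error, here is a rough sketch of a game-level design matrix with and without the home players. The exact encoding used for these models isn't spelled out in this thread, so the +1/-1 layout, the `build_design` helper, the `drop_home` flag, and the three toy games are all assumptions for illustration only.

```python
import numpy as np

def build_design(games, n_players, drop_home=False):
    """One row per game: +1 for each home player who played, -1 for each away player.

    drop_home=True mimics the extraction error by leaving the home players out.
    """
    X = np.zeros((len(games), n_players))
    for row, (home_ids, away_ids) in enumerate(games):
        if not drop_home:
            X[row, home_ids] = 1.0
        X[row, away_ids] = -1.0
    return X

# Each toy game is (home player ids, away player ids); ids are hypothetical.
games = [([0, 1], [2, 3]), ([2, 3], [0, 1]), ([0, 2], [1, 3])]

X_full = build_design(games, n_players=4)
X_bug = build_design(games, n_players=4, drop_home=True)
```

With `drop_home=True`, every row carries only the away side, so a regression of the final margin on `X_bug` can no longer credit or debit the home players for the result.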

eminence · Post #78 » by **eminence** » Tue Aug 1, 2023 12:16 am

Moonbeam wrote:STOP THE PRESSES!

I've discovered an error in box scores in the play-by-play era. Essentially, my extraction of box scores was eliminating the home team players from 1997 onward. This meant that the models which include data from 1997 onward are only reflecting the final margin and the players on the away team, without taking into account the players on the home team at all. This obviously will impact the results in a major way.

I am very sorry about this! As such, I would ignore any results posted in this thread that include any seasons from 1997 onward as a result. The results for any 5-year period that doesn't include seasons from 1997 onward should be unaffected.

This also means the analysis comparing these results to the 5-year RAPM is compromised. What an epic fail on my part!

I will be re-running the analysis once I rectify the issue with the box scores, but that may take several days.

My apologies to everyone in this thread who has taken some information from post-97 samples and drawn any conclusions from it. I feel pretty awful about this.

No worries, it'll happen when you're getting us the freshest ink off the printer. Much appreciated and good catch

AEnigma · Post #79 » by **AEnigma** » Tue Aug 1, 2023 2:31 am

Too late, already moved Eddie Jones ahead of Kobe on my all-time list.

Post #80 » by **Doctor MJ** » Tue Aug 1, 2023 3:41 am

Moonbeam wrote:STOP THE PRESSES!

I've discovered an error in box scores in the play-by-play era. Essentially, my extraction of box scores was eliminating the home team players from 1997 onward. This meant that the models which include data from 1997 onward are only reflecting the final margin and the players on the away team, without taking into account the players on the home team at all. This obviously will impact the results in a major way.

I am very sorry about this! As such, I would ignore any results posted in this thread that include any seasons from 1997 onward as a result. The results for any 5-year period that doesn't include seasons from 1997 onward should be unaffected.

This also means the analysis comparing these results to the 5-year RAPM is compromised. What an epic fail on my part!

I will be re-running the analysis once I rectify the issue with the box scores, but that may take several days.

My apologies to everyone in this thread who has taken some information from post-97 samples and drawn any conclusions from it. I feel pretty awful about this.

Thank you for letting us know. I don't think you have anything to be sorry about. It's probably for the best we check our excitement a bit.

Looking forward to your fixed batch.

Question: Was this then effectively creating an Away-WOWY regression? It's honestly never occurred to me before to consider Home vs Away RAPM or WOWY, and it seems like an idea with potential.