Penalized Regression of WOWY data

Moderators: Doctor MJ, trex_8063, penbeast0, PaulieWal, Clyde Frazier

lessthanjake
Analyst
Posts: 3,472
And1: 3,105
Joined: Apr 13, 2013

Re: Penalized Regression of WOWY data 

Post#161 » by lessthanjake » Mon Aug 7, 2023 3:46 pm

homecourtloss wrote:
Moonbeam wrote:
homecourtloss wrote:
A few requests for Moon if possible:

Can you create a graph for Ralph Sampson, Hakeem, Rodney McCray, and Robert Reid, Ridge and Lasso?

And another for Jordan, Pippen, Horace Grant, and BJ Armstrong, Ridge and Lasso?


Here are the 80s Rockets:

Image

Image

Hakeem dominates as expected.

And the 90s Bulls:

Image

Image

Grant looks pretty great here. He did have the benefit of playing for good teams throughout his career, but the fact he was able to be a positive contributor for all of them (also confirmed via RAPM I believe) is certainly a signal that he is one of the better unheralded guys of the era. B.J. Armstrong looks amazing to start, but I think a lot of that is joining a team that took off and was great. Once the Warriors seasons creep into the sample, he plummets as expected.


Thank you, Moon. I’m trying to think of any other trio that played that many seasons and minutes together and did that well: you have essentially the entire Bulls run with Jordan, Grant, and Pippen in the 90th+ percentile, with BJ Armstrong looking very strong early.


With regards to BJ Armstrong, I feel like I really need to reprise a lot of the things I’ve noted earlier in the thread here, since I think it’s misleading to talk about him looking great here.

1. I feel like I need to remind people that there’s some players that the model just has almost no data on. BJ Armstrong missed a grand total of 1 game in his first 7 seasons in the NBA. So the model basically has almost no way of parsing out how big of an effect he was having specifically. It can try to figure that out using estimates of other players, but then you have other Bulls players who barely missed games too (such as, for example, Pippen and Jordan in most of those timeframes), such that the model has no great way of figuring out who was having what effect. When the model actually eventually got data on missed games by Armstrong, he hovered around the 50th percentile (even in a time period that still mostly included Bulls years), and never really goes above that again.

2. To the extent the model is estimating Armstrong’s effect, it’s likely estimating it using information that isn’t really right, based on the minutes cutoff being used. And this likely is juicing up his score a significant amount in his early years. Armstrong played 16 minutes a game in his rookie season (1989-1990), so he barely missed the 18 MPG threshold for being considered by the model in that year. That means that the model considered Armstrong as not being there in 1989-1990 and being there in 1990-1991, when what really happened is just that his MPG went up a little bit. The Bulls got a good deal better from 1989-1990 to 1990-1991, and the model thinks that Armstrong wasn’t there the first year and was there the second. So it is almost certainly giving Armstrong significantly more credit than he deserves for the team getting better, given that it thinks he went from not playing to playing every game and the Bulls got a lot better, when he actually went from playing 16 minutes a game to 21 minutes a game. Once that year is out of the system, Armstrong never looks very good again. I don’t know that there’s any way around getting some weird results due to the minutes cutoff, but we should at least be cognizant of when something like that is obviously at play.

3. Again, I think people need to look at the years that are being shown on the graphs. The time periods where BJ Armstrong is above the 90th percentile are all introductory time periods where he did not play a full 5 years in the timeframe (note: The model doesn’t think he played his rookie year, since he didn’t play 18 minutes a game). The moment he’s got a full sample of seasons, he goes below the 90th percentile (in the Ridge version; he stays slightly above it in the lasso version, before plummeting the next year).
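
To make point 2 concrete, here is a minimal sketch of how a hard minutes cutoff can turn a small change in role into an apparent arrival. The 18 MPG threshold and the 16 and 21 MPG figures come from the post above; the function name, the 0/1 coding, and the choice of "at least 18" rather than "more than 18" are assumptions for illustration, not the model's actual code.

```python
# Hypothetical illustration of a hard minutes-per-game cutoff.
MPG_THRESHOLD = 18.0  # cutoff described in the thread


def in_model(mpg, threshold=MPG_THRESHOLD):
    """Return 1 if the player-season would be tracked by the model, else 0."""
    return 1 if mpg >= threshold else 0


# Armstrong's minutes per game as quoted in the post above.
armstrong_mpg = {"1989-90": 16.0, "1990-91": 21.0}

flags = {season: in_model(mpg) for season, mpg in armstrong_mpg.items()}
mpg_change = armstrong_mpg["1990-91"] - armstrong_mpg["1989-90"]

print(flags)       # the indicator flips between the two seasons
print(mpg_change)  # while the real change in role is only a few minutes a game
```

The indicator treats the two seasons as "absent" and "present", so any team improvement between them can be credited to a player whose role barely changed.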
OhayoKD wrote:Lebron contributes more to all the phases of play than Messi does. And he is of course a defensive anchor unlike messi.
DraymondGold
Senior
Posts: 703
And1: 903
Joined: May 19, 2022

Re: Penalized Regression of WOWY data 

Post#162 » by DraymondGold » Mon Aug 7, 2023 3:52 pm

Moonbeam wrote:
DraymondGold wrote:
Moonbeam wrote:Kind of! There is a bit of potentially useful information lost by looking at percentiles only. The caveat to that is because the ridge models are choosing different penalties for each window, the degree of "outlierness" is not always directly comparable. But yes, Russell and Magic are beasts in these pure versions!
Hmm, so this is interesting. It suggests the raw numbers might not be good to compare across era (at least in ridge, and presumably in the other methods as well?).

If percentiles / rank (adjusted for sample size) are a better way to compare players, perhaps we might look at something like:
-Average Percentile at a shorter timescale (e.g. best 5 samples in a row), medium (best 10 samples in a row), and full-career timescale (every sample) for the standard top set of players (e.g. the list Doc mentioned earlier in this thread for a shorter list, or maybe something like the Top 20 players from last year's + Curry/Durant/Paul/Walton for a larger list)

Or some variant of this methodology, if we wanted to do a more systematic but fair comparison of players across era in this new metric.

...

On another note, I wonder if there are any trends on what the standard deviations are in the ridge values over time.

I checked in the spreadsheet you sent, and it looks like...
-there's a big peak in standard deviation in the 60s
-then it dips until a minor peak around ~90,
-a dip until another major peak around ~06,
-a dip around ~12,
-then the biggest peak is in recent times.
Of course this is only based on the Top 100 players included in the spreadsheet, so the trends across the full player list might be different (e.g. this only includes positive RWOWY players). As for what to interpret from this, I'm not sure. Presumably it relates to the varying penalties for each window? But as for why those trends occur, I'd have to think more. But I'm open to ideas if anyone has any!
Great questions! I think the percentiles are the best for cross-era comparison because the penalties do indeed change with each window based on what is deemed best through cross validation. In general, the penalty that is best is a function of how much variation there is in the scoring margins and how much information there is about the impact of players through a WOWY lens (via injuries, changing teams, rotation size, etc.). Eras with a lot of transactions or a lot more injuries would likely have stronger information signals about players and perhaps require less penalization as a result. For the 60s, I'd guess the overlap with the ABA would be one factor, and for recent times, load management means there are more 'Without' games for a lot of players (though I'd imagine there are more players averaging 18 MPG per team as a result).
I'd be happy to help calculate these 5 sample, 10 sample, and full-career sample percentile averages for the top players.

Would you be able to either add the player's percentiles to the Google sheet, or if it's easier could you give us a list of the total number of players in each sample?

We can calculate true percentiles in samples that have fewer than 100 players since all players then appear on the Google sheet (e.g. 1952-1956), but when there are samples that may have more than 100 players (e.g. 1964-1968), we don't know how many players there are so we aren't able to calculate the star's true percentile there.
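
For what it's worth, the "best N samples in a row" averages could be computed along these lines once percentiles are available. This is only a sketch: the function name is hypothetical, the percentile values are made up, and it assumes one percentile per consecutive 5-year window for a single player.

```python
def best_consecutive_average(percentiles, k):
    """Highest mean over any k consecutive samples; None if fewer than k exist."""
    if k <= 0 or len(percentiles) < k:
        return None
    window_sum = sum(percentiles[:k])
    best = window_sum
    # Slide the window one sample at a time, updating the running sum.
    for i in range(k, len(percentiles)):
        window_sum += percentiles[i] - percentiles[i - k]
        best = max(best, window_sum)
    return best / k


# Made-up percentiles for one player across consecutive windows.
example = [80.0, 90.0, 95.0, 99.0, 97.0, 85.0, 70.0]

short = best_consecutive_average(example, 3)   # best 3-in-a-row average
career = sum(example) / len(example)           # full-career average
too_long = best_consecutive_average(example, 10)

print(short, career, too_long)
```

Returning None when a player has fewer than k samples is one possible design choice; averaging whatever samples exist would be another, and would favor short careers.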
Doctor MJ
Senior Mod
Posts: 53,674
And1: 22,620
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#163 » by Doctor MJ » Mon Aug 7, 2023 5:09 pm

Moonbeam wrote:
And the 90s Bulls:

Image

Image

Grant looks pretty great here. He did have the benefit of playing for good teams throughout his career, but the fact he was able to be a positive contributor for all of them (also confirmed via RAPM I believe) is certainly a signal that he is one of the better unheralded guys of the era. B.J. Armstrong looks amazing to start, but I think a lot of that is joining a team that took off and was great. Once the Warriors seasons creep into the sample, he plummets as expected.


So I think we have something here that we could call the Tom Gola Problem (or the BJ Armstrong Problem), where a guy who arrives or leaves at the same time as a team jump benefits from the correlation-is-causation fallacy.

I think the really tricky part here isn't dealing with its effect on the Golas and Armstrongs, but the effect it can have on their teammates.

If I simply compare Arizin's numbers to other superstars of his era, his numbers here look disappointing...but it seems likely that this has everything to do with the existence of rookie Gola joining the team, and I don't find the evidence compelling that Gola was actually the MVP of that team.
Getting ready for the RealGM 100 on the PC Board

Come join the WNBA Board if you're a fan!
lessthanjake
Analyst
Posts: 3,472
And1: 3,105
Joined: Apr 13, 2013

Re: Penalized Regression of WOWY data 

Post#164 » by lessthanjake » Mon Aug 7, 2023 7:49 pm

Doctor MJ wrote:
Moonbeam wrote:
And the 90s Bulls:

Image

Image

Grant looks pretty great here. He did have the benefit of playing for good teams throughout his career, but the fact he was able to be a positive contributor for all of them (also confirmed via RAPM I believe) is certainly a signal that he is one of the better unheralded guys of the era. B.J. Armstrong looks amazing to start, but I think a lot of that is joining a team that took off and was great. Once the Warriors seasons creep into the sample, he plummets as expected.


So I think we have something here that we could call the Tom Gola Problem (or the BJ Armstrong Problem), where a guy who arrives or leaves at the same time as a team jump benefits from the correlation-is-causation fallacy.

I think the really tricky part here isn't dealing with its effect on the Golas and Armstrongs, but the effect it can have on their teammates.

If I simply compare Arizin's numbers to other superstars of his era, his numbers here look disappointing...but it seems likely that this has everything to do with the existence of rookie Gola joining the team, and I don't find the evidence compelling that Gola was actually the MVP of that team.


Yeah, and the BJ Armstrong Problem is even more obvious since his first year for model purposes isn’t even actually his first year. It’s just the year he played a few more minutes and therefore crossed the 18 MPG threshold. The team got better while Armstrong played a few more MPG, and the model thinks the team got better when Armstrong first joined. And you’re right that giving Armstrong credit for more than he deserves will naturally lower the credit given to other players on the team.
homecourtloss
RealGM
Posts: 11,512
And1: 18,902
Joined: Dec 29, 2012

Re: Penalized Regression of WOWY data 

Post#165 » by homecourtloss » Mon Aug 7, 2023 8:46 pm

lessthanjake wrote:
homecourtloss wrote:
Moonbeam wrote:
Here are the 80s Rockets:

Image

Image

Hakeem dominates as expected.

And the 90s Bulls:

Image

Image

Grant looks pretty great here. He did have the benefit of playing for good teams throughout his career, but the fact he was able to be a positive contributor for all of them (also confirmed via RAPM I believe) is certainly a signal that he is one of the better unheralded guys of the era. B.J. Armstrong looks amazing to start, but I think a lot of that is joining a team that took off and was great. Once the Warriors seasons creep into the sample, he plummets as expected.


Thank you, Moon. I’m trying to think of any other trio that played that many seasons and minutes together and did that well: you have essentially the entire Bulls run with Jordan, Grant, and Pippen in the 90th+ percentile, with BJ Armstrong looking very strong early.


With regards to BJ Armstrong, I feel like I really need to reprise a lot of the things I’ve noted earlier in the thread here, since I think it’s misleading to talk about him looking great here.

1. I feel like I need to remind people that there’s some players that the model just has almost no data on. BJ Armstrong missed a grand total of 1 game in his first 7 seasons in the NBA. So the model basically has almost no way of parsing out how big of an effect he was having specifically. It can try to figure that out using estimates of other players, but then you have other Bulls players who barely missed games too (such as, for example, Pippen and Jordan in most of those timeframes), such that the model has no great way of figuring out who was having what effect. When the model actually eventually got data on missed games by Armstrong, he hovered around the 50th percentile (even in a time period that still mostly included Bulls years), and never really goes above that again.

2. To the extent the model is estimating Armstrong’s effect, it’s likely estimating it using information that isn’t really right, based on the minutes cutoff being used. And this likely is juicing up his score a significant amount in his early years. Armstrong played 16 minutes a game in his rookie season (1989-1990), so he barely missed the 18 MPG threshold for being considered by the model in that year. That means that the model considered Armstrong as not being there in 1989-1990 and being there in 1990-1991, when what really happened is just that his MPG went up a little bit. The Bulls got a good deal better from 1989-1990 to 1990-1991, and the model thinks that Armstrong wasn’t there the first year and was there the second. So it is almost certainly giving Armstrong significantly more credit than he deserves for the team getting better, given that it thinks he went from not playing to playing every game and the Bulls got a lot better, when he actually went from playing 16 minutes a game to 21 minutes a game. Once that year is out of the system, Armstrong never looks very good again. I don’t know that there’s any way around getting some weird results due to the minutes cutoff, but we should at least be cognizant of when something like that is obviously at play.

3. Again, I think people need to look at the years that are being shown on the graphs. The time periods where BJ Armstrong is above the 90th percentile are all introductory time periods where he did not play a full 5 years in the timeframe (note: The model doesn’t think he played his rookie year, since he didn’t play 18 minutes a game). The moment he’s got a full sample of seasons, he goes below the 90th percentile (in the Ridge version; he stays slightly above it in the lasso version, before plummeting the next year).


Eh, I was just surprised to see BJ up there in the 99th percentile, basically at the top, in a few years. His impact likely isn’t this high, but he looks to be a positive player, more so than I previously thought.

The real winner here is Horace Grant. He is just super impressive and hasn’t been given enough credit I think.

Looking to see if any other team had a trio basically be 90th+ percentile for nearly a decade like Jordan, Pippen, and Grant were. (Eminence offered up some possibilities.)
lessthanjake wrote:Kyrie was extremely impactful without LeBron, and basically had zero impact whatsoever if LeBron was on the court.

lessthanjake wrote: By playing in a way that prevents Kyrie from getting much impact, LeBron ensures that controlling for Kyrie has limited effect…
lessthanjake
Analyst
Posts: 3,472
And1: 3,105
Joined: Apr 13, 2013

Re: Penalized Regression of WOWY data 

Post#166 » by lessthanjake » Tue Aug 8, 2023 3:43 pm

Would it be possible to see a Ridge chart with Paul Pressey, Magic Johnson, Kareem, and Sidney Moncrief?
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#167 » by Moonbeam » Wed Aug 9, 2023 1:46 am

eminence wrote:
homecourtloss wrote:Thank you, Moon. I’m trying to think of any other trio that played that many seasons and minutes together and did that well: you have essentially the entire Bulls run with Jordan, Grant, and Pippen in the 90th+ percentile, with BJ Armstrong looking very strong early.


The Mikan Lakers looked kind of similar, with Mikan/Pollard/Martin all on top of the league.

Similarly, could I get a recent Warriors graph? (Steph/Dray/Klay/Andre/KD, can't really think of a 6th I'd be that interested in)


Here you go:

Image
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#168 » by Moonbeam » Wed Aug 9, 2023 1:46 am

Colbinii wrote:Any chance we can do the Bulls with Grant/Rodman/Kukoc/Perdue?


Here you go. Perdue falling well short of the others.

Image
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#169 » by Moonbeam » Wed Aug 9, 2023 1:48 am

DraymondGold wrote:
Moonbeam wrote:
DraymondGold wrote: Hmm, so this is interesting. It suggests the raw numbers might not be good to compare across era (at least in ridge, and presumably in the other methods as well?).

If percentiles / rank (adjusted for sample size) are a better way to compare players, perhaps we might look at something like:
-Average Percentile at a shorter timescale (e.g. best 5 samples in a row), medium (best 10 samples in a row), and full-career timescale (every sample) for the standard top set of players (e.g. the list Doc mentioned earlier in this thread for a shorter list, or maybe something like the Top 20 players from last year's + Curry/Durant/Paul/Walton for a larger list)

Or some variant of this methodology, if we wanted to do a more systematic but fair comparison of players across era in this new metric.

...

On another note, I wonder if there are any trends on what the standard deviations are in the ridge values over time.

I checked in the spreadsheet you sent, and it looks like...
-there's a big peak in standard deviation in the 60s
-then it dips until a minor peak around ~90,
-a dip until another major peak around ~06,
-a dip around ~12,
-then the biggest peak is in recent times.
Of course this is only based on the Top 100 players included in the spreadsheet, so the trends across the full player list might be different (e.g. this only includes positive RWOWY players). As for what to interpret from this, I'm not sure. Presumably it relates to the varying penalties for each window? But as for why those trends occur, I'd have to think more. But I'm open to ideas if anyone has any!
Great questions! I think the percentiles are the best for cross-era comparison because the penalties do indeed change with each window based on what is deemed best through cross validation. In general, the penalty that is best is a function of how much variation there is in the scoring margins and how much information there is about the impact of players through a WOWY lens (via injuries, changing teams, rotation size, etc.). Eras with a lot of transactions or a lot more injuries would likely have stronger information signals about players and perhaps require less penalization as a result. For the 60s, I'd guess the overlap with the ABA would be one factor, and for recent times, load management means there are more 'Without' games for a lot of players (though I'd imagine there are more players averaging 18 MPG per team as a result).
I'd be happy to help calculate these 5 sample, 10 sample, and full-career sample percentile averages for the top players.

Would you be able to either add the player's percentiles to the Google sheet, or if it's easier could you give us a list of the total number of players in each sample?

We can calculate true percentiles in samples that have fewer than 100 players since all players then appear on the Google sheet (e.g. 1952-1956), but when there are samples that may have more than 100 players (e.g. 1964-1968), we don't know how many players there are so we aren't able to calculate the star's true percentile there.


I can definitely look to add percentiles --- I'm looking into the best way to present that. I'll note that the early years don't have fewer than 100 players total --- I only reported those with positive coefficients. There are players with negative coefficients, too.
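
As a rough illustration of why the sheet alone can't give true percentiles, here is a small sketch with made-up coefficients. The percentile definition (share of players strictly below the value) and the function name are assumptions, and the actual presentation may end up differing.

```python
def percentile_rank(value, all_values):
    """Percent of values strictly below `value` (one of several common definitions)."""
    below = sum(1 for v in all_values if v < value)
    return 100.0 * below / len(all_values)


# Made-up coefficients for one window: 4 positive, 6 negative.
full = [3.0, 2.0, 1.0, 0.5, -0.2, -0.5, -1.0, -1.5, -2.0, -3.0]
positive_only = [c for c in full if c > 0]  # what a positive-only sheet would show

player = 1.0
within_sheet = percentile_rank(player, positive_only)
within_league = percentile_rank(player, full)

print(within_sheet, within_league)
```

The same coefficient lands at a much lower percentile when only the positive players are counted, which is why the total number of players per window (or the percentiles themselves) is needed.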
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#170 » by Moonbeam » Wed Aug 9, 2023 1:53 am

lessthanjake wrote:Would it be possible to see a Ridge chart with Paul Pressey, Magic Johnson, Kareem, and Sidney Moncrief?


Sure!

Image
lessthanjake
Analyst
Posts: 3,472
And1: 3,105
Joined: Apr 13, 2013

Re: Penalized Regression of WOWY data 

Post#171 » by lessthanjake » Wed Aug 9, 2023 2:00 am

Moonbeam wrote:
lessthanjake wrote:Would it be possible to see a Ridge chart with Paul Pressey, Magic Johnson, Kareem, and Sidney Moncrief?


Sure!

Image


Thanks! Any theories on why Pressey does so well (this is a question to anyone)? There’s a four-time-period span (which basically encompasses 1983-1990—i.e. his entire time in Milwaukee) where he basically looks like the best player in the NBA by this metric. Do we think he really was that good? He was a great defender who was also a major playmaker on a great team. So maybe? Or is there something specific going on here that we think is creating this?
homecourtloss
RealGM
Posts: 11,512
And1: 18,902
Joined: Dec 29, 2012

Re: Penalized Regression of WOWY data 

Post#172 » by homecourtloss » Wed Aug 9, 2023 2:14 am

Moonbeam wrote:Image


Pressey very impressive.

Magic… :lol: What can you say? It would be interesting to see if this was primarily from offense although we have some partial DRAPM numbers that look good for Magic and surprisingly better than Jordan’s:

1985 Magic, +2.01; 1985 Jordan, -.13
1988 Magic, -.16; 1988 Jordan, -.05
1991 Magic, +.43; 1991 Jordan, +.61

We also had your work with DWS deltas that tell us that perhaps his defensive win shares were underestimated (even if DWS aren’t the best measures).

In the early and mid-‘80s, higher ORtgs slightly positively correlated with higher DRtgs, but that changed in the late 1980s and then in the 1990s when higher ORtgs negatively correlated with DRtgs (higher ORtgs generally slightly correlated with lower DRtgs).

Image

I wonder if in 1985 not being a two way player didn’t matter as much as far as top level impact is concerned; or because higher ORtgs didn’t necessarily mean lower DRtgs, a true two way player had even more impact and maybe Magic was a plus defender. It’s all very interesting all around.
homecourtloss
RealGM
Posts: 11,512
And1: 18,902
Joined: Dec 29, 2012

Re: Penalized Regression of WOWY data 

Post#173 » by homecourtloss » Wed Aug 9, 2023 4:09 am

Moonbeam wrote:
Image


Moonbeam wrote:
Image


Thank you, Moon! That is about as dense a cluster of players in the 90th to 99th percentiles as you are going to get. Lots of great players (many underrated) here who meshed well and had great coaching.

Can you create a graph for LeBron, Wade, Bosh, Kyrie, KLove, Anthony Davis?
Doctor MJ
Senior Mod
Posts: 53,674
And1: 22,620
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#174 » by Doctor MJ » Wed Aug 9, 2023 4:40 am

lessthanjake wrote:
Moonbeam wrote:
lessthanjake wrote:Would it be possible to see a Ridge chart with Paul Pressey, Magic Johnson, Kareem, and Sidney Moncrief?


Sure!

Image


Thanks! Any theories on why Pressey does so well (this is a question to anyone)? There’s a four-time-period span (which basically encompasses 1983-1990—i.e. his entire time in Milwaukee) where he basically looks like the best player in the NBA by this metric. Do we think he really was that good? He was a great defender who was also a major playmaker on a great team. So maybe? Or is there something specific going on here that we think is creating this?


This is really interesting because we're not talking about a guy who arrives on a team that spikes, so it looks like a true WOWY situation.

Here's the team record with and without him in the Bucks years:

'82-83: 49-30, 2-1 (5-4 in playoffs, all with him)
'83-84: 50-31, 0-1 (8-8 in playoffs, all with him)
'84-85: 58-22, 1-1 (3-5 in playoffs, all with him)
'85-86: 56-24, 1-1 (7-7 in playoffs, all with him)
'86-87: 42-19, 8-13 (6-6 in playoffs, all with him)
'87-88: 40-35, 2-5 (2-3 in playoffs, all with him)
'88-89: 43-24, 6-9 (missed playoffs, 3-6 without him)

'89-90: 30-27, 14-11 (1-3 in playoffs, all with him)

So, those 3 seasons I've bolded, where he misses significant time and the team falls from above to below .500 without him, are surely looming large here.
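
A quick tally of those seasons, as a sketch. The bolding did not survive in this copy of the post, so this assumes the three seasons meant are '86-87 through '88-89, the ones in the list where he missed meaningful time and the without-him record is under .500. It uses regular-season win-loss records only, whereas the regression works from scoring margins.

```python
# (with_wins, with_losses, without_wins, without_losses), regular season only,
# copied from the records listed above. The choice of seasons is an assumption.
records = {
    "86-87": (42, 19, 8, 13),
    "87-88": (40, 35, 2, 5),
    "88-89": (43, 24, 6, 9),
}


def win_pct(wins, losses):
    """Winning percentage as a fraction between 0 and 1."""
    return wins / (wins + losses)


with_w = sum(r[0] for r in records.values())
with_l = sum(r[1] for r in records.values())
without_w = sum(r[2] for r in records.values())
without_l = sum(r[3] for r in records.values())

with_pct = win_pct(with_w, with_l)
without_pct = win_pct(without_w, without_l)

print(f"with:    {with_w}-{with_l} ({with_pct:.3f})")
print(f"without: {without_w}-{without_l} ({without_pct:.3f})")
```

Under that assumption the totals are 125-78 with him and 16-27 without him, so roughly a .616 team versus a .372 team over a 43-game "without" sample.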
lessthanjake
Analyst
Posts: 3,472
And1: 3,105
Joined: Apr 13, 2013

Re: Penalized Regression of WOWY data 

Post#175 » by lessthanjake » Wed Aug 9, 2023 5:00 am

Doctor MJ wrote:
lessthanjake wrote:
Moonbeam wrote:
Sure!

Image


Thanks! Any theories on why Pressey does so well (this is a question to anyone)? There’s a four-time-period span (which basically encompasses 1983-1990—i.e. his entire time in Milwaukee) where he basically looks like the best player in the NBA by this metric. Do we think he really was that good? He was a great defender who was also a major playmaker on a great team. So maybe? Or is there something specific going on here that we think is creating this?


This is really interesting because we're not talking about a guy who arrives on a team that spikes, so it looks like a true WOWY situation.

Here's the team record with and without him in the Bucks years:

'82-83: 49-30, 2-1 (5-4 in playoffs, all with him)
'83-84: 50-31, 0-1 (8-8 in playoffs, all with him)
'84-85: 58-22, 1-1 (3-5 in playoffs, all with him)
'85-86: 56-24, 1-1 (7-7 in playoffs, all with him)
'86-87: 42-19, 8-13 (6-6 in playoffs, all with him)
'87-88: 40-35, 2-5 (2-3 in playoffs, all with him)
'88-89: 43-24, 6-9 (missed playoffs, 3-6 without him)

'89-90: 30-27, 14-11 (1-3 in playoffs, all with him)

So, those 3 seasons I've bolded, where he misses significant time and the team falls from above to below .500 without him, are surely looming large here.


Interesting. And he does get to that 99th percentile zone (and pass Moncrief) only when we get to a timeframe that has the first one of those years. But he’s already about 95th percentile before that, so that can’t quite be all of it. He didn’t miss much time in those initial few years, so I guess perhaps it must be a result of other players missing games and it not impacting the team much, such that the model thinks Pressey must be a big deal even without missed games from him specifically. Moncrief didn’t miss many games either and looks very similar in those prior couple timeframes, so that probably checks out.
ShaqAttac
Rookie
Posts: 1,189
And1: 370
Joined: Oct 18, 2022

Re: Penalized Regression of WOWY data 

Post#176 » by ShaqAttac » Wed Aug 9, 2023 7:45 am

Moonbeam wrote:
homecourtloss wrote:
Moonbeam wrote:Here is a spreadsheet with up to 100 positive coefficients for each 5-year window for Ridge, Lasso, and ENet. I'll see if a spreadsheet with the full data is navigable and post separately if so.


A few requests for Moon if possible:

Can you create a graph for Ralph Sampson, Hakeem, Rodney McCray, and Robert Reid, Ridge and Lasso?

And another for Jordan, Pippen, Horace Grant, and BJ Armstrong, Ridge and Lasso?


Here are the 80s Rockets:

Image

Image

Hakeem dominates as expected.

And the 90s Bulls:

Image

Image

Grant looks pretty great here. He did have the benefit of playing for good teams throughout his career, but the fact he was able to be a positive contributor for all of them (also confirmed via RAPM I believe) is certainly a signal that he is one of the better unheralded guys of the era. B.J. Armstrong looks amazing to start, but I think a lot of that is joining a team that took off and was great. Once the Warriors seasons creep into the sample, he plummets as expected.

doesnt this mean collinearity is overrating all of chicagos top players
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#177 » by Moonbeam » Wed Aug 9, 2023 10:35 am

I've been looking into ways to modify the WOWY matrix by including some function of minutes played in a game vs. 1 and -1. One thing that I think makes sense is to adjust the minute profiles in blowouts, as deep bench guys from great teams would have awesome signals this way, as they would tend to exceed whatever minimum minute threshold in a game I set only when their teams were in favorable blowouts, etc. So this has led me down a bit of a rabbit hole in how to detect blowouts from the minute profiles of a game. Meanwhile, the starters on such good teams would tend to play fewer minutes, suggesting that playing them fewer minutes might actually be better for the team. The general idea would be to estimate the amount of "meaningful" minutes each player played by detecting the minutes when the lead was deemed insurmountable and subtracting these minutes from the overall game total (and the minute totals for any deep bench guys).

It turns out it's a bit tricky. Take this game, for example. There's a clear signal of deep bench guys playing 5 minutes and 59 seconds for the Kings, but Portland only played 7 guys the whole game. The Blazers were pretty injured at the time so couldn't play too many guys in any case, but it presents a challenge in determining exactly how to quantify the amount of meaningful minutes in a game. I still have to think about the best approach to it.
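
To make the "meaningful minutes" idea concrete, here is one very rough sketch of the bookkeeping. Everything in it is an assumption for illustration: the garbage-time length is taken as already known (detecting it is the hard part described above), only the designated deep-bench players are assumed to have been on the floor during it, the game is a 48-minute regulation game, and the box-score minutes are made up.

```python
GAME_MINUTES = 48.0  # regulation game, no overtime (assumption)


def meaningful_minutes(minutes, deep_bench, garbage_minutes):
    """Subtract the assumed garbage time from deep-bench players only."""
    adjusted = {}
    for player, played in minutes.items():
        if player in deep_bench:
            adjusted[player] = max(0.0, played - garbage_minutes)
        else:
            adjusted[player] = played
    return adjusted


def minutes_share(adjusted, garbage_minutes):
    """Share of the meaningful game length each player was on the floor."""
    meaningful_length = GAME_MINUTES - garbage_minutes
    return {p: m / meaningful_length for p, m in adjusted.items()}


garbage = 5 + 59 / 60  # the 5:59 stretch mentioned above
box = {"starter": 34.0, "rotation": 20.0, "deep_bench": garbage}  # made-up minutes

adj = meaningful_minutes(box, {"deep_bench"}, garbage)
shares = minutes_share(adj, garbage)

print(adj)
print(shares)
```

A share like this could replace the flat 1 and -1 entries in the WOWY matrix, so a deep-bench player who only appeared in garbage time would contribute nothing for that game.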

I'd looked at the minute profiles of the 10th-15th listed players in box scores through history to see what trends there might be. It turns out, there's a clear signal that 10th-15th players collectively tend to play more minutes now than they did in the past, both in blowouts and in general:

Image

Here are some comparative boxplots for all available seasons for different scoring margins:

Image

Image

Image

Image

Image

Image

Image
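
For anyone wanting to reproduce a summary like the one behind these plots, here is a rough sketch. The data layout, the function names, and the margin buckets are all assumptions made for illustration (the actual buckets and data source are not stated above), and the two games are made up.

```python
from collections import defaultdict
from statistics import median


def deep_bench_minutes(box_score):
    """Total minutes for the 10th-15th listed players of one team's box score."""
    return sum(minutes for _, minutes in box_score[9:15])


def margin_bucket(margin):
    """Hypothetical scoring-margin buckets; the real cutoffs may differ."""
    margin = abs(margin)
    if margin < 10:
        return "0-9"
    if margin < 20:
        return "10-19"
    return "20+"


def summarize(games):
    """Median deep-bench minutes per (season, margin bucket)."""
    groups = defaultdict(list)
    for game in games:
        key = (game["season"], margin_bucket(game["margin"]))
        groups[key].append(deep_bench_minutes(game["box_score"]))
    return {key: median(values) for key, values in groups.items()}


# Two made-up box scores: (player, minutes) pairs in listed order.
old_game = {"season": 1990, "margin": 25,
            "box_score": [(f"P{i}", 30.0) for i in range(1, 10)]
                         + [("P10", 3.0), ("P11", 2.0), ("P12", 1.0)]}
new_game = {"season": 2023, "margin": -25,
            "box_score": [(f"P{i}", 25.0) for i in range(1, 10)]
                         + [("P10", 8.0), ("P11", 6.0), ("P12", 5.0), ("P13", 4.0)]}

summary = summarize([old_game, new_game])
```

Each (season, bucket) group keeps its full list of game totals before the median is taken, so the same grouping could feed boxplots instead.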
Moonbeam
Forum Mod - Blazers
Posts: 10,340
And1: 5,102
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#178 » by Moonbeam » Wed Aug 9, 2023 10:38 am

homecourtloss wrote:
Moonbeam wrote:
Image


Moonbeam wrote:
Image


Thank you, Moon! That is about as dense a cluster of players in the 90th to 99th percentiles as you are going to get. Lots of great players (many underrated) here who meshed well and had great coaching.

Can you create a graph for LeBron, Wade, Bosh, Kyrie, KLove, Anthony Davis?


Here you go:

Image
eminence
RealGM
Posts: 17,118
And1: 11,909
Joined: Mar 07, 2015

Re: Penalized Regression of WOWY data 

Post#179 » by eminence » Wed Aug 9, 2023 11:39 am

ShaqAttac wrote:doesnt this mean collinearity is overrating all of chicagos top players


Not what collinearity does. It makes us less sure of our results, but it can't just push the whole variable group up (or down) in the overall model.

In that particular case, I feel pretty confident saying it's likely Armstrong being brought along for the ride, not MJ/Pippen* - their numbers would be somewhat depressed by BJ's impressive result.

*nobody necessarily needs to be 'brought along' either, models can be reasonably accurate (in terms of telling us which variables are having what impact) in spite of collinearity

Collinearity doesn't hurt the overall model accuracy, though it does make overfitting more likely.
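
A toy illustration of that point, under heavy assumptions: a two-player ridge fit with no intercept, made-up margins, and a 1/0 "played or not" coding rather than the full WOWY matrix. The function name and the numbers are hypothetical. The idea is that when two players always appear together, the penalty splits the credit evenly between them, while a handful of games with one of them missing is enough to move the split.

```python
def ridge_two_players(rows, lam):
    """Closed-form ridge fit (no intercept) for two players.

    rows: list of (x_a, x_b, margin), x = 1 if the player appeared, else 0.
    Solves (X'X + lam * I) beta = X'y for the 2x2 case.
    """
    s_aa = sum(a * a for a, b, y in rows) + lam
    s_bb = sum(b * b for a, b, y in rows) + lam
    s_ab = sum(a * b for a, b, y in rows)
    s_ay = sum(a * y for a, b, y in rows)
    s_by = sum(b * y for a, b, y in rows)
    det = s_aa * s_bb - s_ab * s_ab
    beta_a = (s_bb * s_ay - s_ab * s_by) / det
    beta_b = (s_aa * s_by - s_ab * s_ay) / det
    return beta_a, beta_b


# Made-up data: 40 games with both players (+10 margin),
# then 5 games where only player A appears (+8 margin).
together = [(1, 1, 10.0)] * 40
a_alone = [(1, 0, 8.0)] * 5

collinear = ridge_two_players(together, lam=1.0)
separated = ridge_two_players(together + a_alone, lam=1.0)

print("always together:", collinear, "sum =", sum(collinear))
print("with 5 A-only games:", separated, "sum =", sum(separated))
```

In both fits the combined estimate stays a little under +10 (ridge shrinks slightly toward zero), so the group as a whole isn't pushed up. What changes is how the credit is divided: an even split when the two are perfectly collinear, and most of it going to A once the A-only games are in the sample.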
I bought a boat.
homecourtloss
RealGM
Posts: 11,512
And1: 18,902
Joined: Dec 29, 2012

Re: Penalized Regression of WOWY data 

Post#180 » by homecourtloss » Wed Aug 9, 2023 1:42 pm

Moonbeam wrote:
Spoiler:
I've been looking into ways to modify the WOWY matrix by including some function of minutes played in a game vs. 1 and -1. One thing that I think makes sense is to adjust the minute profiles in blowouts, as deep bench guys from great teams would have awesome signals this way, as they would tend to exceed whatever minimum minute threshold in a game I set only when their teams were in favorable blowouts, etc. So this has led me down a bit of a rabbit hole in how to detect blowouts from the minute profiles of a game. Meanwhile, the starters on such good teams would tend to play fewer minutes, suggesting that playing them fewer minutes might actually be better for the team. The general idea would be to estimate the amount of "meaningful" minutes each player played by detecting the minutes when the lead was deemed insurmountable and subtracting these minutes from the overall game total (and the minute totals for any deep bench guys).

It turns out it's a bit tricky. Take this game, for example. There's a clear signal of deep bench guys playing 5 minutes and 59 seconds for the Kings, but Portland only played 7 guys the whole game. The Blazers were pretty injured at the time so couldn't play too many guys in any case, but it presents a challenge in determining exactly how to quantify the amount of meaningful minutes in a game. I still have to think about the best approach to it.

I'd looked at the minute profiles of the 10th-15th listed players in box scores through history to see what trends there might be. It turns out, there's a clear signal that 10th-15th players collectively tend to play more minutes now than they did in the past, both in blowouts and in general:

Image

Here are some comparative boxplots for all available seasons for different scoring margins:

Image

Image

Image

Image

Image

Image

Image


Just absolutely fantastic data and depiction here, Moon. From the box scores of past juggernaut teams and how many minutes their stars played in those games (whether that’s Russell or Wilt or Bird or Magic, etc.), we knew there was a trend, but it’s quite stark, especially in the most recent years with load management as well as increased roster sizes due to Covid necessities.
lessthanjake wrote:Kyrie was extremely impactful without LeBron, and basically had zero impact whatsoever if LeBron was on the court.

lessthanjake wrote: By playing in a way that prevents Kyrie from getting much impact, LeBron ensures that controlling for Kyrie has limited effect…
