Penalized Regression of WOWY data

Moderators: penbeast0, PaulieWal, Clyde Frazier, Doctor MJ, trex_8063

Moonbeam
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#81 » by Moonbeam » Tue Aug 1, 2023 4:32 am

Doctor MJ wrote:
Moonbeam wrote:STOP THE PRESSES!

I've discovered an error in box scores in the play-by-play era. Essentially, my extraction of box scores was eliminating the home team players from 1997 onward. This meant that the models which include data from 1997 onward are only reflecting the final margin and the players on the away team, without taking into account the players on the home team at all. This obviously will impact the results in a major way.

I am very sorry about this! As such, I would ignore any results posted in this thread that include any seasons from 1997 onward as a result. The results for any 5-year period that doesn't include seasons from 1997 onward should be unaffected.

This also means the analysis comparing these results to the 5-year RAPM is compromised. What an epic fail on my part! :lol: :cry:

I will be re-running the analysis once I rectify the issue with the box scores, but that may take several days.

My apologies to everyone in this thread who has taken some information from post-97 samples and drawn any conclusions from it. I feel pretty awful about this.


Thank you for letting us know. I don't think you have anything to be sorry about. It's probably for the best we check our excitement a bit.

Looking forward to your fixed batch.

Question: Was this then effectively creating an Away-WOWY regression? It's honestly never occurred to me before to consider Home vs Away RAPM or WOWY, and it seems like an idea with potential.


This was kind of creating an Away-WOWY regression (accounting for homecourt advantage with the margins), but it wasn't factoring in the opponents at all, so it doesn't distinguish games played against strong vs weak opponents. Honestly, it is an interesting idea to think about to see if there are notable discrepancies for certain players in home vs away splits. I think the reason I was fooled is that this sort of thing will produce results that are moderately in line with what you might expect taking into account opponent strength. That's why we are seeing the dominant players still show up well in the post-96 windows I had posted.
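For anyone following along, here is a minimal sketch of what that bug does to a single game's row in the regression. It assumes a common RAPM-style encoding that is not confirmed anywhere in this thread: the response is the home margin, home players are coded +1 and away players -1. The function name, the `drop_home` flag, and the player labels are all hypothetical.

```python
def design_row(home_players, away_players, all_players, drop_home=False):
    """Build one game's row of a WOWY-style design matrix.

    Assumed convention (not confirmed by the thread): the response is the
    home margin, home players are coded +1 and away players -1.
    drop_home=True mimics the extraction bug, where home players vanished.
    """
    row = {p: 0 for p in all_players}
    if not drop_home:
        for p in home_players:
            row[p] = 1
    for p in away_players:
        row[p] = -1
    return [row[p] for p in all_players]


players = ["A", "B", "C", "D"]  # hypothetical player pool
correct = design_row(["A", "B"], ["C"], players)
buggy = design_row(["A", "B"], ["C"], players, drop_home=True)
print(correct)  # [1, 1, -1, 0]
print(buggy)    # [0, 0, -1, 0]
```

With the buggy rows, the fit only sees who suited up for the away team, so the strength of the home side never enters the model.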
WestGOAT
Veteran
Posts: 2,591
And1: 3,504
Joined: Dec 20, 2015

Re: Penalized Regression of WOWY data 

Post#82 » by WestGOAT » Tue Aug 1, 2023 4:34 pm

Moonbeam wrote:
WestGOAT wrote:Could be a fun exercise to look at offensive- and defensive-WOWY. So instead of setting the points margin as y, you can set points scored or against (relative to the average for the specific time-window) as y. Probably will be noisy, but would be interesting to see if the usual suspects that are considered defensive juggernauts would pop up, and especially if bigs being more valuable than smalls is being reproduced.


Indeed, that would be a fun extension! I'm not sure whether I might be limited a bit here by the availability of box score data. I think splitting the impact to offense and defense would only make sense with some sort of pace/possession adjustment, otherwise low pace eras will have their offensive value depressed and their defensive value enhanced, and vice versa for high paced eras. Maybe it's not such a big deal within 5-year windows, though? In any case, I'll have to see if there is enough data in the box scores to approximate possessions for the game. I think there often is, but perhaps not always if we go back to 1952.


I actually have coincidentally been looking into predicting possessions at the single-game level for games that took place before the 1987 season, using basic box-score stats. It's unfortunate that the further back in time you go, the more games lack possessions, and before 1974 virtually all games are missing possessions at the single-game level. So I decided to use box-score data for 1974-1986, supplemented with total season team stats, to model possessions:
Image

The standard deviation is around 4.2 possessions, and assuming a normal distribution, nearly all predictions should not differ by more than about 12 possessions (roughly three standard deviations) from the actual possessions. So not too bad considering NBA games averaged a bit more than 100 possessions per game, iirc, back in the 70s/80s.
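As a quick sanity check on that rule of thumb, here is a small sketch using only the Python standard library. It takes the 4.2 figure above as the error standard deviation and assumes the errors are normal with mean zero; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

sd = 4.2  # residual standard deviation quoted above, in possessions
errors = NormalDist(mu=0.0, sigma=sd)  # assumes unbiased, normal errors


def share_within(limit):
    """Probability that a normal error with this sd lands within +/- limit."""
    return errors.cdf(limit) - errors.cdf(-limit)


three_sigma = 3 * sd
print(round(three_sigma, 1))    # 12.6
print(share_within(12) > 0.99)  # True
```

If the residuals turn out to be skewed or heavy-tailed, the share outside 12 possessions would be larger than this suggests.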

Let's see if this can be further improved. Especially for games prior to 1974, I can't be sure how well a model will hold up, and it's not really easy/possible to validate.
Moonbeam
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#83 » by Moonbeam » Tue Aug 1, 2023 11:20 pm

WestGOAT wrote:
Moonbeam wrote:
WestGOAT wrote:Could be a fun exercise to look at offensive- and defensive-WOWY. So instead of setting the points margin as y, you can set points scored or against (relative to the average for the specific time-window) as y. Probably will be noisy, but would be interesting to see if the usual suspects that are considered defensive juggernauts would pop up, and especially if bigs being more valuable than smalls is being reproduced.


Indeed, that would be a fun extension! I'm not sure whether I might be limited a bit here by the availability of box score data. I think splitting the impact to offense and defense would only make sense with some sort of pace/possession adjustment, otherwise low pace eras will have their offensive value depressed and their defensive value enhanced, and vice versa for high paced eras. Maybe it's not such a big deal within 5-year windows, though? In any case, I'll have to see if there is enough data in the box scores to approximate possessions for the game. I think there often is, but perhaps not always if we go back to 1952.


I actually have coincidentally been looking into predicting possessions at the single-game level for games that took place before the 1987 season, using basic box-score stats. It's unfortunate that the further back in time you go, the more games lack possessions, and before 1974 virtually all games are missing possessions at the single-game level. So I decided to use box-score data for 1974-1986, supplemented with total season team stats, to model possessions:
Image

The standard deviation is around 4.2 possessions, and assuming a normal distribution, nearly all predictions should not differ by more than about 12 possessions (roughly three standard deviations) from the actual possessions. So not too bad considering NBA games averaged a bit more than 100 possessions per game, iirc, back in the 70s/80s.

Let's see if this can be further improved. Especially for games prior to 1974, I can't be sure how well a model will hold up, and it's not really easy/possible to validate.


This is cool! Are you happy to share the details of your model? As I understand it, your response variable is possessions from 1974-1986, and your predictor variables are box score information (team-level I'm assuming?) plus some season level data. Is that correct? What source are you using for the possessions data? As far as I understand it, those possessions for games in that era would be determined completely by whatever available team box score totals exist, no?
WestGOAT
Veteran
Posts: 2,591
And1: 3,504
Joined: Dec 20, 2015

Re: Penalized Regression of WOWY data 

Post#84 » by WestGOAT » Wed Aug 2, 2023 9:33 pm

Moonbeam wrote:
WestGOAT wrote:
Moonbeam wrote:This is cool! Are you happy to share the details of your model? As I understand it, your response variable is possessions from 1974-1986, and your predictor variables are box score information (team-level I'm assuming?) plus some season level data. Is that correct? What source are you using for the possessions data? As far as I understand it, those possessions for games in that era would be determined completely by whatever available team box score totals exist, no?


Still early stages, but I'm more than happy to share some details! Here is a link to a sample of the data I'm using:
https://docs.google.com/spreadsheets/d/15tkzunJ4S0t4USqn2I9C4G82m0JNOZBViMBjLrBvc04/edit?usp=sharing

Possessions in a game (Nan_POS) is indeed the y-variable. I used the formula provided by basketball-reference:
https://www.basketball-reference.com/about/glossary.html#:~:text=Poss%20%2D%20Possessions%20(available%20since%20the,FG)%20%2B%20Opp%20TOV)):
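For readers who don't want to click through, here is a rough sketch of that possession estimate as I understand the glossary entry (0.4 weight on free throw attempts, 1.07 on the offensive-rebound term); please check the coefficients against the link before relying on them. The function names, dictionary keys, and box-score numbers are made up for illustration.

```python
def team_poss(fga, fg, fta, orb, tov, opp_drb):
    """One side's possession estimate from basic box-score totals."""
    orb_share = orb / (orb + opp_drb)  # share of available offensive boards
    return fga + 0.4 * fta - 1.07 * orb_share * (fga - fg) + tov


def game_poss(tm, opp):
    """Average the two sides' estimates into one figure for the game."""
    return 0.5 * (
        team_poss(tm["FGA"], tm["FG"], tm["FTA"], tm["ORB"], tm["TOV"], opp["DRB"])
        + team_poss(opp["FGA"], opp["FG"], opp["FTA"], opp["ORB"], opp["TOV"], tm["DRB"])
    )


# Made-up box-score lines, not taken from any real game.
team = {"FGA": 90, "FG": 45, "FTA": 25, "ORB": 12, "DRB": 30, "TOV": 15}
opponent = {"FGA": 88, "FG": 42, "FTA": 20, "ORB": 10, "DRB": 32, "TOV": 14}
print(round(game_poss(team, opponent), 1))  # 99.8
```

Every input is a team total, which is why the estimate is only as available as offensive rebounds and turnovers are in the old box scores.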

Nan_Phase, Nan_Season, Nan_Tm_ID, Nan_Opp are categorical variables that I have one-hot encoded for modelling purposes, so the number of predictor variables is actually larger than the number of columns in the Google Sheet.
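To illustrate why the predictor count grows, here is a minimal one-hot encoding sketch in plain Python. It is not the code used for the model: the `one_hot` helper is hypothetical, the category values are made up, and dropping a reference level is just one common way to avoid perfectly collinear indicators when the model has an intercept.

```python
def one_hot(rows, column, drop_first=True):
    """Expand one categorical column into 0/1 indicator columns.

    drop_first=True leaves out one reference level (an assumption here,
    not something stated in the post) so the indicators do not sum to the
    intercept column.
    """
    levels = sorted({r[column] for r in rows})
    kept = levels[1:] if drop_first else levels
    names = [f"{column}={lvl}" for lvl in kept]
    matrix = [[1 if r[column] == lvl else 0 for lvl in kept] for r in rows]
    return names, matrix


games = [  # made-up rows using two of the column names from the sheet
    {"Nan_Phase": "RS", "Nan_Tm_ID": "PHI"},
    {"Nan_Phase": "PS", "Nan_Tm_ID": "BOS"},
    {"Nan_Phase": "RS", "Nan_Tm_ID": "LAL"},
]
names, matrix = one_hot(games, "Nan_Tm_ID")
print(names)   # ['Nan_Tm_ID=LAL', 'Nan_Tm_ID=PHI']
print(matrix)  # [[0, 1], [0, 0], [1, 0]]
```

One team column with a couple of dozen franchises becomes a couple of dozen indicator columns, and the same happens for opponent and season.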

I collected basic game log data for every team for the seasons of interest from bball-reference:
https://www.basketball-reference.com/teams/PHI/1986/gamelog/
https://www.basketball-reference.com/teams/PHI/1986_games.html (for total minutes played)

And supplemented it with regular season (RS) and playoff (PS) data:
https://www.basketball-reference.com/leagues/NBA_1986.html#totals-team
https://www.basketball-reference.com/leagues/NBA_1986.html#advanced-team

I have not decided on which predictor variables to include in a final model. There is a lot of multicollinearity, as expected, and I haven't tried standardizing the predictors yet. I also want to continue experimenting with different types of models, but so far a simple out-of-the-box multiple linear regression seems to be performing the best (on the validation dataset) :lol:

Image
Image
Moonbeam
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#85 » by Moonbeam » Wed Aug 2, 2023 11:16 pm

WestGOAT wrote:
Moonbeam wrote:
WestGOAT wrote:


Still early stages, but I'm more than happy to share some details! Here is a link to a sample of the data I'm using:
https://docs.google.com/spreadsheets/d/15tkzunJ4S0t4USqn2I9C4G82m0JNOZBViMBjLrBvc04/edit?usp=sharing

Possessions in a game (Nan_POS) is indeed the y-variable. I used the formula provided by basketball-reference:
https://www.basketball-reference.com/about/glossary.html#:~:text=Poss%20%2D%20Possessions%20(available%20since%20the,FG)%20%2B%20Opp%20TOV)):

Nan_Phase, Nan_Season, Nan_Tm_ID, Nan_Opp are categorical variables that I have one-hot encoded for modelling purposes, so the number of predictor variables is actually larger than the number of columns in the Google Sheet.

I collected basic game log data for every team for the seasons of interest from bball-reference:
https://www.basketball-reference.com/teams/PHI/1986/gamelog/
https://www.basketball-reference.com/teams/PHI/1986_games.html (for total minutes played)

And supplemented it with regular season (RS) and playoff (PS) data:
https://www.basketball-reference.com/leagues/NBA_1986.html#totals-team
https://www.basketball-reference.com/leagues/NBA_1986.html#advanced-team

I have not decided on which predictor variables to include in a final model. There is a lot of multicollinearity, as expected, and I haven't tried standardizing the predictors yet. I also want to continue experimenting with different types of models, but so far a simple out-of-the-box multiple linear regression seems to be performing the best (on the validation dataset) :lol:

Image
Image


Thank you for those details! It is interesting that multiple linear regression is producing the best results so far. This probably deserves its own thread! If I do look to separate Offensive and Defensive versions of these RWOWY ratings, I would be keen to incorporate the estimates from your best models.

Perhaps someone like 70sFan or Squared2020, who have logged a lot of historical game data, might have tracked possessions as well that could be used for testing?
Moonbeam
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#86 » by Moonbeam » Wed Aug 2, 2023 11:29 pm

Nearly time to restart the presses.

My code has updated through the 2017-21 window, the last 5-year window in Cheema's prior-informed 5-year RAPM. The different correlations I posted have all seen a modest bump up:

All data: 0.346 -> 0.370
Played at least 5000 possessions: 0.411 -> 0.452
Won at least one award (All-Star, All-League, All-D) during the 5 years: 0.464 -> 0.558
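For anyone wanting to reproduce this kind of comparison, here is a rough sketch of computing a Pearson correlation on filtered subsets of player-windows. It is not the actual analysis code: the record fields (`rwowy`, `rapm`, `poss`, `award`) and all the numbers are hypothetical placeholders.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def subset_corr(records, keep):
    """Correlate the two ratings over only the records that pass `keep`."""
    rows = [r for r in records if keep(r)]
    return pearson([r["rwowy"] for r in rows], [r["rapm"] for r in rows])


records = [  # made-up player-windows, purely for illustration
    {"rwowy": 2.1, "rapm": 3.0, "poss": 9000, "award": True},
    {"rwowy": 0.4, "rapm": 1.2, "poss": 7000, "award": False},
    {"rwowy": -0.8, "rapm": -0.5, "poss": 5200, "award": False},
    {"rwowy": 1.5, "rapm": -1.0, "poss": 900, "award": False},
]
print(subset_corr(records, lambda r: True))                # all data
print(subset_corr(records, lambda r: r["poss"] >= 5000))   # 5000+ possessions
```

Filtering to heavier-minute players removes the noisiest windows, which is one plausible reason the correlation climbs across the three subsets.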

I'm going to edit the document with these new comparisons and re-post in the OP (keeping version 1.0 for posterity as a separate link). The code for 2018-22 and 2019-23 should be ready within an hour, so I'm happy to field more queries! I'll re-post some comparisons that included post-96 data as well.
Moonbeam
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#87 » by Moonbeam » Thu Aug 3, 2023 11:32 am

I've finally updated the document in the OP. I've re-written the section with the RAPM comparison, added another paragraph to the extensions section, and added lots more player comparison graphs.

The paragraph regarding extensions feels quite obvious in hindsight. Rather than use a strict 18 MPG threshold for player inclusion, perhaps something like 18 minutes in a game would work. That would capture the relative absence of starters who get injured early in a game as well as players whose roles (and minutes) change throughout the season. There'd still probably need to be some adjustment for blowouts, but it's worth considering.
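To make the difference between the two rules concrete, here is a small sketch. It assumes the strict rule means "season average of at least 18 MPG and appeared in the game", which is my reading and may not match the exact rule in the document; the function name and the three players are hypothetical.

```python
def included_players(game_minutes, season_mpg, threshold=18.0, rule="game"):
    """Return the players counted as 'present' for one game.

    rule="season": a player counts if his season MPG meets the threshold
                   and he appeared in the game (assumed strict rule).
    rule="game":   a player counts only if he played at least `threshold`
                   minutes in this particular game (the proposed extension).
    """
    if rule == "season":
        return sorted(
            p for p, m in game_minutes.items()
            if m > 0 and season_mpg.get(p, 0.0) >= threshold
        )
    return sorted(p for p, m in game_minutes.items() if m >= threshold)


# Made-up game: a starter hurt after 5 minutes, a bench player filling in.
game_minutes = {"Starter": 5, "Sixth Man": 30, "Regular": 36}
season_mpg = {"Starter": 34.0, "Sixth Man": 12.0, "Regular": 35.0}
print(included_players(game_minutes, season_mpg, rule="season"))
# ['Regular', 'Starter']
print(included_players(game_minutes, season_mpg, rule="game"))
# ['Regular', 'Sixth Man']
```

Under the per-game rule the injured starter is treated as absent and the fill-in as present, which is the behaviour described above; garbage-time minutes in blowouts would still need separate handling.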

The original version of the document is in the Version History in the OP.
AEnigma
Assistant Coach
Posts: 4,048
And1: 5,854
Joined: Jul 24, 2022
 

Re: Penalized Regression of WOWY data 

Post#88 » by AEnigma » Thu Aug 3, 2023 1:01 pm

Love to see all the new graphs!

Initial reactions (with an eye to the top 100 project):

    - Bob Davies fans betrayed
    - Prime Sharman somewhat affirmed, but for top 100 purposes the question continues to be about his prime length. Would have a stronger case with separation over Cousy.
    - Interesting Arizin results
    - Mikan affirmed
    - Neither Greer nor Jones looking like a true standout, but Jones has his playoff elevation.
    - Schayes maybe not as outstanding as we would want, but…
    - Pettit even less so
    - You already showed Wilt in this thread, but will emphasise it is neat how this goes against some of the common narratives about his “impact”
    - Wilkens looking worse than I may have thought
    - Happy to say Beaty affirmed
    - I myself have noticed Frazier’s more paltry late career WOWY indicators; makes for a tough assessment when his box production did not actually drop off too much.
    - Decent support for Dandridge, although the drop-off replacing 1974 with 1979 (his box score peak) is odd.
    - Billy Knight surprising
    - McAdoo underwhelming (never been a big fan of his archetype), but the other five 1970s centres look like confident top 50 contenders (or in Walton’s case, would-be).
    - How dare you excise Gus Williams :cry:
    - Interesting Reggie Miller results, if not too affecting for the most substantial playoff riser among that group

Still hope to see a graph with Penny, Grant Hill, Iverson, Strickland, and Billups, whenever you have the time. You showed me four of these names and I do not care that much about Strickland lol, so disregard.
eminence
RealGM
Posts: 16,713
And1: 11,553
Joined: Mar 07, 2015

Re: Penalized Regression of WOWY data 

Post#89 » by eminence » Thu Aug 3, 2023 2:18 pm

AEnigma wrote:Love to see all the new graphs!

Initial reactions (with an eye to the top 100 project):

- Bob Davies fans betrayed
- Schayes maybe not as outstanding as we would want, but…


Hey, I'm doing fine. I'd be interested to see a graph with all the key Royals from the early 50s.

I like what I see from the Schayes graph, would've liked a bit higher peak, but that's good results for a very long time.
Doctor MJ
Senior Mod
Posts: 52,762
And1: 21,690
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#90 » by Doctor MJ » Thu Aug 3, 2023 5:52 pm

I'm going through this just like AEnigma did, so I think I'll just interlace my comments with his (with my own thoughts in blue)

AEnigma wrote:Love to see all the new graphs!
Initial reactions (with an eye to the top 100 project):



Bob Davies fans betrayed

I wouldn't quite say that. I think we have to remember that Davies retired in 1955 as a 35 year old so there isn't any period on the chart where he's actually playing the whole time, and he's never in prime in the periods in question. It's really the years 1945-46 to 1950-51 where the Rochester Royals make their mark, and Davies' heyday is in the '40s.

On the same graph:

Slater Martin looks exceptional. Notice he's literally ahead of Cousy basically their entire career.

Andy Phillip also looks strong.

I have to say I'm a little weirded out by Cousy looking so strong at the end of his Celtics run given that they improved the year after he retired. I feel like in general the WOWY metrics are at their best with sustained runs where the player in question and all major teammates are playing throughout. When guys retire or switch teams, I'm sure there are times when that's quite telling, but other times it kind of feels to me like there's a kind of inertia going on.


- Prime Sharman somewhat affirmed, but for top 100 purposes the question continues to be about his prime length. Would have a stronger case with separation over Cousy.

I'd say I agree here. To me it's always been a question of whether Sharman actually deserves to be ahead of Cousy, but I've never been able to pull that trigger, and this data certainly isn't going to help with that.

Paul Seymour is a guy I've championed - though not realistically as a Top 100 guy - so I should speak to that here. Note that Seymour takes over as coach in '56-57 and effectively demotes himself. While it's possible that his +/- was excellent in this time period, even if he was, in such limited minutes, I'd be surprised if he made much of a mark after that.


- Interesting Arizin results

Indeed. I have to acknowledge some surprise. Given the way the team fell off a cliff without him after his initial run and then rose back up to record the best offense of the '50s while winning the title, I'd have expected him to look better by a WOWY measure. What I will say though is that I feel like the multi-year way that Moonbeam is doing this, while entirely reasonable to reduce noise, probably doesn't show Arizin's impact as well as a more granular measure.

I also think that frankly Wilt Chamberlain's arrival messes a lot up for his early teammates in these measures. With Arizin really never seeing his ability to volume score with above league average efficiency cease, I think in another situation his WOWY metrics would look more graceful as he aged.

Pollard looks excellent. Not all that surprised.

Yardley looks excellent in the later years, which makes sense statistically, but it's interesting because his deep playoff runs are basically entirely separate from his best WOWY scores.


- Schayes maybe not as outstanding as we would want

I actually think Schayes looks quite good, it's just that like Yardley, he peaks after the team peaks. So long as we refrain from adopting a rings narrative with him, I think we're good. In between Mikan & Russell, I'd say Schayes' career stands above everyone else in that period.

Mikkelsen is eye-opening. After seeing that Pollard & Martin looked so good, I was wondering about Mikkelsen. Had he been outstanding, I think we'd need to worry about the Laker dominance just making all the guys look amazing. Turns out, not so much.



- Mikan affirmed


So, in general, Mikan looking so good makes sense, but I have to say it's hard for me to understand why we don't see more of a dip in the end. After Mikan retires in 1954, the team is still quite good the next year, and when Mikan returns the next year - while they are better with him than without him - they don't really seem like they are better than the previous year. I'd have thought we'd at least see him fall to something other than the 90+ percentile.

The end numbers of Neil Johnston and especially Larry Foust are part of what I'd point to to urge caution. I don't believe that Foust was a below average player through his prime and became elite as an old man, for example.



I'll jump in here to just mention Larry Costello as a guy who was an excellent player. Wouldn't expect to champion him in the Top 100, but I don't think he's far away from that.


- Neither Greer nor Jones looking like a true standout, but Jones has his playoff elevation.

So in this graph it's Tom Gola who stands out to me. A college mega-star who was particularly noteworthy in how much he hated playing with Wilt Chamberlain - at least the way the scheme was implemented. He clearly felt like the result was something that took away what was special about him, and I think the data here seems to agree with him. Once again I'm left thinking that the team Wilt joined actually had solid pieces that a coach like Alex Hannum probably could have done something great with, given that he had spikes of success with Wilt in his later stints with him.

I too am surprised that both Greer and Jones look so close to average most of their years.


My first thought is that I'm surprised Hagan doesn't look better, but looking more closely, and looking at the next graph, Hagan clearly has a pretty good run - better than Pettit during the championship time period. I think what surprises me is that I know that in the ABA Hagan looked pretty damn good in the first year before tailing off. That really gives me the impression that he probably could have been doing more for the Hawks in the NBA in the mid-60s, but of course, there's no reason that this metric would infer this.

I'm really not comfortable with Baylor's graph here. By this measure Baylor really figures out how to be impactful after he's injured and carrying on as a first option with a better first option in Jerry West being suppressed. I think we're getting some confounding variables here.

Surprised that Twyman is so, so low. I think that's not really fair to the guy.



- Pettit even less so


Yeah, Pettit looks like a good player, but really not that amazing.

Heinsohn is the stud by this graph, and that's something I'm going to have to chew on. Feel like I should ask for a Celtics-only graph from the '50s & '60s. It's well within the realm of possibility that Heinsohn made more of a WOWY difference because of the Celtics' lack of other options, but that Pettit would have been considerably stronger by the same metric were he on those Celtics.


- You already showed Wilt in this thread, but will emphasise it is neat how this goes against some of the common narratives about his “impact”


This is all true, though I do think that the story would look different with more granular WOWY. Hard to imagine a metric that was specifically focused just on the years '67-68 & '68-69 would show Wilt to be anything like elite. Again, Moonbeam focusing on larger samples makes sense for noise reduction, but from a perspective of "I guess all those criticisms of Wilt amounted to nothing", I would object on the grounds that the metric is allowing Wilt to look more consistent in his WOWY impact than he literally was.

Kerr looks like another guy with iffy high numbers at the end of his career.


- Wilkens looking worse than I may have thought


Yeah, his post-Hawk numbers just aren't the same here.

And for completeness I'll just note that we have Oscar & Wilt on the same graph, and both look outstanding.



I'm disappointed that Dick Van Arsdale doesn't look better. Everything I've seen of him seems to indicate he'd be an extremely valuable role player.

Telling that Goodrich has a spike of value around the time of the Laker chip and that's it. I think Goodrich legit was a critical part of what made that team amazing, but I think in general he was just a gunner even when that wasn't what the team needed.

Interesting that Jeff Mullins seems to look like the best of the trio. Not a guy I've thought a lot about.



The next graph is interesting. Billy Cunningham comes off as a legit star for a sustained period of time.

Tom Van Arsdale, Dick's twin, comes off awful, and I'm not surprised. I've not read about their youth but it seems like they adopted a playing style together where Tom was the scorer and Dick was the guy filling in the gaps. In the pros, that made Tom not good enough at the only thing he was good at, whereas Dick scaled better. I find these twins-as-A/B-experiments to be very interesting whenever we see them. No guarantee they had literally the same talent, but probably Tom could have learned to play more like Dick from a young age.



Next graph has Dave DeBusschere as the most consistently impressive of the lot, with weird graphs for the other guys especially Jerry Lucas. As I've said, guys spiking their WOWY at the end of their career raise my eyebrows.



- Happy to say Beaty affirmed

Indeed, interesting that both he and Nate Thurmond seem to have the general edge over Willis Reed.

Next couple graphs are ABA, and I'd say Jimmy Jones impresses the most, though Donnie Freeman seems remarkably consistent.

We finally get Connie Hawkins...and honestly I'm not surprised that he falls off quickly, but I'm glad to at least see he starts off strong.

In the graph, I'd say it's a pretty tight race but that John Havlicek has a slight edge over Rick Barry, with Chet Walker as something of a wild card. Glad to see Walker come off looking impressive.


Next graph is not very exciting. Bob Netolicky's early years seem like something I need to look more closely at.

Interesting the way Wes Unseld and Mel Daniels parallel each other before Daniels falls off first. Makes sense.

- I myself have noticed Frazier’s more paltry late career WOWY indicators; makes for a tough assessment when his box production did not actually drop off too much.

His prime impact certainly looks legit though, we're not seeing numbers like this from Reed or DeBusschere.

Dave Bing looks awful, and I'm afraid I'll never be able to think about him without thinking about the fact that he made the NBA's Top 75. Nothing personal against the guy, but it's just indefensible on any actual basketball level.


Earl Monroe comes off looking more consistently solid than I'd have expected.

Warren Jabali starts off strong before falling off, which is of course the story of his drug-addled career.


- Decent support for Dandridge, although the drop-off replacing 1974 with 1979 (his box score peak) is odd.

- Billy Knight surprising

I'm surprised too, but I think what's going on here is that I'm very impressed by Dandridge's role in making the Bullets more matchup-resilient in the playoffs, but to the WOWY metric, the Bullets look like they were already quite good. Good to see him scoring highly at least for those earlier years.

Yikes Knight looks bad.

Erving as I said before: More consistently excellent than I'd expected.



The next graph is kind of a trainwreck. I've long been skeptical about Elvin Hayes, and I'd say this graph makes it look justified. Haywood and McGinnis also look like guys with a lot of blah years.

Dan Issel looks like the best of the bunch but even with him, I was expecting his calling card to be consistency through longevity, and that's not what we're seeing.

Poor Sidney Wicks looking like the opposite of John Wick.



- McAdoo underwhelming (never been a big fan of his archetype), but the other five 1970s centres look like confident top 50 contenders (or in Walton’s case, would-be).


So to me this is a graph to really focus on. Kareem looks great by all reasonable standards...but also looks more like how I see his impact compared to the guys who seem to surf the 100th percentile like Russell, Wilt...and Walton along certain stints.

Bob Lanier looks really stellar, he's definitely going to be someone I have to think more about.

Cowens looks excellent as well.

Gilmore has the issues that I'd expect. I'm not particularly disappointed, but I will be considering whether Cowens & Lanier should rank ahead of him.

Yeah, McAdoo, I'd have expected better.


- How dare you excise Gus Williams :cry:


I'm not a Gus Williams true believer, but would be interested in seeing him - and would be cool to see DJ & Sikma in the same graph.


I'm really surprised at how meh Gervin looks.

I think people in the Top 100 who are so skeptical of Bird need to see this data. He's absolutely among the guys who seem to surf the 100th percentile. Any guy who can do that is showing signs of being an outlier.

Interesting seeing how good Walter Davis looks early on. Another guy whose career gets messed up with drug issues.

Marques is another of those guys, and he looks good up front, but I'd have thought he'd look even better.

Bernard King looks absolutely legit for that brief window.

Dantley & English look iffy.


Bobby Jones looks quite good, particularly when we consider that limited minutes should be hurting him here.

Dan Roundfield and Larry Kenon look quite good too.

Mo Lucas, a bit soft.



So, Robert Parish looks better than Moses Malone. That's really something. I'm not ready to argue for Parish before Moses, but it's certainly going to help me champion Parish later on if the opportunity avails.

Jack Sikma also looks awesome, with Bill Laimbeer not looking bad either, but I'd have thought Laimbeer would look better than Sikma.


Magic Johnson a clear cut 100th percentile surfer. Just incredible.

Isiah looking good but not like a guy who was a worthy rival to Magic or Bird.

Kinda expected Mo Cheeks to look stronger here. He doesn't look bad, but he also doesn't seem like someone I should be championing in the 100.


Dennis Johnson looks really awesome. Sidney Moncrief looks almost as awesome, but I was higher on Moncrief and had more questions about DJ.

Something that should be noted about DJ's performance here: The way he jumped from Seattle to Phoenix to Boston on great teams the whole time may be helping him more than it should. Obviously, he deserves credit for being a vital part of all these good teams - and we shouldn't forget that - but I could see the algorithm assuming DJ was carrying more impact with him than was actually so.

David Thompson absolutely looks like a legend early on before falling off, which seems right.


Feel like I already said this before but in the next graph, the way Worthy & Wilkins seem to change places is really funny. I'm inclined to say that the way Wilkins spikes late in his run is a bit suspicious of the algorithm whereas the way Worthy falls off feels more like a concern for Worthy post-Magic.

And yeah, Mullin seems like the guy whose niche is just plain valuable everywhere and he could do it for a long time.



In the next graph, to me Larry Nance ends up looking better than everyone else, and that's so interesting because Kevin McHale is in the graph and Bird & Parish look so impressive. Though as I look back up the graphs, the data doesn't really seem to be saying Parish was more valuable in his prime so much as that he and McHale were similarly valuable in prime, but that Parish has the longevity edge.

Love seeing Buck Williams here. Always a guy I'm considering championing. Not sure if I will this time.


I think the 90s bigs is the same as before: Similar peaks for all 4 guys with David Robinson being the most consistent of the bunch. Makes sense, and if the regular season were the only season, he'd be above Olajuwon on my list. As is, Olajuwon > Robinson > Ewing >> Daugherty remains my assessment.


With the '90s point guards, we get very close primes with something of a gap between the rest and Tim Hardaway. Stockton definitely takes the prize with his sustained excellence, and Payton I think comes in next, though he didn't maintain his excellence the same way. I do think KJ looks quite good with him lasting longer than we generally think.


- Interesting Reggie Miller results, if not too affecting for the most substantial playoff riser among that group


I have to say I'm really surprised Miller doesn't look stronger here. Having a weaker prime showing than Jordan & Drexler isn't really surprising, but the way Dumars maintains that lead over Reggie surprises me. As you say, Playoff Reggie is absolutely a thing, but I'm going to want to chew on this more. Miller is a guy I've championed as a matter of course for a long time on our projects and I probably will this time too because I'm just higher on him than most, but I thought he'd look stronger here.

Alvin Robertson really looks awful and he's not making those DPOY voters look like they knew what they were doing at all.

'90s small forwards: Pippen looks very solid - though Majerle and Schrempf look pretty solid too.

'90s power forwards: Yeah this thing where Barkley looks consistently better than Malone for most of their career surprises me.

'90s-00s center: Shaq's a 100th percentile surfer. Zo looks superb and considerably better than fellow Hoya Deke.
Getting ready for the RealGM 100 on the PC Board

Come join the WNBA Board if you're a fan!
Doctor MJ
Senior Mod
Senior Mod
Posts: 52,762
And1: 21,690
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#91 » by Doctor MJ » Thu Aug 3, 2023 5:57 pm

So, just wanted to have a post specifically for the 100th percentile surfers. Basically guys who regularly hit that 100th percentile in sustained runs in the 90s and above.

George Mikan
Bill Russell
Wilt Chamberlain
Oscar Robertson
Jerry West
Bill Walton
Larry Bird
Magic Johnson
Michael Jordan
Shaquille O'Neal

Honestly, seems about right. Curious who else is like that when we see more graphs.
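
For anyone who wants to apply the "100th percentile surfer" idea consistently rather than by eyeballing the graphs, here is a minimal sketch of one possible rule. Everything in it is an assumption for illustration: the thresholds (`peak`, `floor`, `min_run`, `min_peaks`), the function names, and the numbers, none of which come from Moonbeam's data.

```python
def longest_run(values, floor):
    """Length of the longest consecutive stretch with value >= floor."""
    best = current = 0
    for v in values:
        current = current + 1 if v >= floor else 0
        best = max(best, current)
    return best


def is_surfer(percentiles, peak=99.5, floor=90.0, min_run=5, min_peaks=2):
    """Hypothetical rule: reaches the top percentile repeatedly and sustains 90+."""
    peaks = sum(1 for p in percentiles if p >= peak)
    return peaks >= min_peaks and longest_run(percentiles, floor) >= min_run


# Made-up percentile series (one value per 5-year window), not real results.
windows = {
    "Player A": [88, 95, 99.8, 100, 99.6, 97, 93, 85],
    "Player B": [70, 92, 99.7, 91, 80, 75, 60, 55],
    "Player C": [91, 93, 94, 96, 95, 92, 90, 89],
}

surfers = [name for name, pcts in windows.items() if is_surfer(pcts)]
print(surfers)
```

Loosening or tightening the thresholds changes who qualifies, which is probably the main thing to argue about with a list like this.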
eminence
RealGM
Posts: 16,713
And1: 11,553
Joined: Mar 07, 2015

Re: Penalized Regression of WOWY data 

Post#92 » by eminence » Thu Aug 3, 2023 6:38 pm

Doctor MJ wrote:So, just wanted to have a post specifically for the 100th percentile surfers. Basically guys who regularly hit that 100th percentile in sustained runs in the 90s and above.

George Mikan
Bill Russell
Wilt Chamberlain
Oscar Robertson
Jerry West
Bill Walton
Larry Bird
Magic Johnson
Michael Jordan
Shaquille O'Neal

Honestly, seems about right. Curious who else is like that when we see more graphs.


My guesses would be Duncan/KG/Dirk/LeBron/CP3/Steph based off the more granular stuff, but who knows.

I would enjoy having some of this stuff in a spreadsheet/table to browse for sure.
I bought a boat.
Doctor MJ
Senior Mod
Senior Mod
Posts: 52,762
And1: 21,690
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#93 » by Doctor MJ » Thu Aug 3, 2023 6:50 pm

eminence wrote:
Doctor MJ wrote:So, just wanted to have a post specifically for the 100th percentile surfers. Basically guys who regularly hit that 100th percentile in sustained runs in the 90s and above.

George Mikan
Bill Russell
Wilt Chamberlain
Oscar Robertson
Jerry West
Bill Walton
Larry Bird
Magic Johnson
Michael Jordan
Shaquille O'Neal

Honestly, seems about right. Curious who else is like that when we see more graphs.


My guesses would be Duncan/KG/Dirk/LeBron/CP3/Steph based off the more granular stuff, but who knows.

I would enjoy having some of this stuff in a spreadsheet/table to browse for sure.


Oh definitely, but I love the graphs Moonbeam is doing. I can process things considerably faster when they are so visual.

Feel like saying:

Identifying this tier of player dominance feels like a really big deal to me given the skepticism people are expressing about Bird. When I'm doing my lists, it's the ultra outliers I want to consider first. Now that doesn't mean I'm super high on Mikan compared to others because league difficulty is a separate thing, but among contemporaries it's a critical starting point.

As I say this, the fact this is all regular season data is a thing to be cautious with. This weekend I want to think more about Dirk because I think he might be in this tier for the regular season...but he does have some significant playoff impact issues for a good chunk of his career, and since Dirk is someone I might be looking to nominate after Oscar, I should make sure I feel I'm properly factoring that in before I do so.

Of course Dirk's not the only one who may have these issues. Bird has been talked about as someone with issues along these lines.

Regardless, I do think the onus is on finding arguments for the non-100th-percentile guys over the 100th-percentile guys.
Doctor MJ
Senior Mod
Senior Mod
Posts: 52,762
And1: 21,690
Joined: Mar 10, 2005
Location: Cali
     

Re: Penalized Regression of WOWY data 

Post#94 » by Doctor MJ » Thu Aug 3, 2023 7:18 pm

A few quick thoughts:

First, in my long post I mentioned concerns about guys who switch between good teams potentially getting too much credit. I think it's worth considering which players have had a DJ-like contender-to-contender run and seeing how many of them come out looking as good as DJ and how many don't.

Second, I notice that all of this is percentile-based. So long as we're talking about contemporaries that's really perfect, but when we consider between eras, I think we should be actively considering if there's a massive difference between eras, and in particular, if there are any massive jumps between different years. Moonbeam's 5-year approach will smooth this out, and that's not a bad thing in and of itself, but still, if we're seeing similar percentile levels in different eras that represent, say, more than double the scoreboard-impact compared to each other, I think it would influence our interpretation.

Third, at this point I'm an addict just waiting for my next hit and am just expecting to see stuff from all the big guys in the 21st century when Moonbeam has it ready, and I hope Moonbeam can continue to see this as a fun thing, but we're all putting work on a busy man. So Moonbeam, just let us know if we're driving you nuts.
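
To make the second point above concrete, here is a small sketch of how the same percentile can hide very different raw estimates. The two "eras" and their values are invented for illustration, and `percentile_rank` is just one simple definition of a percentile, not necessarily the one behind the graphs.

```python
def percentile_rank(value, population):
    """Percent of the population at or below `value` (0 to 100)."""
    at_or_below = sum(1 for x in population if x <= value)
    return 100.0 * at_or_below / len(population)


# Hypothetical per-player estimates (points of margin) for two eras.
era_wide = [-6.0, -3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0]
era_narrow = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]

top_wide = max(era_wide)
top_narrow = max(era_narrow)

# Both leaders sit at the same percentile within their own era...
print(percentile_rank(top_wide, era_wide), top_wide)
print(percentile_rank(top_narrow, era_narrow), top_narrow)
# ...but the raw estimate behind that percentile differs by this factor.
print(top_wide / top_narrow)
```

Plotting the raw coefficient at a fixed percentile for each window would be one way to check whether any era really shows that kind of gap.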

I think ideally what I'd really love is the ability for us to do queries as we go along a project like the Top 100. "These are the guys we're discussing right now, and this is how this data looks". If this is something Moonbeam is cool to do, I'm happy to keep asking pretty please, but it probably behooves the rest of us to consider learning a thing.

Beyond that, I think what I'd be most interested in seeing are particular cores presented together. To just name some going through history. (Please don't feel compelled to do all or most of these, I'd expect that the dynasties would be most interesting.)

- Rochester Royals if we can get good numbers at least back to their joining of the BAA. (NBL back to '45-46 would be amazing, but the data is super sparse)
Key players: Bob Davies, Arnie Risen, Bobby Wanzer, Jack Coleman, Arnie Johnson

- Minneapolis Lakers ideally back to their joining of the BAA.
Key players: George Mikan, Jim Pollard, Herm Schaefer, Slater Martin, Vern Mikkelsen, Clyde Lovellette

- Syracuse Nationals
Key players: Dolph Schayes, Paul Seymour, Red Rocha, Earl Lloyd, George King, Red Kerr

- Philadelphia Warriors
Key players: Paul Arizin, Neil Johnston, Jack George, Tom Gola, Wilt Chamberlain

- Boston Celtics
Key players: Bob Cousy, Ed Macauley, Bill Sharman, Bill Russell, Tom Heinsohn, Frank Ramsey

- Boston Celtics
Key players: Bill Russell, Sam Jones, John Havlicek, KC Jones, Tom Sanders, Bailey Howell

- Boston Celtics
Key players: John Havlicek, Dave Cowens, Jo Jo White, Paul Silas, Don Chaney, Don Nelson

- St. Louis Hawks
Key players: Bob Pettit, Cliff Hagan, Lenny Wilkens, Clyde Lovellette, Zelmo Beaty, Lou Hudson

- Philadelphia 76ers
Key players: Wilt Chamberlain, Hal Greer, Chet Walker, Billy Cunningham, Luke Jackson, Wali Jones

- Los Angeles Lakers
Key players: Elgin Baylor, Jerry West, Dick Barnett, Rudy LaRusso, Wilt Chamberlain, Gail Goodrich

- New York Knicks
Key players: Walt Frazier, Willis Reed, Dave DeBusschere, Dick Barnett, Earl Monroe, Bill Bradley

- Milwaukee Bucks
Key players: Kareem Abdul-Jabbar, Oscar Robertson, Bob Dandridge, Jon McGlocklin, Greg Smith

Okay I'm going to stop there because this is already beyond ridiculous and I think it's clear how I'm thinking. By all means if it makes sense, we could just focus on dynasties up to the present.

Thanks again Moonbeam!
ShaqAttac
Rookie
Posts: 1,175
And1: 362
Joined: Oct 18, 2022
 

Re: Penalized Regression of WOWY data 

Post#95 » by ShaqAttac » Fri Aug 4, 2023 12:49 am

Doctor MJ wrote:So, just wanted to have a post specifically for the 100th percentile surfers. Basically guys who regularly hit that 100th percentile in sustained runs in the 90s and above.

George Mikan
Bill Russell
Wilt Chamberlain
Oscar Robertson
Jerry West
Bill Walton
Larry Bird
Magic Johnson
Michael Jordan
Shaquille O'Neal

Honestly, seems about right. Curious who else is like that when we see more graphs.

isnt mj only going top in expansion?

he was lower than hakeem, bird n magic earlier
AEnigma
Assistant Coach
Posts: 4,048
And1: 5,854
Joined: Jul 24, 2022
 

Re: Penalized Regression of WOWY data 

Post#96 » by AEnigma » Fri Aug 4, 2023 12:59 am

That still counts lol. It is just one of those things where people should be careful not to track backward — “oh wow imagine what his ‘impact’ must have been at his physical peak!” Not that different from people looking at 1971/72 Kareem and then extrapolating to 1977 Kareem. It is why we still need to look at the players themselves rather than just whether they have some massive impact indicator outside of their precise peak range.

It is to Jordan’s credit that at 33-35 he was the best player in the world, just as it was for Lebron, and just as it was for Russell, and just as it theoretically could have been for Magic. And we can fairly analyse the competition at play for those years — Karl Malone and pre-peak Shaq maybe underwhelms compared to Jerry West and Willis Reed and a down-year Wilt — but it does not change that he was able to hold that rarefied status when most players are firmly post-prime. Those are the years which enshrined him, whether they be at #1 or #4 or #6.
MyUniBroDavis wrote:Some people are clearly far too overreliant on data without context and look at good all in one or impact numbers and get wowed by that rather than looking at how a roster is actually built around a player
Moonbeam
Forum Mod - Blazers
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#97 » by Moonbeam » Fri Aug 4, 2023 1:04 am

Moonbeam wrote:
OhayoKD wrote:
Moonbeam wrote:In the Top 100 project, I made a post about some estimates I have calculated via penalized regression of WOWY data, which I’ve called RWOWY, RWOWY-Ridge, RWOWY-Lasso, and RWOWY-ENet: viewtopic.php?p=107785464#p107785464

A form of penalized regression is used in calculating RAPM, so the metric RWOWY-Ridge is analogous to it, except it is applied to WOWY data instead of +/- data.

Some posters were interested in the data, so I’ve put together a document explaining the methods here. This document walks through an example of calculating these estimates for a 5-year window from 1982-86 and provides a critical evaluation of the results including a comparison to 5-year RAPM, some ideas for possible extensions, and some graphs with player comparisons.

A few quick takeaways:

It’s challenging to determine which players to include in the sample due to the nature of the data. Including all players would likely make deep bench guys on good teams appear to be the most impactful players as they might tend to only play in blowouts their team won. Setting some minimum MPG threshold is one way to try to counter this, but it gives rise to other anomalies.

RWOWY-Ridge is modestly positively correlated with RAPM data for equivalent 5-year periods. The correlation is about 0.41 on average with players who played at least 5000 possessions over a 5-year period (roughly 92% of players who played at least 18 MPG in one season), but this correlation increases a little bit when looking at players with more consistent minute profiles and those who won league awards.

I’m still in the process of obtaining box scores, so I don’t have estimates for the entire history of the league yet, but I imagine I will in the next few weeks. I’d be happy to collate and share this data if there is interest.

I’m happy to take any feedback you may have or ideas for modifications I haven’t considered.

would it be greedy of me to ask for a graph charting magic, bird, hakeem, and jordan specifically?


No problem! Happy to add more graphs if you (or anyone else) would like.

Image


Here's an update with the corrected post-96 windows.

Image
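
For readers trying to picture what the post-96 fix changes, here is a rough sketch of a ridge regression on game-level WOWY-style data. It is not the actual RWOWY code: the design matrix layout (+1 for home players who appeared, -1 for away players), the closed-form solver, the penalty `lam=1.0`, the missing home-court term, and the four made-up games are all assumptions. The `include_home=False` option mimics the kind of problem described earlier in the thread, where the home players were dropped from the box scores.

```python
import numpy as np


def build_design(games, players, include_home=True):
    """One row per game: +1 for home players who appeared, -1 for away players."""
    index = {p: j for j, p in enumerate(players)}
    X = np.zeros((len(games), len(players)))
    for i, (home, away, _margin) in enumerate(games):
        if include_home:
            for p in home:
                X[i, index[p]] = 1.0
        for p in away:
            X[i, index[p]] = -1.0
    return X


def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Made-up games: (home players, away players, home margin).
games = [
    (["A", "B"], ["C", "D"], 10.0),
    (["A"], ["C", "D"], 2.0),
    (["C", "D"], ["A", "B"], -6.0),
    (["C"], ["B"], 1.0),
]
players = ["A", "B", "C", "D"]
y = np.array([g[2] for g in games])

full = ridge(build_design(games, players), y, lam=1.0)
away_only = ridge(build_design(games, players, include_home=False), y, lam=1.0)

for name, b_full, b_away in zip(players, full, away_only):
    print(f"{name}: full={b_full:+.2f}  away-only={b_away:+.2f}")
```

In the away-only version each game tells the model nothing about who the away team was playing against, which matches the point made earlier in the thread about opponent strength not being factored in.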
Moonbeam
Forum Mod - Blazers
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#98 » by Moonbeam » Fri Aug 4, 2023 1:08 am

Moonbeam wrote:
Doctor MJ wrote:
OhayoKD wrote:Uh:
Image
I don't see how anyone but Magic looks like an impact outlier here. Magic's career-splits are also outright matched by Duncan who looks better than Jordan where this graph would suggest his value peaked in various metrics like AUPM, WOWY, RAPM(Cheema, JE) despite playing way more minutes than any of his teammates(and as a result having to play with sub-standard teammates).

And then we have data ball where it is two two-way bigs and a guy who combines goat-tier offense with the ability to carry -5.5 defenses

I think what impact data bears out is that scoring and one-way offense is overrated if anything.


Really not sure how you can say "Magic looks like an impact outlier" and "data bears out one-way offense is overrated". We can have conversations about individuals certainly, but the idea that, say - among contemporaries, Olajuwon should rank ahead of Magic, is not helped by this data.

And while Bird doesn't look as strong as Magic, he looks pretty damn good too.

I think it would be cool to see more graphs along these lines for guys in more recent eras. That can obviously include the more fine-detailed stuff we get with legit +/- data, but apples-to-apples analyses always provide their own insight. I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.


Here are those 4 PGs:

Image


Here is the updated version with fixed post-96 data:

Image
Moonbeam
Forum Mod - Blazers
Forum Mod - Blazers
Posts: 10,205
And1: 5,059
Joined: Feb 21, 2009
Location: Sydney, Australia
     

Re: Penalized Regression of WOWY data 

Post#99 » by Moonbeam » Fri Aug 4, 2023 1:10 am

Moonbeam wrote:Here are the MVP winners of the 2010s, minus Giannis (I have to tweak the code to add more than 6 players). The data runs through 2022 as I still have to get box scores for 2023.

Image


Fixed version:

Image
AEnigma
Assistant Coach
Posts: 4,048
And1: 5,854
Joined: Jul 24, 2022
 

Re: Penalized Regression of WOWY data 

Post#100 » by AEnigma » Fri Aug 4, 2023 1:12 am

Moonbeam wrote:
Doctor MJ wrote:I'd like to see how the Nashes and Kidds compare with the Stocktons and Prices, for example.

Here is the updated version with fixed post-96 data:

Image

Is Kidd the greatest home-court point guard in NBA history? People are asking.
