
Model Quantifying Top 100

Posted: Tue Jul 9, 2024 9:43 pm
by Smoothbutta
Introduction:

The goal was to quantify careers using a formula that combines accolades with simple advanced stats while compensating for era, benchmarking the formula's weights against approximate expected rankings and adjusting them via least squares regression. Any accolades missing from earlier eras are retroactively assigned (9 DPOYs for Bill, 1 FMVP for Paul Arizin, etc.).

So, through a LOT of trial and error, the resulting formula tells us how the average NBA geek weighs these players' achievements in player rankings. Imagine drawing a line-of-best-fit equation through all the top players' achievements, refining that line/equation, and then plugging in each player to show where he falls on that prediction model.

It can always be adjusted/optimized, and it is certainly less accurate for some players than others since this is just a rough model of something that is not even objective, but outliers exist in all lists and I'm happy with the results overall.

Results here
Spoiler:
Image

Any value that was retroactively assigned or adjusted is italicized.

Caveats:

It is not perfect even as an approximation model. Unfortunately, Oscar Robertson lands nowhere near where he should be (or is typically) ranked. Havlicek/Dwight/GP are higher than normal, Ewing is very low, and a couple of others like Nash are a bit low, but as a whole I believe it's an interesting result that is not too biased. And I believe some of the outliers could indicate perception skew, or contextual/legacy factors absent from the modeling, etc.

As alluded to above, the model obviously doesn't know any legacy or contextual factors. If you think Steph gets bonus points for being the best shooter of all time, you can take his ranking in this model with a grain of salt; the same goes if you think Ewing would have way more All-NBAs had the generational centers not overlapped with his prime. Likewise, if you think player X deserves a lower ranking because of one playoff run or some other reason, that is outside the scope of this model but would certainly play a part in a typical ranking. And of course every player has his own contextual factors, and none of this is truly objective anyway.

There is better data that could be used. You could use impact metrics like on-off or EPM, other advanced stats, etc., but at best these only exist post-1997, so they could only be used to compensate modern players. I considered that outside the scope of this project. All the data used for this model is on BBR (or mostly on BBR, with some retroactive assignments).

Not all players in history were ranked; it's possible that some player I missed could be in the 90-100 region, but I made sure to include all relevant players. Luka is 101st by the way, unfortunately missing by one spot, and Tatum is 109th, tied with Carmelo. They will obviously climb quickly, however.

Accounting for the 50s, 60s, etc. with retroactive accolades

Since this is a formula that is meant to be as objective as possible with its inputs, and for the data to be statistically significant, it follows that the data-set should not have blanks. Accolades should be retroactively assigned where possible. Bill Russell would have 6 FMVPs (I think '64 would have gone to Sam Jones) and 9 DPOYs, so from the perspective of making a more accurate model he deserves those awards just as much as a modern player. Some accolades were filled in or approximated and this generally works well, but I see more accurate retroactive awards as the main thing to improve in the future. By category:

- MVP goes back to 1957, so only a few players needed attention here.
- All-Star selections go back to 1951, so these are fairly easy to account for: Mikan (+2) and Schayes (+1).
- All-NBA goes all the way back (I only used 1st and 2nd teams, ignoring 3rd teams since they only go back to '89).
- DPOY and FMVP are fairly easy, as seen from the links above plus some additional research.
- All-Defense goes back to '69; the remaining selections were filled in from many accounts of these players and some film study, but they are definitely estimates.
- Win-share data exists for every season.
- Last is VORP, which goes back to '74. This is the biggest or toughest approximation next to All-Defense, but VORP correlates with PER, so I took the PER-vs-VORP curve of more modern players of similar position and style and used it to estimate the missing values (see the sketch after this list). These are also estimates.
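
For the VORP backfill, here is a minimal sketch of the idea. The comparison numbers below are placeholders rather than my actual inputs, and the plain linear fit is a simplification of the curve I used:

[code]
# Minimal sketch of the PER -> VORP backfill. The comparison data is
# hypothetical placeholder data, and a plain linear fit is a
# simplification of the PER-vs-VORP curve described above.
import numpy as np

# Post-1974 players of similar position/style: (peak PER, 3-best-seasons VORP sum)
comparisons = np.array([
    [22.1, 14.2],
    [24.5, 17.8],
    [26.0, 19.5],
    [21.3, 12.9],
    [23.8, 16.1],
])

# Degree-1 least-squares fit of VORP against PER.
slope, intercept = np.polyfit(comparisons[:, 0], comparisons[:, 1], deg=1)

def estimate_vorp(per):
    """Estimate a pre-'74 player's 3-best-seasons VORP sum from his PER."""
    return slope * per + intercept

print(round(estimate_vorp(25.0), 1))  # e.g. a big man with a 25.0 peak PER
[/code]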

Methodology:

The formula normalizes each of these 11 attributes/categories and sums them with different weights: career 1st-place MVP vote share, DPOYs, rings, FMVPs, sum of 3 best VORP seasons, playoff win-shares, sum of 3 best WS/48 seasons, career win-shares, All-NBA 1st teams plus 2nd teams at half weight, All-Defense 1st teams plus 2nd teams at half weight, and All-Star selections.

All that remains in the formula is 3 compensation factors that apply to some players, all explained in the next section. Each of the above columns/categories has its own weight, which I adjusted using least squares so that the rankings follow some fair reference rankings as closely as possible (Ben Taylor's Thinking Basketball, The Athletic, RealGM Top 100). For example, greatness is commonly judged more on offense, and MVPs already count defense to some extent, so giving a DPOY the same weight as an MVP would be silly and unfounded; the MVP category therefore has a much higher weight than DPOY. Win-shares get some bonus weight as well to capture longevity. All-Defense counts for half as much as All-NBA, etc. Again, this can always be changed in the future, but I like the results from this initial model.
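
To make the fitting step concrete, here is a simplified sketch of the general idea. The category data here is random placeholder data rather than the actual accolade totals, and the real process also involved a lot of manual trial and error plus the compensations described below:

[code]
# Minimal sketch of the weight-fitting step. Data is random placeholder
# data; the real model used actual accolade totals and manual tuning.
import numpy as np

categories = [
    "mvp_1st_share", "dpoy", "rings", "fmvp", "vorp_top3", "playoff_ws",
    "ws48_top3", "career_ws", "all_nba", "all_def", "all_star",
]

raw = np.random.rand(100, len(categories))   # placeholder accolade data
X = raw / raw.max(axis=0)                    # normalize each category to [0, 1]

consensus_rank = np.arange(1, 101)           # 1 = best, e.g. from the RealGM list
y = 101 - consensus_rank                     # higher score = better player

# Solve min ||Xw - y||^2 for the 11 category weights.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

scores = X @ w                               # model score per player
model_rank = scores.argsort()[::-1].argsort() + 1   # 1 = best by the model
[/code]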

Final formula:
Spoiler:
Image


I expect questions regarding the MVP category, so I'll go into more detail on this one:

I use 1st-place MVP vote share as this is the only way to look at MVP results across any year or decade without bias. Overall MVP award share is not accurate because the amount of "share" changes between years, and it still wouldn't be accurate if you normalized it, because some years counted only 1st-place votes or didn't have 5 ballot spots, etc. Example: Archibald had 0.9% of the MVP votes in 1980 (only 1st-place votes were counted that season), so his award share would be 0.9%. LeBron had 0.8% of 1st-place votes in 2008, similar to Archibald, yet his MVP award share was 13.4% because voters voted for 2nd-5th place as well. So comparing these two seasons by MVP award share would not make sense, but you can compare 1st-place votes without issue or bias. The only other way to do it with statistically significant data would be to look only at MVP winners, but that offers much less granularity.
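
As a tiny worked example of the difference (the ballot counts here are illustrative, not the actual 1980/2008 tallies):

[code]
# First-place share = (player's 1st-place votes) / (total 1st-place votes),
# which is well-defined in any season regardless of ballot format.
# The vote counts below are illustrative, not the real 1980/2008 tallies.
def first_place_share(player_firsts, total_firsts):
    return player_firsts / total_firsts

# 1980: only 1st-place votes existed, so award share == 1st-place share.
archibald_1980 = first_place_share(2, 221)   # ~0.9%

# 2008: ballots ranked 1st-5th, so LeBron's *award* share (13.4%) includes
# lower-place points, while his 1st-place share stays comparable.
lebron_2008 = first_place_share(1, 126)      # ~0.8%

print(f"{archibald_1980:.1%} vs {lebron_2008:.1%}")
[/code]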

Compensations:

Pre-80s era compensation: I used a curve based on where a player's average peak resides. If the peak was 1982 or later, there is no adjustment (0%). A peak in 1975 gets a total -4% adjustment, 1965 gets -13%, and 1955 gets -40%. I can show the raw data before all compensations, but without this, for example, Mikan would be a top-5 or top-6 player all time, Bill would be #2, Pettit top 20, Schayes top 30, etc. For a more specific example, Pettit's average peak is around 1960, which corresponds to a -25% adjustment.
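
For illustration, a piecewise-linear interpolation through those anchor points reproduces the examples above; treat this sketch as an approximation of the curve rather than its exact shape:

[code]
# Sketch of the pre-80s compensation, assuming linear interpolation
# between the anchor points stated above (the true curve may differ).
import numpy as np

peak_years = [1955, 1960, 1965, 1975, 1982]
penalty    = [-0.40, -0.25, -0.13, -0.04, 0.00]

def era_adjustment(avg_peak_year):
    """Adjustment for a player's average peak year; 0 from 1982 onward."""
    # np.interp clamps to the endpoint values outside the anchor range.
    return float(np.interp(avg_peak_year, peak_years, penalty))

print(era_adjustment(1960))  # -0.25, matching the Pettit example
print(era_adjustment(1995))  #  0.0, no adjustment after 1982
[/code]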

ABA compensation: Having a large stint in the ABA (Artis, Dr. J, and Rick Barry being the most relevant ones) means a lot of accolades/stats get boosted, as the competition wasn't as strong and the player-base was simply split. The rankings would be too high for these players if left untouched. Artis gets -20%, and Dr. J and Barry get -5%, based on the portion of their primes/accolades that fell in the ABA. Separately, I also slightly adjust MVP shares during ABA years to account for the split player base: getting 3% of the ABA MVP votes in '76, as James Silas did, shouldn't carry the same weight as getting 3% in the combined league in '77, as Julius Erving did.

Height compensation: Controversial at first glance, but I found that nearly all guards were underrated by the model. Aside from Harden, GP, and AI, almost every other <6'6" player in the entire 80-player list was underrated without it. It is also interesting that the Hall of Fame probability calculator on BBR includes a compensation for this. I expect it comes down to win-shares underrepresenting guards, small players not being able to dominate the league as easily as bigs, and guards often missing out on defensive accolades.

In my model 6'5" players get +2%, 6'4" get +4%... and 6'0" get +12%. Players that were too low without it (or still are, for some): Dame, Arizin, Ray, Frazier, Baylor, Zeke, Kidd, Nash, Wade, Stockton, Oscar, West, Steph.
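
In code, the stated values amount to +2% per inch under 6'6"; whether the bonus keeps extrapolating below 6'0" is an assumption of this sketch:

[code]
# Sketch of the height compensation: +2% per inch under 6'6", per the
# values above. Extrapolating below 6'0" is an assumption of this sketch.
def height_bonus(height_inches):
    """Bonus as a fraction of the score; 6'6" (78 in) and taller get 0."""
    return max(0, 78 - height_inches) * 0.02

print(height_bonus(77))  # 6'5" -> 0.02
print(height_bonus(72))  # 6'0" -> 0.12
print(height_bonus(63))  # 5'3" -> 0.30 if extrapolated linearly
[/code]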

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 12:13 am
by OhayoKD
Smoothbutta wrote:

First, appreciate the effort here. That said, I think there are some spots where you might consider tweaks. Looking through your process, you seem to want to approximate some combination of resume, in-era impact, and era strength.

Let's start with the first
Bill Russell would have 6 FMVPs (I think '64 would have gone to Sam Jones) and 9 DPOYs, so from the perspective of making a more accurate model he deserves those awards just as much as a modern player.

Even assuming that Sam Jones would have been Finals MVP in '64 (I'm curious what the logic for that is), that would still leave 3 FMVPs unaccounted for. I would also say Russell is an example of why looking at direct results leads to a less biased approach than all-in-ones (all of which are working with incomplete data pre-'75):
https://forums.realgm.com/boards/viewtopic.php?t=2353834

Being statistically the most valuable player ever and the best winner ever, I'd say the gist of this sort of list should probably end up with him at #1 (at least before the era penalty).

You also seem to be making positional adjustments on what I'd consider shaky grounds:
Height compensation: Controversial at first glance, but I found that nearly all guards were underrated by the model. Aside from Harden, GP, and AI, almost every other <6'6" player in the entire 80-player list was underrated without it. It is also interesting that the Hall of Fame probability calculator on BBR includes a compensation for this. I expect it comes down to win-shares underrepresenting guards, small players not being able to dominate the league as easily as bigs, and guards often missing out on defensive accolades.


It's actually the opposite. Despite being far and away the worst defenders, guards
-> have a much easier time racking up All-Defensive selections because they only have to compete with each other
-> won 7 of the first 8 DPOYs because NBA media did not understand the value of paint protection, the drawbacks of steal-chasing, or that steal differentials have minimal correlation with team defensive quality

If anything, you should probably be penalizing guards, as their accolades massively overshoot their actual impact.

Lastly...
Pre-80s era compensation: I used a curve based on where a player's average peak resides. If the peak was 1982 or later, there is no adjustment (0%). A peak in 1975 gets a total -4% adjustment, 1965 gets -13%, and 1955 gets -40%. I can show the raw data before all compensations, but without this, for example, Mikan would be a top-5 or top-6 player all time, Bill would be #2, Pettit top 20, Schayes top 30, etc. For a more specific example, Pettit's average peak is around 1960, which corresponds to a -25% adjustment.

Uh... why does the adjustment stop after '82? If you are going to penalize '70s, '60s, and '50s players for playing in a weaker league, there's no reason not to also penalize '80s, '90s, 2000s, and 2010s players the same way:
Image
Even in a 6-year span, foreign talent nearly doubled. If you are applying era-penalties, then you should do so for all the eras.
 

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 12:57 am
by Smoothbutta
Thanks OhayoKD

Regarding FMVPs,

47: Joe Fulks
48: Connie Simmons
49, 50: Mikan
51: Arnie Risen
52, 53, 54: Mikan
55: Schayes
56: Arizin
57: Heinsohn
58: Pettit
59: Heinsohn (also debatable with Bill/Cousy)
60-63: Bill
64: Sam Jones (debatable with Bill)
65, 66: Bill
67: Wilt
68: Havlicek

We could discuss the debatable ones for sure, and I do plan to spend more time evaluating and researching the reasoning and finalizing selections for the retroactive accolades. But there is definitely an argument for Sam in '64: Bill had a down year compared to his other Finals, voter fatigue with that combination could be real, and Sam had a very efficient offensive series with 10 more ppg.

And yes, Bill is #2 before the era compensation, mostly because he didn't have as much MVP voting or as many All-NBA 1st team selections as MJ/LBJ. It's very arguable he would be (or is) #1 all-time without era compensation, but that would just be a different methodology than what this model has right now.

You're partly right regarding guards, though I also partly disagree; my overall point is that the model strays further from typical rankings if you don't boost the small players a bit. It's fair if you think the compensation shouldn't be there, and like I said it is controversial, but it also seems necessary given the stated goal/intro of this model.

Regarding era, the fact is the average skill level climbed extremely fast between the 50s and 80s; it has kept increasing since the 80s, but that increase is small in comparison, and nobody is taking points away from Bird or MJ for playing in their era. So while I agree that continuing the compensation to boost the 00s slightly, and the 10s and 20s even more, could be justified for another model or thought experiment and may be interesting, it doesn't align with the intention of this model.

My goal is to fine-tune the model with feedback, including on the retroactive awards, and then it can also be used for similar thought experiments: what if player X got another FMVP or played two more seasons, what would it look like without compensations, what if you bonus-compensate the recent decades, etc. I find this all super interesting.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 1:58 am
by Colbinii
Smoothbutta wrote:

Regarding era, the fact is the average skill level climbed extremely fast between the 50s and 80s; it has kept increasing since the 80s, but that increase is small in comparison, and nobody is taking points away from Bird or MJ for playing in their era. So while I agree that continuing the compensation to boost the 00s slightly, and the 10s and 20s even more, could be justified for another model or thought experiment and may be interesting, it doesn't align with the intention of this model.


This is an interesting view. Few things to note:

The most prevalent skill growth in NBA history is arguably off-the-dribble 3P shooting, which has completely morphed defenses.

Guards are literally unplayable in today's league if they don't have some degree of an off-the-dribble game, whether it's keeping a live dribble like T.J. McConnell or hitting incredibly difficult pull-ups like Donovan Mitchell.

What exactly is the major growth from the 50s to 80s? How does it compare from the 80s to Now?

I feel like the pre-merger game was more similar to the game all the way up to the "disallowing" of hand-checking, and that since the early 2000s the game has evolved exponentially, culminating around 2022.

I don't know how you can look at say, 2003, and say today's game is more similar to it than 1975.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 3:21 am
by penbeast0
Muggsy gets around a 25-30% height bonus? Is it enough to pass Doncic?

Also, you might use a size bonus that includes weight rather than a pure height bonus, so the shorter but massive guys like Wes Unseld, Charles Barkley, etc. don't get too great an advantage, and the rail-thin tall guys like Manute Bol aren't at too great a disadvantage.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 3:54 am
by f4p
very cool. i had a formula going for the top 30 of the Top 100 project that got about R^2 = 0.7 iirc, but it was not nearly as extensive: just box score, longevity, and championships (FMVP didn't improve it at all, mostly because FMVP ~= championships for a lot of the top 30). this is much better and goes for all of the Top 100, which is what i had wanted to see, just to see how far some players deviated from the "accepted" criteria for evaluating the top 100. nash was by far (by faaarrr) the biggest deviation in my list but the MVPs seem to save him here.

for anyone complaining, keep in mind this is a formula to best match other lists (if i'm reading it right). the weightings are not the OP's opinion about basketball, but essentially a fit to everyone else's opinion about basketball. if people vote shorter players higher than expected by all other factors, then perhaps voters just have a small player bias. or a post-1982, pre-2003 bias that doesn't see league strength as having changed, no matter how much people think it has changed over that time.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 4:27 am
by Smoothbutta
f4p wrote:for anyone complaining, keep in mind this is a formula to best match other lists (if i'm reading it right). the weightings are not the OP's opinion about basketball, but essentially a fit to everyone else's opinion about basketball. if people vote shorter players higher than expected by all other factors, then perhaps voters just have a small player bias. or a post-1982, pre-2003 bias that doesn't see league strength as having changed, no matter how much people think it has changed over that time.

Correct, and appreciate it! Another way to explain it: imagine a 3-D graph with just MVP, All-NBA, and win-shares, for example. Every player lies somewhere in that 3-D space with those three axes. Then take the RealGM Top 100, Thinking Basketball's Top 40, etc. and draw a best fit through those axes to make a formula for how consensus rankings generally weigh those three categories. It doesn't follow any single player exactly, just a best fit. THEN plug the actual players into that formula to see where the model says they end up. Now imagine that with 11 categories instead of 3, plus the three compensations. As I said in my post, it is fascinating to see who gets underrated or overrated by it: maybe it tells you Ewing is a bit further from top-25 or top-30 than we think, even if you believe the model underrates him. Or, rejecting that notion, someone landing higher than normal may indicate he deserves a higher ranking but some legacy factor weighs him down.


And yeah, it was a very interesting journey: in early revisions Harden was around #20, Nash was really low, etc. Oscar was also closer to #30, which was my biggest battle. Lots of tweaks, using a formula to gauge whether I was deviating further from or closer to the reputable consensus lists as a whole.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 2:53 pm
by Colbinii
f4p wrote: if people vote shorter players higher than expected by all other factors, then perhaps voters just have a small player bias. or a post-1982, pre-2003 bias that doesn't see league strength as having changed, no matter how much people think it has changed over that time.


Are you saying people don't perceive the pre-2003 to post-2003 league strength difference?

Or are you saying that the league strength has been linear since 1982?

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 3:23 pm
by Smoothbutta
Colbinii wrote:Are you saying people don't perceive the pre-2003 to post-2003 league strength difference?

Or are you saying that the league strength has been linear since 1982?

I think everyone understands the league has continually gotten better, but we are saying that for player rankings, people don't typically take points away from 80s-2010s players, whether you agree with that or not.

Colbinii wrote:I don't know how you can look at say, 2003, and say today's game is more similar to it than 1975.

I never said anything one way or the other about how the game is played, which is a whole other discussion by the way. I'm just saying that in terms of player skill, today's players are much more similar in skill level and athleticism to the 90s than the 90s are to the 60s, for example.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 3:24 pm
by Colbinii
I see. I thought the model was a new look at the Top 100, not a regurgitation of the popular Top 100.

Re: Model Quantifying Top 100

Posted: Wed Jul 10, 2024 3:27 pm
by Smoothbutta
Colbinii wrote:I see. I thought the Model was a new look at the Top 100, not a regurgitation of the populous Top 100.

I would read the following in case the intro of my post didn't explain it well:
Smoothbutta wrote:Another way to explain it: imagine a 3-D graph with just MVP, All-NBA, and win-shares, for example. Every player lies somewhere in that 3-D space with those three axes. Then take the RealGM Top 100, Thinking Basketball's Top 40, etc. and draw a best fit through those axes to make a formula for how consensus rankings generally weigh those three categories. It doesn't follow any single player exactly, just a best fit. THEN plug the actual players into that formula to see where the model says they end up. Now imagine that with 11 categories instead of 3, plus the three compensations. As I said in my post, it is fascinating to see who gets underrated or overrated by it: maybe it tells you Ewing is a bit further from top-25 or top-30 than we think, even if you believe the model underrates him. Or, rejecting that notion, someone landing higher than normal may indicate he deserves a higher ranking but some legacy factor weighs him down.