Pretty much as soon as Kobe announced his retirement, Kevin Pelton (being Kevin Pelton...) wrote this crap, and I finally got annoyed enough by the surface-level garbage “statistical analysis” being used as ammunition to defame a top 10 GOAT that I decided to give the data one true pass and see just how few of the anti-Kobe narratives were true. It turns out that of all the crap that has been slung over the years, his regular season defense seems to be the only thing that ever merited criticism, and even that amounts to a lot less than people realize, but more on that later.
If the data actually ever indicated that, then fine, but it definitely does not; it smiles on Bryant in comparison to pretty much everyone but two “data era” guys, Lebron and KG. The former everybody accepts as being a half-tier removed from Mamba, and the other always seems to rank 5 spots higher or lower than Mamba when the truth is, as always, pretty much in the middle. I don’t know if it’s simply because KG was a RAPM giant (at least at the level people understand it, but more on that later) and Kobe was merely excellent that this all got started, but any honest examination of this stuff sees Kobe every bit as impressive as Duncan, which is more than we can say for the Birds and Magics and especially the Oscars and Wests, yet it seems only Bryant gets shafted.
After I dug around a bit it became obvious Pelton’s “case” was right in line with him as a writer: surface-level, borderline trolling. His argument was along the lines of “becauz RPM, Win Shares, and Weighted Box-Score-based Expected Championships Metrics similar to VORP,” Kobe isn't as good as guys we have zero reliable data on, which is pretty much on par with the quality of the logic that was used to drop Kobe in the top 100 project. However, after I started collecting the data necessary, I realized that the time it would take to really take him apart wasn't worth it and would probably just get shoved aside in some ESPN mailbox rather than result in a retraction (or better yet, can the fool). Then Kobe mic-dropped 60 and I've been spending a spare hour here and there putting this together. Before starting, here were his main “arguments”:
Win shares, found at Basketball Reference, are our most complete historic NBA metric. They give us a way to compare players across NBA eras. Although full box-score stats did not become available until 1977-78, when the league started tracking player turnovers, Basketball Reference estimates turnovers and other stats that were not recorded at the time (including steals and blocks before 1973-74) to come up with approximations for player value throughout league history. Bryant is currently 15th among NBA players in career win shares (172.5), with an outside chance of surpassing Reggie Miller (174.4) by the end of the season, if he improves his level of play. Because he has started so poorly, Bryant has actually lost 0.6 win shares from his career total so far this season.
As Miller's high ranking suggests, the problem with using win shares as a historic measuring stick is they tend to reward longevity over quality of play. To better reflect the impact players had on their teams, I've developed a model that relates their win shares each season to a typical team's chances of winning a championship. This model shows value is exponential rather than linear. For instance, a season with 15 win shares (such as Bryant's 2005-06 campaign) is nearly three times as valuable as one with 10 win shares (such as his 2010-11). A preliminary version of this model shows Bryant 20th all time in expected championships added (ECA), just behind Larry Bird and ahead of the late Moses Malone.
Why Kobe doesn't rate as well by advanced stats
Bryant comes out slightly worse by this method because of his lack of truly elite statistical seasons. Bryant's best season in terms of win shares, 2005-06, ranks 102nd in NBA history behind, for example, Stephen Curry's 2014-15 campaign. That's fairly consistent with what other advanced metrics indicate. Bryant's 2005-06 performance did rank 56th all-time in PER, but his best season by my wins above replacement player statistic (2002-03, with 20.4 WAR) ranks 72nd, dating back to 1977-78.
ESPN's real plus-minus (RPM) is even harsher. Because of the need for detailed play-by-play data, RPM is available only since 2000-01, but in that span, Bryant's best rating (plus-6.3 points per 100 possessions in 2007-08) ranks 80th in that span. These all-in-one metrics are universally picking up that by the standards of all-time great scorers, Bryant was relatively inefficient. Bryant's best season in terms of true shooting percentage (.580 in 2006-07) would rank seventh in Michael Jordan's career, seventh in LeBron James' and behind five of Kevin Durant's seven full seasons.
I will demonstrate that:
1) Kobe had singularly elite statistical seasons when you actually look at pure metrics and not arbitrarily weighted garbage
2) Kobe has inarguably top 10 level value in terms of “added championships”
3) Kobe’s prime impact is at least on par with Duncan/Wade and clearly outstrips Paul/Dirk/T-Mac/Durant
4) Kobe’s impact is far more empirically provable than Bird/Magic (but I still have his prime strength as below theirs) and faaaarrr more empirically provable (and translatable) than Oscar’s/West’s
5) Any fair analysis of statistical impact has Kobe on par with Lebron outside of ‘13/’14 strictly as an offensive player
6) Any fair analysis of team lift/team impact has Kobe looking as good or better than many players consistently ranked above him: Hakeem, Magic, Bird, and Shaq
through a series of 5 Impact Studies. Each study statistically juxtaposes combinations of top 15 ATGs as well as elite contemporaries who might not (yet) make that cut, in order both to ground the case and to make it intelligible within already understood frameworks. Ultimately, discussions often go awry because of a lack of consistency (in application/assessment), weak controls, and/or weak understanding (on the data side or on the hoops side). All of those have (to the best of my abilities) been avoided here.
DATA PREAMBLE
Statistics is supposed to be used to build a robust set of tools but, unfortunately, Hollinger is essentially the guy associated with the “advanced stats” movement, and it’s probably why I’ve pushed back so hard against it; the guy is a narrative-driven hack: his crowning achievement, PER, was crafted to make the new ESPN poster boy look better than the then-pariah who was unanimously held as the game's best player. He's a pure fantasy guy with no actual basketball acumen who substituted his fantasy-informed way of thinking about the game for understanding. He basically reverse engineered a “Shawn Marion/KG weighting system” (those were the fantasy guys' favorite players ever) and tweaked it a bit to give Lebron an edge over all-arounders without losing out to superior volume scorers. And it worked. It took Lebron YEARS to catch/pass Kobe (both happened in the same year, 2009), yet you had people consistently putting 06-08 Lebron ahead of Bryant solely on the basis of PER. And as things moved on, the moniker “advanced” became synonymous with “better” and, well, here we are: an era where “truth” is a function of arbitrary weights chosen by those of little understanding (and...honestly...average at best intelligence).
In discourse, people seem to be allowed to make assumptions about the mathematical weights to ascribe in these all-in-one metrics, but no assumptions may be made as to the importance or contribution to “goodness” of the more esoteric stuff (skillset, team lift/performance, portability, etc.). That's a shame. In all fields analytics is supposed to be used to support quality decision-making…that’s it. As soon as you start missing the point, substituting your assumptions for facts, you start subtracting analytic value rather than adding it; when you’re dealing with people/methods/metrics like that it really is better to just stick to the tape and the box score. If you want to be 100% matter-of-fact in your stances, you had better be doing it from a place of accuracy and intelligence, and with the haters that is most often not the case. If you really want to make the approach data-centric, the only valid courses of analysis are per-possession efficiency (with secondary care to volume) and per-possession impact (which still does require some context). Anything along the lines of win shares and PER is only good for large, sweeping categorizations or filters ~ I want no part of that.
Theoretical Sidebar:
THE GOOD: RAPM | Raw +/- | TS% | TO% | ORTG | Synergy
Metric #1: RAPM
After doing my digging I concluded that RAPM (regularized adjusted plus-minus) is the most powerful and accurate impact metric out there when it is properly understood. The induction algorithms used to compute RAPM values rely on highly delicate/sensitive mathematical methods whose effects are largely overlooked when they are applied to smaller (single-year) datasets, particularly in the earliest versions. That is HUGE folly.
RAPM values are approximations of “true” on/off data, where “true” impact means the +/- of Player X given a theoretically infinite number of games/lineups to reference. I believe people get confused about RAPM’s veracity because it appears to be performing an unsupervised task (+/- identification); make no mistake, the methods used in RAPM's induction algorithms are supervised in nature and as such are prone to all of the shortcomings (sources of error) that standard predictive models are.
RAPM issue #1: Lasso Training
In predictive problems there are two distinct sources of modeling error: bias and variance. “Error due to Variance” comes from fitting one particular finite sample: a different sample of games/lineups would produce different estimates, so the estimates from any one sample scatter around the “true” values of the target parameter. “Error due to Bias” is systematic error produced by the assumptions built into the procedure itself (in this case a Lasso-centric regularization approach). In simpler terms, “Error due to Variance” can be conceptualized as sampling error and “Error due to Bias” as methodological bias.
In RAPM, the target parameters (+/- values) are derived using a mathematical technique known as regularization. Regularization adds a penalty term to a regression's objective so that fitted coefficients are pulled toward some target value unless the data give strong evidence otherwise; it deliberately accepts a little bias in exchange for a large reduction in variance, which is the same overfitting-avoidance idea used in classification problems applied to OLS-style computations. (For the uninitiated, fitness functions are objective functions used to determine how parameters will be selected and weighted based on the available inputs.)
The statistician who devised RAPM elected to go with LASSO for his fitness function, which tends to be a very good fitness function for parsing highly collinear inputs (which is what all +/- data is) when there is sufficient data, but it can be very shaky on smaller amounts of data. LASSO methods are known to achieve high degrees of predictive stability because they prioritize predictive consistency. They do so by applying loss/penalty functions that “regress down” both the value and contributory weights of parameters they see as out of whack. This is a huge deal and introduces a large amount of methodological bias into RAPM's computations: the more outlier-laden or higher-variance a particular subject (player), the less accurate the model will be.
This is a paradigm that is only exacerbated on smaller datasets because the Fitness Function has far less data to properly “assess” volatility as it relates to the accuracy of predicting a player’s impact; over time LASSO algorithms can learn to mitigate their penalties but only when they have enough data to do so…this is really the crux of the whole methodological bias thing; LASSO penalties regress down outliers because they are scared that these outliers are less reflective of true impact but in reality we know that this is not the case…it’s a mathematical assumption that has no traction in real life basketball…something that the algorithm actually can “learn” and correct over time (with cross-validation) but only with adequately sized training and test sets. I believe there has been an attempt at using gradient boosting to offset the sample issue with single-year (PI) studies (in order to supply CV folds with an adequate number of observations) but have yet to see the results or any confirmation; and even then re-sampling procedures are exponentially better when you can start with 10 real years of actual data as opposed to 1.
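To make the small-sample point concrete, here is a toy sketch, assuming a single player coefficient whose design column is all ones (one row per stint), a zero prior, and a made-up penalty strength of 500. Real RAPM fits solve for hundreds of players jointly and tune their own penalty, so only the direction of the effect carries over, not the numbers.

```python
def ridge_single_coefficient(y, lam, prior=0.0):
    """Penalized estimate for one coefficient whose design column is all ones.

    Minimizes sum((y_i - b)**2) + lam * (b - prior)**2, whose closed-form
    solution is (sum(y) + lam * prior) / (n + lam).
    """
    n = len(y)
    return (sum(y) + lam * prior) / (n + lam)


def data_weight(n, lam):
    """Fraction of the estimate driven by the data rather than the prior."""
    return n / (n + lam)


LAM = 500.0  # hypothetical penalty strength, chosen only for illustration

for n in (50, 500, 5000):
    observations = [6.0] * n  # a player who is genuinely +6 in every stint
    est = ridge_single_coefficient(observations, LAM)
    print(f"n={n:5d}  data weight={data_weight(n, LAM):.2f}  estimate={est:.2f}")
```

With the same penalty, the genuinely +6 player is estimated at roughly +0.5, +3.0, and +5.5 as the sample grows from 50 to 500 to 5,000 stints. The data's share of the estimate is n / (n + lam), which is why one season gets pulled toward the prior much harder than ten.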
The penalized models I've been loosely lumping under “LASSO” come in a few flavors: LASSO proper (an L1 penalty), Ridge Regression (an L2 penalty), and Elastic Net Regression (a blend of the two). All of them share the problems discussed above, but Ridge has more potential in the hands of someone with relevant domain knowledge (as long as they don’t go overboard with their assumptions). The way RAPM uses it, Ridge Regression penalizes coefficients by their distance from an a priori target (in this case the RAPM parameters from the prior season), while the default setup for these methods, Elastic Net included, penalizes coefficients by their distance from a fixed central value (typically mu or 0.00). RAPM utilizes the former (prior-informed Ridge) in its computations.
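A minimal sketch of the prior-informed Ridge idea, assuming the textbook L2 objective min ||y - Xb||^2 + lam * ||b - prior||^2. This is not the actual RAPM code, which also weights stints by possessions and layers in the extra parameters discussed next; the stint data, the "last season" prior, and the penalty values are all hypothetical.

```python
import numpy as np


def ridge_with_prior(X, y, lam, prior):
    """Closed-form solution of  min_b ||y - X b||^2 + lam * ||b - prior||^2.

    Passing prior = zeros gives ordinary, zero-centered ridge regression.
    """
    k = X.shape[1]
    lhs = X.T @ X + lam * np.eye(k)
    rhs = X.T @ y + lam * prior
    return np.linalg.solve(lhs, rhs)


# Hypothetical toy data: 200 stints, 5 players.
# +1 = on court for the home side, -1 = on court for the away side, 0 = off.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 5))
true_impact = np.array([6.0, 2.0, 0.0, -1.0, -3.0])
y = X @ true_impact + rng.normal(0.0, 10.0, size=200)  # noisy margin per 100

last_season = np.array([4.0, 1.0, 0.0, 0.0, -2.0])  # made-up prior
zero_prior = np.zeros(5)

for lam in (0.0, 50.0, 1e6):
    print(lam,
          ridge_with_prior(X, y, lam, zero_prior).round(2),
          ridge_with_prior(X, y, lam, last_season).round(2))
```

At a penalty of 0 both fits collapse to plain least squares; as the penalty grows, the zero-centered fit shrinks everyone toward 0.00 while the prior-informed fit drifts toward last season's numbers, which is the design choice described above.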
I personally would never apply LASSO methods to single-year NBA lineup data without putting heavy work into a proprietary fitness function I was confident could handle the volatility issues. That aside, I do agree with the decision to employ Ridge Regression over Elastic Net Regression. RAPM, however, goes beyond simply using a penalized model with the previous season’s RAPM as the prior; the designer included 3 spurious parameters in his induction algorithm (an aging curve, an RTM parameter, and an injury parameter), and I only fully agree with one of the three. I wouldn’t have used such a tight RTM (he used 0.15; I’d go no tighter than 0.25), and age has nothing to do with actual, realized impact; it's just an example of an assumption taking the place of the empirical. There is no material/quantitative reason to believe age has anything to do with impact…it's true that veteran players who understand the game better tend to make fewer mistakes, but that is a phenomenon that should be left for the data to “deduce” on its own. I could go into more detail on the nuts and bolts of the spurious parameters, but ultimately they don’t totally throw things off and I feel like I’ve gotten bogged down in math enough as it is. At some point the guy who did the 10-year study should have at least run the same regression starting from 2011 and working backwards to 2002 to tell us what kind of impact the age curve had on the initial run, but...fooey, again, that's probably getting too mathy for here.
But I would like to illustrate why streak shooters are the most prone to LASSO-based errors: if said player is currently “slumping” and a minor lineup change is made, that is an entirely different 5-man unit; if the player then goes 6-9 and rattles off 17 points in the next 5 minutes, RAPM will interpret that sequence as an outlier, and not only will it tend to “regress down” the amount of +/- contributed to a split, it will tend to “award” an inordinate amount of the “lift/credit” to the player who was substituted. For these types of players/situations, larger-than-otherwise amounts of data are needed to accurately assess them; an amount of data simply not present in 1-year splits. I can think of no players in the data era more susceptible to this phenomenon than McGrady and Bryant. For now, I'll leave it there.
Metric #2: Raw +/-
I'm going to keep the rest of these short and sweet. Raw +/- is pure, so really, it's technically the most accurate measure of what actually happened. It is HIGHLY noisy and pays no attention to the quality of a player's surroundings, so it is of limited use unless two things occur: a large enough dataset is compiled to offset noise (generally at least 4-5 years' worth of relatively uninterrupted data), and it is paired with and scaled by a metric that does account for team caliber (I like SRS). If those two criteria are met, it can be very useful as a validation metric as well as a check on the more sensitive single-year RAPM, while allowing us to look at impact with more granularity than 10-year RAPM provides. Nobody should really ever use any SPM or APM...you either go with the best math (RAPM) or you go with the purest form (Raw +/- & SRS); anything else is missing the point.
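A minimal sketch of the raw on/off differential, assuming pooled multi-season team totals (points for, points against, possessions) with the player on and off the floor; the totals below are made up. The SRS scaling mentioned above isn't specified in enough detail to reproduce, so this sketch stops at the raw differential.

```python
def net_per_100(points_for, points_against, possessions):
    """Team point differential per 100 possessions."""
    return 100.0 * (points_for - points_against) / possessions


def on_off_differential(on_court, off_court):
    """Each argument is a (points_for, points_against, possessions) tuple
    for the player's team while he is on / off the floor."""
    return net_per_100(*on_court) - net_per_100(*off_court)


# Hypothetical multi-season totals, pooled to cut down on noise.
on = (11000, 10400, 10000)   # team is +6.0 per 100 with the player on
off = (4100, 4150, 4000)     # team is -1.25 per 100 with him off
print(on_off_differential(on, off))  # 7.25
```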
Prime Efficiency Metrics: TS% | TO% | ORTG
These are the only box-score metrics that are of much value in player-to-player comparisons. DRTG and the (lol) net rating (ORTG - DRTG) are absurd/missing the point, because one reflects the per-100 scoring on PLAYER-used possessions while the other reflects the per-100 points allowed by the TEAM while the player is on the floor. Unless you are comparing the DRTG of teammates it isn't a 1-to-1 comparison...and if you ARE trying to suss out defensive “responsibility” among players on a team, on/off data is the far better way to go.
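For reference, a sketch of the two simpler efficiency formulas, using the common Basketball-Reference conventions (the 0.44 free-throw factor for TS%, and TO% as turnovers per 100 plays). Dean Oliver's individual ORTG needs far more inputs and is left out; the season totals are hypothetical.

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    return pts / (2.0 * (fga + 0.44 * fta))


def turnover_pct(tov, fga, fta):
    """Turnovers per 100 plays, with plays = FGA + 0.44 * FTA + TOV."""
    return 100.0 * tov / (fga + 0.44 * fta + tov)


# Hypothetical season totals.
print(round(true_shooting_pct(2000, 1500, 500), 3))  # 0.581
print(round(turnover_pct(200, 1500, 500), 1))        # 10.4
```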
Synergy Data
Some people call this “Play Type” data, some call it “Vantage Data”; ultimately it is simply situational PPP (points per possession) analysis. This is my favorite stuff: it really allows you to check hypotheses, investigate player archetypes, glean understanding of meta trends, and suss out the small details/outliers that make great players great. Unfortunately, as the data caught on it was removed from the internet and made pay-only; it has returned for free now, but you can only use the tools provided for current-season data. It's a shame, because while I have a lot of stuff on Kobe and Lebron in my notes...they're really the only ones, and even then sometimes I didn't update data at season's end because I just always assumed it would be there. Because of that I can't/won't employ much Synergy data here...I'll reference some stuff I do have, but only as tangential support.
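Situational PPP is just points divided by possessions within each play type. A minimal sketch, assuming a hypothetical list of (play_type, points) records, one per possession; Synergy's own rules for what counts as a possession in each category are not reproduced here. The possession count is returned alongside PPP so volume stays visible.

```python
from collections import defaultdict


def ppp_by_play_type(possessions):
    """possessions: iterable of (play_type, points) pairs, one per possession.

    Returns {play_type: (points per possession, number of possessions)}.
    """
    totals = defaultdict(lambda: [0, 0])  # play_type -> [points, possessions]
    for play_type, points in possessions:
        totals[play_type][0] += points
        totals[play_type][1] += 1
    return {pt: (pts / n, n) for pt, (pts, n) in totals.items()}


# Hypothetical possession log for one player.
log = [("isolation", 2), ("isolation", 0), ("isolation", 3),
       ("post-up", 2), ("post-up", 0),
       ("spot-up", 3)]
for play_type, (ppp, n) in ppp_by_play_type(log).items():
    print(f"{play_type:10s} {ppp:.2f} PPP on {n} possessions")
```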
THE BAD: X-RAPM (and its Derivatives) | BPM
I tend to see people using these metrics as falling into one of three categories:
- Group A: People who are being intellectually dishonest
- Group B: People who do not understand what “Box-Score Informed” means
- Group C: People who are completely missing the point
Group A knows the entire point of +/- data is to measure realized impact but will not let that get in the way of whatever narrative they are trying to spin. Some painfully obvious names come to mind, but you can typically spot these people as the ones who use RAPM, NPI RAPM, X-RAPM, BPM, etc. in rotation, whichever one paints the picture they'd like painted.
For Group B: what is going on is that the designers of the algorithms code parameters based on their fantasy-basketball-inspired assumptions about who the highest impact players are. Everything from efficiency to height to “all-aroundedness” is fair game, and depending on whether it is X-RAPM or RPM, different types of box scores “push” impact to certain levels. Just from the X-RAPM stuff I've seen, I'm guessing scoring efficiency by position, height by position, and double-double players (or 20/6/6 players) get huge boosts. With RPM, on the other hand, the results are just so perfect that it's obvious they are biasing the parameters HEAVILY toward “superstar” productivity...I'm guessing it's a simple PER- or WS-based cutoff that “pulls” those guys toward +8, and no matter how much pure impact you have, unless you are hitting certain PER/ORTG/etc. cutoffs it's impossible to get much higher than +6. The point of these stats is not to conform to or confirm preconceived notions ~ unless you are a “Group A” type.
Group C: Self-explanatory. If you understand what this stuff is and aren't a troll, there is literally no reason to ever favor/reference/utilize these metrics. There is no phenomenon it is actually telling us about basketball, nor is it reflecting something that happened that is beyond our capabilities to “track.” Any avenue of inquiry dedicated to impact should not stray from trying to isolate pure impact...it's just the wrong way to go.
BPM is even worse than X-RAPM/RPM. It's pure garbage. It literally uses some arbitrary box score function, ORTG, and DRTG to “forecast” the impact of the player in question. It is insane how terrible an approach this is, and yet because the “geniuses” at bballref decided to include it on their player pages you have people who actually believe it is an impact stat...it's unreal. But just to see how garbage these stats actually are, some numbers (I'm not even sure how these people purport to have the datasets for 90s RAPM when there isn't full play-by-play...the most I've ever seen in a 90s dataset was like 60% of the season, but anyway):
91-97 MJ X-RAPM: 5.0, 4.9, 5.5, N/A, 2.1, 5.2, 4.3
91-97 Stockton X-RAPM: 4.7, 6.5, 5.5, 5.1, 5.8, 4.9, 5.1
91-97 Robinson X-RAPM: 6.4, 8.9, 7.9, 9.0, 9.0, 2.8
Now for probably the worst “Math” I've ever seen:
2006 Kobe Per 100 Possessions: 46/7/6 // 115 ORTG // 56 TS% // 5.9 ORAPM // +19 Off +/- (on 8th Offense)
2006 Kobe OBPM: 7.6
2006 Lebron Per 100 Possessions: 39/9/8 // 115 ORTG // 57 TS% // 3.9 ORAPM // +13 Off +/- (on 9th Offense)
2006 Lebron OBPM: 7.9
Ummm.............................................................................................
THE WHATEVER: PER, WS48, USG%, AST%
PER – Good for sorting players into tiers of productivity, perhaps for some role definition or cutoffs in a comparison. Beyond that, a joke.
WS48 – See PER
USG% - Fine stat, but it isn't really what people think it is, and it tends to mislead more than it guides. It's used to paint guys as “ball hogs” when guys who pound the ball and monopolize time of possession suck the wind out of offenses far more than guys who play off ball but may shoot more. Again, better for role definition than anything else.
AST% - Can someone tell me what is useful about measuring this that isn't immediately gleaned from an actual box score? I've never been able to figure it out. And collating AST%/TO% is the same kind of quirky “what are people doing” thing as “Net RTG.” One is calculated as a function of usage. The other is a function of a team stat. AST% is more linked to rebounding rate than it is to TO%.
That's pretty much all I have to say about “advanced” stats. There's a lot more, but I got really tired of talking about that stuff about ¼ of the way through the section, and ultimately the point is to set the groundwork for the impact/efficiency stats I will use while making those who aren't yet aware of the shortcomings and idiosyncrasies of key metrics aware of them. There are some more issues with RAPM (even extended RAPM) readings that I haven't gone into yet but will address as they become relevant to the discourse. Below is a link to the Google Doc containing the 5 spreadsheets relevant to this post; each sheet is titled according to the impact study it pertains to, and I will clearly say when the spreadsheets are to be referenced. Do not just run through the sheets without reading the impact study write-ups, because most won't make sense without the included preamble.
IMPACT STUDIEZ LINK
IMPACT STUDY #1: Extended +/- Analysis
Given the discussion in the introduction, my goal here was to utilize the RAPM readout with the widest window possible in order to ensure any inference is based on the most valid data available. Currently there are 3 extended RAPM splits available: a 4-year split (2008-2011), a 6-year split (2006-2011), and a 10-year split (2002-2011). There are really only 2 potential arguments for seeing the 10-year split as less valid than one of the shorter studies: the first would be the contention that the 10-year split will be less representative of average value due to a preponderance of “junk” data (seasons where the player was injured or pre/post prime), and the second would be that the data was somehow biased against players who didn’t play all 10 years. Both are easily debunked.
The “Junk Data” Argument
Kobe had his fair share of injuries/non-prime seasons included in his split. If the argument is simply that this window paints him in a better light than a closer examination would, that is incorrect. Looking strictly at each player's 3 worst RAPM showings, Kobe’s peak/prime stretch is going to be held back more by his lower-level seasons than everybody else's in the top 15 save Nash…and even with Nash there’s a caveat. See below.
Kobe’s Bottom 3 Seasons: 55th (2004), 123rd (2005), 32nd (2011)
I can see someone arguing that by being combined with other seasons in a larger dataset the “error” in the three seasons chosen for Kobe would diminish, and consequently it’s a bit circular to use this as evidence of lower contributions while arguing that he was underrated due to variance, but it actually can be both: yes, RAPM generally missed on these years, but it's doubtful he ever would have had a shot at being in the top 10 outside of 2011. He’d probably still be a fringe top-20 guy in 2004 and somewhere in the 50s in 2005 (at best)…that would still put Kobe’s bottom 3 seasons every bit in line with what we’re seeing from the other guys (still trailing, actually).
Wade’s Bottom 3 Seasons: 75th (2004), 9th (2007), 31st (2008)
Wade played only 9 games injured in 2007; the rest of the time he was hurt, he didn’t play. The contribution of rookie seasons is generally de-weighted in the multi-year training algorithms, so the only truly detrimental stretch seems to be the 20-30 games he played injured in 2008, a far cry from Kobe’s “worse than moderate” plantar fascia issues for all of 2005.
Dirk’s Bottom 3 Seasons: 53rd (2002), 6th (2004), 18th (2010)
This is Dirk’s 100% optimal window. If anyone gets a comparative window boost, it's him. All of his prime, 0 pre-prime or post-prime seasons. Akin to using a 2001-2010 window for Kobe, a 99-08 window for Duncan, 05-14 for Lebron, etc.
Nash’s Bottom 3 Seasons: 116th (2002), 76th (2003), 150th (2004)
Nash may be the only one who has 3 more “damning” seasons in terms of the raw ranking, but these three seasons came in immediate succession and all at the beginning of his run; as such their contribution isn’t 1-to-1 with someone who has the same rankings spread out in 3-4 year gaps (not even close actually, I’ll discuss this later). Nash is not held back by his three low-end inclusions (an average rank of 114th, in successive years, all at the beginning) nearly as much as Kobe is by his (an average rank of 70th, with two connected seasons falling right in the middle and a large gap before the third). Basically, when the algorithm compresses the data, “seasons” no longer matter, but collections of vectors (data that are “near” each other in the instance space) tend to be treated/weighted together when the procedures relevant to RAPM derivation are applied, so all of Nash’s low points coming in succession in the initial training phase are going to weigh down his split much less than the raw rankings would suggest. Not a huge deal but important…he isn’t by any means “slighted” here, except that there’s probably some “pinning” in his split like there is with Dirk’s (the algorithm doesn’t correctly parse their offensive impact from their defensive impact because of both lineup-related stuff and math-related stuff).
Duncan’s Bottom 3 Seasons: 15th (2009), 11th (2010), 13th (2011)
A cursory analysis suggests shifting Duncan’s window forward wouldn’t do him any favors; he was so intelligently deployed post-prime that he was able to keep up a per-possession effort level and efficacy range identical or close enough to his prime to far outstrip the lower-tier seasons of Kobe/Pierce/Wade etc. He was a much better player from 98-00, but in terms of RAPM he finished 23rd, 10th, and 13th those seasons. If anything, 02-11 looks like about as good a window as Tim could hope for (only NPI RAPM exists for 2001).
Pierce’s Bottom 3 Seasons: 60th (2003), 69th (2004), 57th (2011)
Kidd’s Bottom 3 Seasons: 21st (2010), 25th (2008), 57th (2011)
The “Extra Data” Argument
There really isn’t any reason to think that having 1 or 2 (or even 4) fewer years of data takes away from the split computation; worst case scenario, the lower number of possessions means that our confidence level in the RAPM outputs is incrementally (like, a couple %) lower for guys like Paul or Dwight who have 3-4 fewer seasons of data. Paul actually makes for a great test case: he ranks higher in the 10-year study (6th) than in the 6-year study (8th), and in the 4-year study (which eliminates the two seasons potentially “detrimental” to his extended split, 2006 and 2007) he again peaks at 6th. His RAPM is identical in both the 10-year and 6-year studies, so the thought that Lebron or Wade are slighted by an extra year or two is bunk.
Still, to facilitate the cleanest comparisons possible, I kept this impact study to only the top 15 guys who missed no more than 2 years (eliminating Paul, Howard, and Aldridge), and I also dropped anyone whose injury history/decline made it impossible to evaluate evenly using the intended frameworks (eliminating Baron). This left me with Lebron, KG, Wade, Kobe, Manu, Duncan, Nash, Dirk, Pierce, Kidd, and Artest.
Spreadsheet #1: LASSO
This spreadsheet contains the original and my “error-adjusted” RAPM for the 11 players mentioned above. The methodology used to perform the error adjustments is below.
I utilized the 02-11 RAPM splits as a base from which I adjusted the RAPM figures for the contributing seasons by normalizing the relationship between their season-to-season average RAPM and their 10-year RAPM.
There are definitely some imperfections here; the 2002/2003 splits are compressed by comparison due to some of the statistical training choices, so LBJ/Manu/Wade may slightly skew the adjustment multiplier. There were guys who missed 20-25 games in a year or two (Wade, KG), but for the sake of simplicity no re-weighting was done ~ this wouldn't have changed much, but it makes a couple of small differences. The biggest and most obvious imperfections are the 2 mathematical assumptions used in this mode of analysis: that the LASSO-related “error” is evenly distributed between seasons and that eRAPM represents a truly valid target parameter. In reality the errors due to LASSO penalties likely exhibit an extremely chaotic distribution, and even when using 10 years of data these methods produce approximations at best; parsing the collinear inputs in NBA lineup data is an absurdly, absurdly difficult thing to do, and people need to recalibrate their expectations of what is possible and their interpretations of what the data actually means.
The sheet is set up as follows: there are 2 clearly labeled datasets; the top is the recorded (regular season prior-informed) RAPM for each year between 2002-2011, and the bottom dataset is the adjusted RAPM. Each column contains either the year of interest or 1 of 5 labels (each of which is defined below):
PI-AVG: Sum of all RAPM splits divided by total number of seasons played (within the 02-11 window)
eRAPM: 10-Year RAPM. Moving forward I will only refer to the 10-year split as eRAPM.
% of eRAPM: PI-AVG/eRAPM for the player of interest
SYE Multiplier: This figure is computed in 2 phases: first, the % of eRAPM for all players is summed and then divided by 11 (the number of players); second, the SYE for each individual player is computed by dividing their % of eRAPM by the figure derived in step 1 (in this case 75%, or 0.75). Each original RAPM cell associated with the target player is then multiplied by the SYE, and the result is their Lasso-Adjusted RAPM (referred to as aRAPM moving forward).
Norm PI-AVG: Sum of all aRAPM splits divided by total number of seasons played (within the 02-11 window)
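The sheet's columns can be sketched as follows. This follows the definitions above literally (including multiplying each season's RAPM by the SYE); the two players and their numbers are made up, picked so the step-1 average lands on the same 0.75 quoted above.

```python
def pi_avg(seasons):
    """Average single-season RAPM over the seasons actually played."""
    return sum(seasons) / len(seasons)


def sye_adjust(players):
    """players maps name -> (list of single-season RAPM, eRAPM).

    Returns (per-player columns, step-1 average), following the steps
    exactly as they are written in the definitions above.
    """
    pct = {name: pi_avg(s) / e for name, (s, e) in players.items()}
    # Step 1: average % of eRAPM across all players (the "divided by 11" step).
    league_pct = sum(pct.values()) / len(pct)
    table = {}
    for name, (seasons, e) in players.items():
        sye = pct[name] / league_pct          # Step 2: player's SYE multiplier
        arapm = [r * sye for r in seasons]    # Lasso-adjusted seasons (aRAPM)
        table[name] = {
            "PI-AVG": pi_avg(seasons),
            "eRAPM": e,
            "% of eRAPM": pct[name],
            "SYE": sye,
            "aRAPM": arapm,
            "Norm PI-AVG": pi_avg(arapm),
        }
    return table, league_pct


# Hypothetical players, not real RAPM figures.
demo = {"Player A": ([4.0, 6.0], 5.0), "Player B": ([2.0, 4.0], 6.0)}
table, league_pct = sye_adjust(demo)
print(league_pct)  # 0.75
for name, row in table.items():
    print(name, round(row["SYE"], 3), round(row["Norm PI-AVG"], 3))
```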
VIEW "LASSO" SHEET HERE
KOBE VS DUNCAN
I have no illusions about arguing Kobe over Duncan all time. I've got Tim 3rd on my list thanks to the insane longevity he demonstrated between 2011-2015. But looking at the data, his prime and Kobe's prime seem nearly indistinguishable in the RS/PS. I'll get into playoffs later, but the data has Kobe having about 5-6% more per-possession impact than Duncan (6.1 vs. 5.8), and he did this playing about 12% more court time (38.7 mpg vs 34.7 mpg). As discussed earlier, sliding Tim's window wouldn't really do much for his split, but his MPG would rise.
Tim's Next 4 Seasons: [98] [99] [00] [01]
Kobe's Next 4 Seasons: [12] [13] [00] [01]
Looking at their top 4 seasons outside of this window, Kobe would, at worst, probably fall back in line with Tim: 2001 is essentially a wash, and while the separation of 98-00 Duncan over 00/12/13 Kobe is strong, it's the difference between a top 5-6 player and a top 10-12 player. It's up to each person how they'd weigh that, but from a data perspective 98-11 Duncan is a virtual wash with 00-13 Kobe, and if you chop Duncan's career off at 2011 he's still indisputably top 10.
KOBE VS DIRK
Dirk's case over Kobe is virtually nonexistent. If anyone could ever climb on the basis of portability it's Dirk (and he certainly has on my list; I've got him 12th), but there's a limit to how much raw impact that can make up...particularly since Kobe doesn't lack for portability himself. He's not Reggie/Durant/Allen/Dirk, but he ain't Wade or AI either. Kobe's prime stretch has him having 9-10% more per-possession impact than Nowitzki (6.1 vs. 5.6) while playing an additional 1.3mpg. Unlike Duncan's, Dirk's next 4 seasons are clearly less impressive than Kobe's, with 2001 Kobe being the only “MVP-Level” campaign amongst them.
Dirk's Next 4 Seasons: [01] [12] [14] [15]
Kobe's Next 4 Seasons: [01] [12] [00] [13]
Kobe elevates his overall level of play in the PS (more on this later) while Dirk is the same guy he was in the RS, and Dirk had a very particular (and exploitable) flaw that could (and did) inhibit his ability to lead a championship-level run until 2008. I don't weigh that all that much, but everything does matter.
KOBE VS OSCAR
Kobe 6.6 | Lebron 6.6 | Wade 6.2 | Nash 5.5 | Paul 5.2
These are the top 5 offensive splits per eRAPM. It dodges Lebron's absurd offensive 13/14 stretch, but what it tells us is that outside of 09/10 Lebron there was nobody having as much impact on the offensive end as Bryant during the body of his prime (and even 09/10 Lebron really gets all of his separation from 06-09 Kobe defensively...the offensive impact gap is razor thin and Kobe's skillset is far preferable). Shaq may have had as much or more peak impact than Kobe in 02/03, and there is absolutely a case for 05-10 Nash slightly outstripping Mamba as an offensive player as well, but if so the differences are minimal.
On the other hand, with Oscar we have nothing nearly this empirical; he gets the benefit of all of these positive assumptions and misconceptions about his stats and his game...it's almost always about narrative with him (which, again, in comparison to Kobe, is comical). I mean, Oscar's bad teams were (much) worse than Kobe's bad teams and they had BETTER talent relative to era. The only great team he ever played for deployed him in a much smaller/lower-fidelity role than Kobe's great teams did, and Oscar's one great team wasn't markedly superior to the 2000, 2001, or 2009 Lakers.
I tend to see his game as split evenly down the middle between CP3 and Pierce; his body manipulation tactics are extremely similar to PP's and as a P&R handler his balance is Paul-esque, but he also lacks many of their important features. He's great at limiting turnovers, but not GOAT-level like Paul. The release point on his faceup jumpers is low and slow, which makes it much tougher for him to get off shots without putting a guy on his hip. Considering his BEST attribute is P&R navigation, this is a big issue. His FTR would come down without a doubt (bringing down his efficiency) and he doesn't compensate with added range. Because of the mechanics of his jumper there's no late-clock pull-up from 20+ feet like with CP3 and PP, so in the waning seconds of the shot clock he doesn't stand up as an emergency outlet. His mid-range accuracy isn't on Paul's level...he seems to basically be a wash with Pierce between 15-20 feet, but if he can get inside of 13 feet he's clearly better. Considering all of this, I don't know how anybody can find him to be a superior player to even CP3, much less Bryant.
All of this is to say he's a great, great player with a top 15 career and a borderline if not sure top 20 peak. But his case over Bryant was never more than narrative and was always conveniently made by those with an agenda...I don't know anyone who compared these guys evenly who ever had Oscar as superior. I've got him 2nd amongst all-time PGs, but that's a longevity thing...I have prime Nash clearly ahead of him, Paul slightly ahead of him, and peak Penny as a virtual wash. Most people have Wade's peak clearly ahead of Oscar's, and (as we'll see) Kobe's peak is essentially inseparable from D-Wade's on an impact scale and has the benefit of being attached to a player who can shoot the ball from outside of 16 feet.
INFERENCE
Kobe was the best offensive player of his era; Lebron and Nash MIGHT have peaked a tad higher, but Lebron's monopolizing style doesn't generalize as well as Kobe's on-ball/off-ball balance (much more on this later), and Kobe is a much better shooter. As for Nash, he's only able to achieve impact in line with (or slightly above) Kobe's when he's used in lineups that can't function defensively...with more traditional plodding or non-shooting centers he's a slight step behind Kobe/Lebron...make of that whatever you wish.
As a two-way player Kobe looks like the 3rd most impressive guy in terms of prime. He's essentially locked in a 3-way tie with Wade and Manu but played far more minutes and proved far more durable than either guy...multiple seasons in Wade's prime (2007/2008) would have been complete wastes to a contending team, and Ginobili's deployment skews things; even taking his impact at face value, though, the raw minutes differential is gigantic. Paul and Duncan are the closest, but there's separation there, with Dirk/Nash behind Paul/Duncan by the same amount those 2 trail Bean. He has (along with Lebron) the strongest unipolar split, at 6.6. Dirk is a top 15 player ever. Tim/LBJ are top 5 ATGs. With that in mind, Kobe seems to have had a top 10-level prime. But let's keep digging.
IMPACT STUDY #2: Granular +/- Analysis
One of the biggest reasons Kobe gets shafted in “comparisons” to Oscar (and others) is that he is argued as “never being the best player in the league” because one of a random rotating group of “peers” was always considered his equal or better (amongst smalls) without any of these guys (except for Lebron) actually separating themselves from Kobe. As far as competition goes, prime CP3, Nash, T-Mac, Wade, and Durant smash West, Baylor, and...Hondo? As a form of analysis-by-proxy, this impact study was designed to highlight that, unlike with Oscar, we have a significant amount of empirical data suggesting Kobe was effectively the best offensive player of his era, in a way that is only assumed about Oscar. The guys who come closest (Lebron/Nash) are either far less portable (LBJ) or require “no-hands” rules and offense-biased skill-sets around them to squeeze out marginally more value (Nash). The windows chosen correspond to the periods over which these guys were most often used to denigrate Bryant (Nash's window being the exception – I never really heard anything after 08, but Steve sustained that level of play through 2010 and more data always trumps less). The only non-contemporaneous comparison is Kobe/KD.
Spreadsheet #2: ON/OFF
This spreadsheet is split up into a 3-Attribute comparison, with each column representing an Attribute. The three Attributes are RAPM, Boost, and Team O. The methodology is provided below – in extensive impact analysis I tend to weigh aRAPM (or RAPM when not available) 70% and Boost 30%, and use Team O as a tiebreaker.
The RAPM Attribute for the Kobe/T-Mac and Kobe/Paul windows is standard RAPM ~ it would not have been fair to use Kobe's adjusted data here without a valid SYE for Paul or McGrady. I had to use NPI data for Kobe/Tracy's 2001 RAPM cell. In the Kobe/Wade and Kobe/Nash windows, aRAPM is used for the RAPM Attribute cells. The seasons used in the Kobe/KD window were not concurrent, so their in-season RAPM rankings (in terms of top listed players) were used instead.
The Boost Attribute combines pure raw +/- with Team SRS to balance team strength with the player's on/off imprint. It's up to the individual how much he or she wants to weigh this, but when you're talking about 4+ seasons of data it's a lot to ignore, particularly when it is the only truly pure data we have. The formula for Boost is simply: [(Raw +/-) + (1.5*SRS)]/2.
The Team O Attribute doesn't represent any distilled metric, I simply provided the Raw Offensive +/- and Seasonal Team Offensive Ranking for the player/team of interest.
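The Boost formula and the 70/30 weighting described above can be sketched as follows. This is a hypothetical helper, not the sheet's own formulas: the dictionary keys, the `tie_margin` value, and the choice to blend (a)RAPM and Boost on their raw scales without normalizing are all assumptions, since the text doesn't say how the two are put on a common scale or how close two scores must be to count as a tie.

```python
def boost(raw_plus_minus, srs):
    """Boost = [(Raw +/-) + (1.5 * SRS)] / 2, as given above."""
    return (raw_plus_minus + 1.5 * srs) / 2


def composite(rapm, boost_value, w_rapm=0.7, w_boost=0.3):
    """70/30 blend of (a)RAPM and Boost on their raw scales (an assumption)."""
    return w_rapm * rapm + w_boost * boost_value


def better(p1, p2, tie_margin=0.05):
    """Return the name of the stronger player; Team O rank breaks near-ties.

    Each player is a dict with HYPOTHETICAL keys:
    name, rapm, raw_pm, srs, team_o_rank (1 = best offense).
    tie_margin is a hypothetical threshold, not from the original sheet.
    """
    s1 = composite(p1["rapm"], boost(p1["raw_pm"], p1["srs"]))
    s2 = composite(p2["rapm"], boost(p2["raw_pm"], p2["srs"]))
    if abs(s1 - s2) > tie_margin:
        return p1["name"] if s1 > s2 else p2["name"]
    # Tiebreaker: the better (lower) seasonal Team O ranking wins.
    return p1["name"] if p1["team_o_rank"] < p2["team_o_rank"] else p2["name"]


# Placeholder players only (NOT real data).
x = {"name": "X", "rapm": 5.0, "raw_pm": 10.0, "srs": 6.0, "team_o_rank": 3}
y = {"name": "Y", "rapm": 5.0, "raw_pm": 10.0, "srs": 6.0, "team_o_rank": 1}
print(better(x, y))
```

Because X and Y have identical inputs, the composite scores tie and the Team O tiebreaker decides the comparison.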
VIEW ON|OFF SHEET HERE
KOBE VS T-MAC
The first “he's better than Kobe” guy who actually got some real traction and posed a real “threat” was T-Mac. 2000/2001 Vince was a monster but not quite as special as McGrady at his peak, and Tracy was at least able to put together a respectable and uninterrupted 5-year stretch, so I'll start with him.
The data is clearly telling us that while he was generally comparable by his peak box scores (lower shooting efficiency, but balanced out by extremely low turnover rates), Tracy never really managed to get close to Kobe outside of 2003 (and even then it's a fairly clear choice for Kobe). Kobe's seasonal RAPM smokes Tracy's...he gets 50% separation. McGrady is absolutely the type of guy whose streaky shooting and janky lineups over this stretch can hold him back in short-term windows, but when juxtaposed with a Kobe or a Pierce any difference is going to be minimal...not coming close to making up the 50% differential. Kobe has an essentially identical advantage in raw impact, so here the validation mechanism isn't doing Mac any favors. In a strict peak-for-peak comparison of 2003, Kobe doubles Mac up in RAPM and is still slightly ahead in raw Boost...there isn't a single data-based Mac argument.
The portability argument also completely fell on its face. People used to argue that T-Mac would have 3-peated because, even if he doesn't have Kobe's per-possession defensive impact or insane situational PS defensive value, he's a better fit next to a great big because he doesn't need to dribble around as much, takes better care of the ball, and is a better shooter. As we found out...none of this really ended up mattering. When Mac joined the Rockets in 2005 he was asked to play second fiddle to Yao for essentially the first 30 games, and the results were disastrous...the Rockets were a sub-.500 team and Tracy was basically a volume scorer on barely above 50TS% with respectable-but-not-elite playmaking. JVG then decided that Yao was more capable of scaling down his game and could be almost as much of a threat in high P&R as he was on the block, and the Rockets offense took off with T-Mac's box score/efficiency stats following suit → Houston won a lot of games and ended up a 50-win team. So yes, Tracy is a transcendent player, but the idea that he'd be superior in a more “placating” role is nonsense...Yao is MUCH easier for a Kobe/Wade/T-Mac/Lebron to slide next to than O'Neal when you're talking about 1 guy inhibiting the peak-level play of another. And even with Yao as a #2, Tracy's results don't compare to Kobe's when Kobe was paired with a similar-level offensive big (Pau).
As far as the impact separation goes, I think it boils down to the sheer amount of raw impact that guys who are simply unstoppable and persistent at getting to the basket can have on team offenses (Lebron aside). The data shows that even before he was an efficient player Westbrook was having substantial offensive impact (which matches the real-time analysis). Baron Davis's split really says it all. Wade was a guy who made his living off of this and only this skill. When the efficiency is comparable...unless the guy cannot sustain his efficiency or completely kills offensive flow, the guy creating more doubles, pulling defenders further away from their man (this aspect is HUGE), and creating high-fidelity offensive rebounding opportunities is always going to have more per-possession impact than his raw efficiency stats would dictate. This is why impact data is important. When applied with some IQ, the Young Bryant/Prime Wade style of play is simply more conducive to raising the level of a TEAM'S offense than McGrady's. This isn't to say Mac didn't attack the basket a lot...it's just that Kobe did it a lot more. To me, it's the same phenomenon that clearly separates Magic/Nash from Paul as offensive players.
All in all, when you look at the fact that the 2001-2003 Magic finished 14th, 7th, and 10th with some of the worst offensive support in NBA history, and balance that against the fact that McGrady's skill-set is highly enticing (his height makes him terrifying in the P&R as a passer), I don't mind painting him as a close-to-Kobe guy strictly on the offensive end when you talk about short peak stretches...at the very least he's better than Wade and just as good as Kobe at propping up absolute garbage offenses...he just doesn’t offer any of their defensive perks (remember, this is predominantly 2001-2004 Kobe we're talking about here). Also, contrary to popular theory, he doesn't actually pair better alongside an elite big. But there is absolutely no robust data (nor has there ever been any) to support him as BETTER than Kobe, strictly 2003 or otherwise. Not 1 iota.
KOBE VS PAUL
Generally speaking, Kobe was the best player in the league from 2006-2008; he's certainly got the most impressive 3-year stretch, and season for season most people affiliated with the league had him as the best player each individual season. Even those that didn't vote him for MVP in 2006/2007 generally referenced his team record and not his level of play as the reason he wasn't MVP. But in 2008 he got some decent pieces, won 57 games, and took home the award. You'd think that would be the end of it, but ever since the day he was awarded an MVP trophy his inexorable haters have blathered on about how he was “never the best player in the league because Nash & Paul were better during his peak stretch.” I'd like to summarily dismiss that nonsense. Neither player has any well-rounded, impact-centric argument that puts them clearly (or at all) ahead of Bean.
The per-possession data favors Kobe in pretty much every single form it comes in. 2008 is unanimously viewed as his peak season, so let's dig in: Kobe crushes Paul in 2008 RAPM; he doubles him up. I understand the argument that Paul's defensive prior got caught on his 2007 split, but nobody ever allows for or acknowledges this in 2006 Kobe's split...still, even if you were literally to flip the sign on his split (to a +0.8 DRAPM) he doesn't even come close. In the 2009 data they are much closer (6.1 to 5.8), but Kobe wasn't even as good in the 09 RS as he had been the year prior and, more importantly, there's a HUGE collinearity error in Kobe's 2009 split; when you understand that Odom absorbed a lot of Kobe's +/- lift on offense (like Green these past 2 seasons with Curry), you see that Kobe was having outsize impact in 2009 and probably got at least +2 separation from Paul. Their 2010 RAPM data again has them separated by about 20 spots, and it isn't as if Paul's split was affected by his injury ~ he was injured 45 games into the season and never returned – all of his on-court time came while he was healthy. Their eRAPM has them pretty close, but again, Kobe's data includes much lower lows than CP3's 06-11 or 08-11 datasets, and 2008/2009 are Kobe's RAPM PEAKS. The validation data is Paul's only recourse; the Boost separation between the two was minuscule over the 2-year stretch (9.9 to 9.6), but when you analyze team performance it reinforces the notion that the RAPM data speaks to a true gap.
SIDEBAR: Kobe vs. Paul 2008 Team Lift
When you put what Kobe did up against what CP3 did, considering their respective support...I am as perplexed now looking at the data as I was in real time without it as to how even-handed people could argue Paul had a MORE impressive season.
Paul had the better average support over the 82-game schedule. These discussions can quickly devolve into esoteric references to nebulous concepts and a bunch of bickering, so I'll be short on supposition and long on data here: we have extensive data to say that Lamar Odom and Tyson Chandler had comparable impact in the roles that they played: Odom sported an (06-11) 3.2 RAPM while Chandler sported an (06-11) 3.0 RAPM. Personally I prefer Tyson as a player; Lamar can only have that type of impact as a #3, and when he is asked to be a #2 he struggles to go beyond a +1 guy, while Chandler can be a +3 as the 2nd, 3rd, or 4th best player on a given team. Also, Chandler's split incorporates his 2006 season, which is far, far, far away from his prime level; an (07-12) split would likely put him at least on par with Odom. But let's call these guys a wash.
The difference between Pau Gasol and David West, in their primes, is pretty small. The biggest difference is that one guy played with a player stans try to tear down while the other played with a guy stans like to prop up. Pau is a top 12-13 type of guy and West is a top 15-18 type of guy. Pau's (06-11) RAPM was a 2.0 while West's was a 1.6. So yeah, so far a wash for their 3rd guy and a moderate edge for the #2 guys (the data isn't saying Chandler/Odom > West/Pau but it does say that they are very special players who can have impact that goes far beyond what their numbers would indicate in the right settings...Chandler is a DPOY who is a huge outlier as a catch+finish big and Odom is Draymond lite without the 3-ball).
Now for some context – the Lakers essentially had 3 different teams that year. For a team who had that kind of roster turnover to make it to the finals is essentially unheard of. There were the Kobe/Drew Lakers, the Kobe Lakers, and the Kobe/Pau Lakers. In the 35 games Kobe played with Bynum (whose impact was somewhere in between West and Pau's when healthy, I had him as my #3 in DPOY before going down and his 08-11 RAPM has him right in West/Pau's range) the Lakers had the best record in the west at 24-11 while rocking a better than +7SRS. After losing Drew the Lakers played 21 games with basically the same roster as 2007 with Fisher instead of Smush and were right in line with the 2007 Lakers at 11-10 (just above .500) – the takeaway being Kobe played in line with his absurd 07 campaign during this stretch. Then the Lakers add Gasol, and go 22-4 (69-win pace +9SRS). That absolutely obliterates what Paul was able to manage – with a completely healthy Peja/West/Chandler he led a 5.5SRS team. The average SRS of Kobe's 3 teams was 7.3.
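The segment records above are easy to sanity-check. A minimal sketch using only the records and SRS figures quoted in the paragraph: the three stretches add up to 82 games and a 57-25 record, the Gasol stretch projects to roughly a 69-win pace, and 7.3 vs 5.5 SRS is a gap of roughly 32.7%. The middle stretch's SRS isn't given, so the 7.3 average itself can't be re-derived here.

```python
GAMES_IN_SEASON = 82

# 2008 Lakers segments as described above: (label, wins, losses)
segments = [
    ("Kobe/Bynum", 24, 11),
    ("Kobe without Bynum or Gasol", 11, 10),
    ("Kobe/Gasol", 22, 4),
]


def win_pace(wins, losses, games=GAMES_IN_SEASON):
    """Wins that a segment's winning percentage projects to over a full season."""
    return wins / (wins + losses) * games


for label, w, l in segments:
    print(f"{label}: {w}-{l}, {win_pace(w, l):.1f}-win pace")

total_w = sum(w for _, w, _ in segments)
total_l = sum(l for _, _, l in segments)
print(f"Season total: {total_w}-{total_l}")

# SRS figures quoted above: 7.3 (Lakers' 3-team average) vs 5.5 (Hornets)
srs_gap_pct = (7.3 / 5.5 - 1) * 100
print(f"SRS gap: {srs_gap_pct:.1f}%")
```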
Speaking of Peja, he is the piece that seals this comparison for Kobe. 2008 Peja as the 4th best player on a squad creates a chasm of separation between the casts that no small quibbling over Pau/Odom vs. West/Chandler would fix ~ Peja was one of the best and most underrated regular season offensive players in recent history. He's an all-time great shooter. In 2008 he shot 44% from 3pt on 7 attempts per game, long before that was the fashionable thing. His eRAPM has him as a +2.8, which was good for 30th, with an elite offensive score of +3.5. His (06-11) RAPM has him as a +2.3, and that window cuts off most of his good years. Specifically, in 2008 he was a +3. He posted 16+ points on 58TS% with a 5.3 TO%. Literally all of the data has him as a flat-out standout-level role player even after his SAC prime. When you compare him to the rest of the Lakers' cast...it's a joke. The 2008 Lakers cannot boast 1 clear net-neutral player after Kobe/Pau/Lamar...there is Radmanovic, Walton, Fisher, and Vujacic. Ariza was solid but only played 24 games, was a low-minute player, and then became useless after his injury. Our 4th best player was probably Ronny Turiaf.
Recapping...there was no misappropriation of the 2008 MVP. Kobe absolutely trounces CP3 in every shred of impact data, and with a statistically inferior (per large amounts of RAPM data) supporting cast, Bryant's team ended up with the #1 seed and an SRS 32% higher.
But really we can set all of the data aside and just remember what happened. Paul was given EVERY opportunity to steal the conversation from Kobe and simply failed. These teams met on April 11th with 2 games left in the season, and NOH had the chance to all but lock up the #1 seed, as they were ½ game ahead of us in the standings. Everybody knew the stakes for that game...these guys were playing for the #1 seed, and since the MVP conversation had come down to the two of them (KG, at least in the media's eyes, took himself out of it when Boston went 9-2 without him), this game was obviously going to factor into the general thinking...and Kobe summarily DOMINATED and carried the Lakers to a late-game win: 29/10/8 with 1 turnover on 53% shooting...his ORTG was a friggin 148. Paul played alright. Then the playoffs happened and Paul lost in round 2 to an inferior team.
So why am I seeing RealGM polls in 2016 where Paul gets 50% more votes than Kobe for Top Player in 2008? Is it just dat Winshares and dat BPM? Because I at least remember the average hate arguments rising to slightly beyond Kevin Pelton level, but the more threads I see, the more I think perhaps not.
Finally, to 100% debunk the notion that Nash/Dirk “deserved” their MVPs as much as or more than Bean, see below; I've put together a quick distribution of Kobe's 10 core contender seasons for the 2006-2008 MVP stretch: the first number indicates where said player placed in the actual MVP voting and the second figure indicates where their seasonal (unadjusted) RAPM ranked them.
2006 Nash: #1/#10 // 2006 Dirk: #3/#5 // 2006 Wade: #6/#1 // 2006 LBJ: #2/#7
2007 Nash: #2/#5 // 2007 Duncan: #4/#1 // 2007 Dirk: #1/#6
2008 KG: #3/#1 // 2008 CP3: #2/#26 // 2008 LBJ: #4/#5
Kobe Avg RAPM Finish: 7th (6.7)// Kobe AVG MVP Finish: 3rd (2.7)// Kobe MVP|RAPM Ratio: .403
Opposition Avg RAPM Finish: 7th (6.7)// Opposition AVG MVP Finish: 3rd (2.8) // Opp MVP|RAPM Ratio: .418
So yeah, if RAPM is the criterion, Kobe was right in line with the average placement on the RAPM/MVP paradigm (tracking slightly under, actually) and Kobe had the same exact RAPM ranking as the previous two winners (6th). There is no respectable & consistent criterion that has him losing out as MVP/POY all three seasons. None.
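The averages and ratios in the two summary lines can be recomputed directly from the 10 seasons listed. A quick sketch: the "MVP|RAPM Ratio" is read here as average MVP finish divided by average RAPM finish, which is an interpretation that reproduces Kobe's .403. Kobe's individual seasonal RAPM ranks aren't listed, so his line uses the quoted averages. The exact opposition ratio is 28/67 ≈ 0.4179.

```python
from statistics import mean

# (MVP finish, seasonal RAPM finish) for the 10 contender seasons listed above
contenders = {
    "2006 Nash": (1, 10), "2006 Dirk": (3, 5), "2006 Wade": (6, 1), "2006 LBJ": (2, 7),
    "2007 Nash": (2, 5), "2007 Duncan": (4, 1), "2007 Dirk": (1, 6),
    "2008 KG": (3, 1), "2008 CP3": (2, 26), "2008 LBJ": (4, 5),
}

opp_mvp = mean(m for m, _ in contenders.values())
opp_rapm = mean(r for _, r in contenders.values())
opp_ratio = opp_mvp / opp_rapm  # average MVP finish / average RAPM finish

# Kobe's 2006-2008 averages as quoted above (individual RAPM ranks not listed)
kobe_ratio = 2.7 / 6.7

print(f"Opposition: MVP {opp_mvp:.1f}, RAPM {opp_rapm:.1f}, ratio {opp_ratio:.3f}")
print(f"Kobe ratio: {kobe_ratio:.3f}")
```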
KOBE VS NASH
So with Steve I don't want to get overly verbose, as I hit on most of the PG/MVP stuff in the CP3 discussion and will be talking about Steve more later. Essentially, when Nash is firing on all cylinders I believe he has slightly more per-possession offensive impact, which is incredibly impressive, but he's the biggest net-negative defensive player of any MVP-type guy. A quick glance at the aRAPM and Team O Attributes basically says:
Takeaway #1: Nash has the highest raw on/off ceiling if you can put a skewed lineup (that is, one that prioritizes guard skills at the 4 and 5 positions) out there, but when you do this you are essentially surrendering all chances at an elite team defense unless you have Draymond Green or Kevin Garnett as your center.
Takeaway #2: On more traditional teams (with a more lumbering/post-oriented center and “only” one jump-shooting big, or even just a center that can't shoot) Nash still has outsize offensive impact, but it falls about 0.5 points behind Kobe/LBJ. However, on these teams Nash shows that he really isn't a gigantic negative on defense himself; he's a very small negative (by virtue of being a PG who competes more than anything else...it's really hard for a guard to be THAT detrimental to a team's overall defense).