Joao Saraiva wrote: I've never studied BPM a lot, and I don't remember how it was calculated. So I'm interested in reading somebody who is high on that stat to see what's up with the formula that gives these results.
Okay, so, a brief rundown of the stat can be found here:
http://www.basketball-reference.com/about/bpm.html
Basically, BPM is an attempt to regress a large RAPM sample on historical box score stats, so that box score stats alone can be used to predict RAPM.
It's a pretty good attempt at a stat, but you need to understand its nooks and crannies to see why some players might appear higher/lower than they probably should.
The formula, as shown in the article, is this:
a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
And the values for the offensive portion are this:
Coeff.  Term                O/D BPM Value
a       Regr. MPG            0.064448
b       ORB%                 0.211125
c       DRB%                -0.107545
d       STL%                 0.346513
e       BLK%                -0.052476
f       AST%                -0.041787
g       TO%*USG%             0.932965
h       Scoring              0.687359
i       AST Interaction      0.007952
j       3PAr Interaction     0.374706
k       Threshold Scoring   -0.181891
l       sqrt(AST%*TRB%)      0.239862
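For anyone who wants to poke at the formula themselves, here is a minimal Python sketch that just evaluates the expression above with the table's coefficients. The function and argument names are my own, and the sketch does no unit conversion: whether each input should be a fraction or a 0-100 percentage is whatever the linked article uses, so treat that as something to check before trusting any output. It also only covers the formula as written here, not any further adjustment the article applies afterwards.

```python
import math

# O/D BPM coefficients copied from the table above.
COEFFS = {
    "a": 0.064448, "b": 0.211125, "c": -0.107545, "d": 0.346513,
    "e": -0.052476, "f": -0.041787, "g": 0.932965, "h": 0.687359,
    "i": 0.007952, "j": 0.374706, "k": -0.181891, "l": 0.239862,
}


def raw_obpm(*, re_mpg, orb, drb, stl, blk, ast, usg, to, ts, tm_ts,
             three_par, lg_three_par, trb, coeffs=COEFFS):
    """Evaluate the formula exactly as written in the post.

    Argument names are hypothetical; units are NOT converted here and
    must match whatever the linked article expects.
    """
    c = coeffs
    # Bracketed "scoring" part: efficiency vs. team, AST and 3PAr
    # interactions, minus the threshold constant k.
    scoring = (2 * (ts - tm_ts) + c["i"] * ast
               + c["j"] * (three_par - lg_three_par) - c["k"])
    return (c["a"] * re_mpg + c["b"] * orb + c["c"] * drb
            + c["d"] * stl + c["e"] * blk + c["f"] * ast
            - c["g"] * usg * to
            + c["h"] * usg * (1 - to) * scoring
            + c["l"] * math.sqrt(ast * trb))
```

Writing it out this way makes it easy to change one input at a time and watch which terms move the result.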
What are the issues with OBPM?
First and foremost, OBPM is based on regression, and unfortunately, this means that the regression can lead to some seriously biased results. Regression is really designed to only interpolate data, rather than extrapolate, as we technically don't know the distribution of data outside of the sample range. So automatically, once people start approaching outlier status in any of the individual terms, oddball things can happen, and OBPM can be severely over/underestimated.
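To make the interpolation/extrapolation point concrete, here is a toy Python sketch. It has nothing to do with real BPM data: it fits a straight line to a made-up curved relationship (a square root) over a narrow range, then compares the prediction error inside that range with the error far outside it. All names and numbers are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up "true" relationship, sampled only on x = 1..10.
xs = [float(x) for x in range(1, 11)]
ys = [math.sqrt(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    return slope * x + intercept


# Error at a point inside the sampled range vs. far outside it.
interp_error = abs(predict(5.5) - math.sqrt(5.5))
extrap_error = abs(predict(40.0) - math.sqrt(40.0))
print(f"inside the sample range: {interp_error:.3f}")
print(f"outside the sample range: {extrap_error:.3f}")
```

The line looks fine where it was fitted and drifts badly once you leave that range, which is the same risk an outlier player or an out-of-sample era runs with OBPM.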
One must consider that extrapolation happens quite a lot - the sample is from 2001-2014, and any data pre-2000 or post-2014 is prone to extrapolation issues.
I'll talk about some variables of interest -
3PAr Interaction - this dataset was compiled for the 2001-2014 NBA landscape, where the use of the 3 point shot gained more prominence. A player who attempts a lot of 3 pointers before 3 pointers became a thing might be heavily boosted (e.g. Antoine Walker) and a player that doesn't attempt many 3s in the post 2014 era might be unfairly handicapped.
It's also worth noting that 3PAr is partially meant to act as a proxy for spacing, so players that are excellent floor spacers without shooting many 3s are likely to be underrated. This is especially true considering that floor spacers get fewer offensive boards. A guy like Walker would get boosted more for "floor spacing" than a guy like, say, Nowitzki or Aldridge, and any sane person will tell you that is not the case.
USG*AST and AST*REB - so, these terms are direct multiplication terms, and as a result, extreme results here can throw the regression completely out of whack. The AST*REB term is especially problematic here - I don't have much of an issue with the USG*AST term, because it's not quite as large, and there's more pure merit behind an offensive player being a significant scorer and playmaker.
It's also worth noting that the USG% and AST% terms aren't actually wholly independent - a player taking more shots is not only going to increase his USG%, but the same amount of raw assists will also increase his AST% even if his playmaking hasn't actually improved.
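A quick Python sketch of why that happens, assuming the AST% definition from the Basketball-Reference glossary (that formula is not quoted in this post, so take it as my assumption): the denominator is the teammates' made field goals while the player is on the floor, so the player's own makes are subtracted out. The box score numbers below are invented purely for illustration.

```python
def ast_pct(ast, mp, fg, tm_mp, tm_fg):
    """AST% per the Basketball-Reference glossary (assumed here):
    100 * AST / (((MP / (Tm MP / 5)) * Tm FG) - FG)
    """
    teammate_fg_on_floor = (mp / (tm_mp / 5)) * tm_fg - fg
    return 100 * ast / teammate_fg_on_floor


# Invented season line: same assists, minutes and team totals,
# but the player makes more of the team's field goals himself.
fewer_makes = ast_pct(ast=600, mp=2800, fg=500, tm_mp=19800, tm_fg=3200)
more_makes = ast_pct(ast=600, mp=2800, fg=700, tm_mp=19800, tm_fg=3200)
print(f"AST% with 500 FG: {fewer_makes:.1f}")
print(f"AST% with 700 FG: {more_makes:.1f}")
```

Same raw assists, higher AST%, simply because more of the team's makes belong to the player himself.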
The AST*REB term has a couple of problems -
The DRB term is actually negative, so players that accrue a lot of defensive rebounds are likely to get punished if they don't get a lot of assists too. On the other hand, players that do get a lot of assists can be rewarded if they usurp a lot of their teammates' rebounds (please, no arguments about particular players here - that's not the purpose of this post/thread), and players that don't rebound quite as much can be unfairly punished too (e.g. Steve Nash).
In other words, as an interaction term, sometimes accruing more of one statistic actually leads to a lower OBPM, even when that statistic may not directly correlate with offensive play. And when the magnitude of the stat is SO high, this can lead to some silly outliers in both directions.
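Here is a rough Python sketch of that effect using only the two rebound-related pieces of the formula, c*DRB% and l*sqrt(AST%*TRB%), with the coefficients from the table. Two loud assumptions: percentages are entered on a 0-100 scale, and TRB% is approximated as the average of ORB% and DRB%. The player profiles are invented.

```python
import math

C_DRB = -0.107545   # coefficient c from the table
L_SQRT = 0.239862   # coefficient l from the table


def rebound_terms(ast, orb, drb):
    """Sum of c*DRB% and l*sqrt(AST%*TRB%).

    Assumptions: inputs are 0-100 percentages, and TRB% is roughly
    the average of ORB% and DRB%.
    """
    trb = (orb + drb) / 2
    return C_DRB * drb + L_SQRT * math.sqrt(ast * trb)


def gain_from_extra_drb(ast, orb=5.0, drb_before=15.0, drb_after=25.0):
    """Change in the two terms when only DRB% goes up."""
    return (rebound_terms(ast, orb, drb_after)
            - rebound_terms(ast, orb, drb_before))


low_ast_gain = gain_from_extra_drb(ast=5.0)    # invented low-assist big
high_ast_gain = gain_from_extra_drb(ast=50.0)  # invented high-assist guard
print(f"low-assist player:  {low_ast_gain:+.2f}")
print(f"high-assist player: {high_ast_gain:+.2f}")
```

Under these assumptions the same jump in DRB% costs the low-assist player value and adds value for the high-assist player, which is exactly the asymmetry described above.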
This is especially prominent now, because Russ and Harden are basically historic outliers here, and this props their OBPM up quite a lot.
Also, much like 3PAr, things such as assist% and rebound% have changed league wide over time, which may slightly blur historical comparisons. I don't think assist% and rebound% era changes alter BPM that much though, just worth mentioning on a technical level.
Threshold scoring - a player who is incredible at improving his teammates' efficiency raises his team's TS%, which the scoring term subtracts from his own TS%. Depending on the other variables and the player's own TS%, this might cause silly things to happen to the regression once again. Steve Nash is also probably handicapped here, IMO.
So in a nutshell, what BPM does is regress statistics against RAPM in order to create an optimal fit, and for this fit to be most "accurate" for as many players as possible, it requires some terms that might cause odd results elsewhere, e.g. the interaction terms with defensive rebounding.
Interactions between terms that are only implicitly involved in BPM - a player who takes a lot of long 2s will almost definitely get a lower ORB% in exchange for spacing, but only the former is discerned in BPM. Assists to layups/dunks cause more turnovers than assists to jump shooters, but BPM doesn't differentiate between assist types. Contested and uncontested rebounds are treated the same (in particular, some poor defensive perimeter players might be overrated due to high rebound*assist numbers on DBPM, but that's for another day).
Perhaps some of this is better explained by taking proper examples. So I'll bring up some of the players asked about -
Reggie Miller - he was an incredibly efficient scorer (led the league in TS% a couple of times) and bombed 3s at a ridiculous rate. So he's probably a bit of a 3 point/TS% outlier and this may have inflated his value. Of course, we don't have prime RAPM for Reggie Miller, so I can't speak with utmost confidence. It's worth mentioning that Ray Allen's OBPM was 4.3 over the 14 year timeframe, and his ORAPM was 5.1. Given that Miller is often seen as a 90s era analogue to Ray Allen, perhaps Miller may simply have been underrated too? Who knows.
Larry Bird - he was primarily a floor spacer, and didn't really start taking any meaningful number of 3s until 1986, so a lot of his floor spacing value probably wasn't captured. Interestingly enough, Bird always had a very high DBPM, so I wonder if some of the regression favoured stats (e.g. defensive rebounding) that likely inflated his DBPM wound up marginalising his OBPM.
Dirk - Dirk is one of the poster boys for being underrated by BPM. He spaces the floor without shooting many 3s (so his ORB% drops too), he was a fairly good defensive rebounder but didn't accrue many assists (and the low assist numbers handicap two of Dirk's interaction terms) and his gravity can only really be measured by team TS% and the team adjustment - so it's possible that Dirk's team TS% actually marginalises the impact that Dirk's personal TS% would normally have.
There weren't really any good examples in that leaderboard list, but Stephon Marbury was an example of a guy heavily overrated by OBPM based on his assists. He almost never passed to the interior, so his TOV% wasn't too high, but his assists (which he got heavy credit for) were nigh on worthless on a team success level.
Honestly, this could take all night to do properly, so I think this might be a good idea: In the article that I linked at the start of this post, feel free to look at the Tableau graphic and observe what players seem to be overrated/underrated to you, and try and make some judgments on what might be the case in your opinion. If you're struggling (or if you want verification on your thoughts), feel free to PM me and I'll throw my 2c in on the matter.
Hope that some of this helped!