lessthanjake wrote:These are all fair points. But I don’t think any of this indicates that looking at who does better or worse than other players in BPM relative to RAPM isn’t informative, at least with regards to the specific questions of (1) whether claims people make on these forums about impact-correlated box metrics like BPM underrating certain players compared to impact are actually directionally right; and (2) whether we think certain player archetypes are underrated or overrated by BPM, such that we might look at BPM for pre-1997 players and bump up or down our mental approximation of what we think that tends to suggest about their impact. Regarding #1, this directly tests that premise, though I grant that scaling being different and BPM not being inherently multi-season makes the analysis not completely precise (certainly I agree that the specific numerical difference between a player’s RAPM and BPM isn’t actually indicating BPM is overrating or underrating impact per 100 possessions by that exact amount). I don’t see good reason why it wouldn’t be generally directionally right about what players are underrated or overrated by BPM compared to RAPM, even if we might not take the specific numerical differences in a player’s RAPM and BPM numbers as being particularly meaningful.
The way BPM was developed: Daniel Myers (DSMok1, a poster on APBRmetrics) ran a regression on a 15-year RAPM dataset (2000-01 through 2014-15, I believe) with minutes, blowout, and home-court adjustments. Here's the thread on his blog: https://godismyjudgeok.com/DStats/box-plusminus/
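To make the structure of that development concrete, here is a toy sketch of the procedure (this is an assumed simplification, not DSMok1's actual code or variables): box score rates are regressed onto long-run RAPM to learn fixed coefficients, and those coefficients can then be applied to any player-season's box stats. The data below is entirely made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: rows = player-seasons in the 2000-01..2014-15 set.
# Columns stand in for per-100 box score rates (pts, reb, ast, stl, blk, tov).
X = rng.normal(size=(500, 6))
true_w = np.array([0.8, 0.3, 0.5, 1.2, 0.9, -1.0])  # invented "true" weights
rapm = X @ true_w + rng.normal(scale=1.0, size=500)  # target: long-run RAPM

# Least-squares fit with an intercept. (The real BPM fit also involved
# minutes, blowout, and home-court adjustments, omitted here.)
w, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(500)]), rapm, rcond=None)

def bpm_estimate(box_rates):
    """Apply the learned coefficients to one player's box score rates."""
    return float(box_rates @ w[:-1] + w[-1])
```

The key point this illustrates: once the coefficients are fit, BPM's inputs are the box score alone, so the fitted weights are the only place the RAPM data ever enters.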
There are two issues with this:
1. You are comparing two statistics built from disjoint inputs. Even though the regression assigns different weights to the box score components, BPM is still a statistic built entirely from the box score; it contains no new information beyond it. As such, you are not identifying overachievers vs. underachievers. Rather, you are evaluating box score vs. non-box-score contributions. That is a very different question. If your goal is to compare box score vs. non-box-score contributions, that's fine, but it is a different exercise.
2. From a validation perspective, you are comparing years inside the training set to years outside of it. 2000-01 through 2014-15 comprises a big chunk of your sample. It is much more useful to predict outside that window, because one of the core tenets of model validation (accuracy vs. reliability vs. secondary metrics) is that any analysis performed on the training set is overfit and warped.
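A quick numerical sketch of that overfitting point, using purely synthetic data (no real BPM or RAPM numbers): a model fit to one set of seasons will score better on those same seasons than on held-out ones, especially when the model has many free parameters relative to the sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p = 60, 60, 40  # small n relative to p exaggerates overfit

X_tr, X_te = rng.normal(size=(n_train, p)), rng.normal(size=(n_test, p))
w = rng.normal(size=p) * 0.1                      # weak invented signal
y_tr = X_tr @ w + rng.normal(size=n_train)
y_te = X_te @ w + rng.normal(size=n_test)

# Fit on the "training years" only.
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

r2_in = r2(y_tr, X_tr @ coef)    # scored on the training years
r2_out = r2(y_te, X_te @ coef)   # scored on held-out years
```

Here `r2_in` substantially exceeds `r2_out`: the in-sample fit has absorbed noise, which is exactly why evaluating a metric on the years it was trained on flatters it.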
lessthanjake wrote:Regarding #2, I don’t think there’s really any other way to try to determine what players might be overrated or underrated by pre-1997 BPM than by trying to look at what types of players are overrated or underrated compared to RAPM in post-1997 BPM.
Playing devil's advocate: if we assume the goal is the one I described at the end of my reply to point (1), there are several better ways to get at it:
1. Squared2020's partial RAPM from 1985-1996
(1a. On-off data from the Pollack Guides, though these are more limited in sample size and therefore in utility.)
2. WOWY or WOWYr, which we have with reliability going back to the beginning of the league (or, if you care about weighting by possessions, back to at least 1973-74, when turnovers were first tracked... though I actually have team turnovers and, I believe, offensive rebounds from the Pollack Guides for 1969-70 through 1972-73 somewhere).
lessthanjake wrote:As for comparing “pure RAPM” with “box-score informed RAPM,” that’s another interesting avenue to look at a similar thing. I’d imagine that if the RAPM was being informed by BPM then it’d end up to the benefit of a very similar set of players that seem to benefit from BPM in this analysis, though I take your point about weighted-averages and whatnot, so it’s not guaranteed to just correlate exactly in terms of who it helps the most/least.
One problem with comparing “pure RAPM” and “box-score informed RAPM” is that there’s a whole bunch of other relevant methodological decisions that go into RAPM. For instance, let’s say Player A does better in “pure RAPM” than “box-score informed RAPM” but the “pure RAPM” also has a rubberband adjustment and the “box-score informed RAPM” doesn’t. We wouldn’t really be able to tell which change is causing the difference. Ideally, the best way to test it would be to have “pure RAPM” and “box-score informed RAPM” where the only methodological difference is the existence of a box prior, but I don’t think we actually have any natural experiment like that. If we do, then I’d certainly be interested in seeing it, though!
Suppose you want pure, unadjusted RAPM. You can either:
(a) Calculate your own RAPM. It's not super difficult once you've gone through the process once (there are guides on APBRmetrics). The matchupfiles are 95% of the work. I don't like the idea of gatekeeping, though, so there is also option...
(b) Request datasets on the APBRmetrics board or ask a creator on Twitter. Most would be happy to oblige.
If you want to make your own adjustments for HCA, garbage time/blowouts, FT%/3P%, etc., then yes, you'll need to edit the matchupfiles, which is very difficult. But that is likely unnecessary for this test.
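For anyone curious what option (a) actually involves once the matchupfiles exist, here is a minimal RAPM sketch under stated assumptions: each row of the design matrix is one stint, with +1 for home players on the floor and -1 for away players, the target is the stint's point margin per 100 possessions, stints are weighted by possessions, and a ridge penalty regularizes the fit. The stint data below is tiny and invented; real matchupfiles supply these rows.

```python
import numpy as np

n_players = 4
# Stints: (home player ids, away player ids, margin per 100 poss, possessions)
stints = [
    ([0, 1], [2, 3], +6.0, 40),
    ([0, 2], [1, 3], -2.0, 35),
    ([1, 3], [0, 2], +1.0, 30),
]

rows, y, wts = [], [], []
for home, away, margin, poss in stints:
    row = np.zeros(n_players)
    row[home] = 1.0    # on the floor for the home side
    row[away] = -1.0   # on the floor for the away side
    rows.append(row)
    y.append(margin)
    wts.append(poss)

X = np.array(rows)
y = np.array(y)
W = np.diag(wts)   # weight each stint by its possessions
lam = 100.0        # ridge penalty; the regularization is what makes it "R"APM

# Closed-form weighted ridge solution: (X'WX + lam*I)^-1 X'Wy
beta = np.linalg.solve(X.T @ W @ X + lam * np.eye(n_players), X.T @ W @ y)
```

Each entry of `beta` is a player's RAPM estimate. With this symmetric +1/-1 encoding the ratings sum to roughly zero, which matches the intuition that RAPM measures impact relative to average.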
---
As an aside, there are two other things to consider:
1. You excluded Kobe, but some of the players (Jordan in particular) don't have proper 5-year samples. Comparing a 2-year sample to a 5-year sample is noisier.
2. Keep in mind that BPM measures *on-court* performance, while RAPM (roughly speaking) finds the difference between on-court and off-court performance that best fits the constraints.
I am a big Curry and KG fan, and both would likely perform very well in a proper comparison, so I don't take issue with the concept. This analysis just isn't measuring what it intends to measure.