JeepCSC wrote: I think I need to go back to read up on RAPM. Things like 'making the stat' make it sound, well, created to be flawed. It has been fascinating to watch play out, but I'm on a steep learning curve and it makes it troublesome to follow along completely. But I definitely enjoy these threads. Have made the off-season much more enjoyable.
There's certainly more depth that we can get into, but just quick hits:
You can "make the stat" using nothing but NBA play-by-play data and ready-made functions in tools like R and MATLAB. It's just standard data analysis applied to the NBA.
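To make that concrete, here's a rough sketch of the idea in plain Python. RAPM is a regularized (ridge) regression on lineup data, and that's what this does on a toy scale. Everything in it is an assumption for illustration: the four players, the stint margins, the penalty `lam=1.0`, and the helper names `solve`, `ridge`, and `design_row` are all made up, and real versions work from full play-by-play, weight stints by possessions, and tune the penalty. It is not how any particular site computes its numbers.

```python
# Toy RAPM sketch: ridge regression on stint data. All data here is invented.

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Return beta minimizing ||X beta - y||^2 + lam * ||beta||^2."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


players = ["A", "B", "C", "D"]  # hypothetical players

# Each stint: (home players, away players, home margin per 100 possessions)
stints = [
    (["A", "B"], ["C", "D"], 8.0),
    (["A", "C"], ["B", "D"], 4.0),
    (["A", "D"], ["B", "C"], 6.0),
    (["B", "C"], ["A", "D"], -6.0),
]


def design_row(home, away):
    """+1 if the player is on the floor for the home side, -1 for away, else 0."""
    return [1.0 if p in home else -1.0 if p in away else 0.0 for p in players]


X = [design_row(h, a) for h, a, _ in stints]
y = [margin for _, _, margin in stints]

ratings = dict(zip(players, ridge(X, y, lam=1.0)))
for name, value in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f}")
```

The penalty is the "R" in RAPM: it pulls every rating toward zero, which keeps players with few minutes or who always share the floor from getting wild values. Raise `lam` and all the ratings shrink.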
What the guy with the site did that was problematic:
1. He made a version of the stat he called XRAPM which factored in box score stats (nothing wrong with that)...and he replaced his actual RAPM data with the XRAPM data without changing his labels (PROBLEM!)
2. He used his RAPM data from the '00s to develop a statistical +/- tool that estimated RAPM from the box score, and then he applied that to the '90s calling it "Fake RAPM" (nothing wrong with that)...and then he added those years to his "RAPM" table without any mention that it was fake.
It's just incredibly stupid stuff. There's nothing wrong with any of the data he has; the problem is his refusal to label it properly on the site, even though elsewhere he always calls it by its right name. He just doesn't care if people on the internet are confused.
So you see, there's really nothing about any of this that makes RAPM particularly problematic in terms of deviating standards. The standards don't deviate; there's just one guy knowingly mislabeling things.
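For anyone curious what a box-score estimate like the one in point 2 looks like mechanically, here's a stripped-down sketch: fit a line from a box-score number to RAPM on seasons where you have both, then apply it to seasons where you only have the box score. The single made-up feature `box`, the numbers, the player names, and the `source` field are all assumptions for illustration; the real tool used many box-score stats and I'm not reproducing it. The part that matters for this thread is the last step, where the estimates carry a label saying what they are.

```python
# Toy "estimate RAPM from the box score" sketch. All numbers are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Seasons with play-by-play: we have both a box-score number and real RAPM.
box_2000s = [1.0, 2.0, 3.0, 4.0]
rapm_2000s = [-1.0, 1.0, 3.0, 5.0]

slope, intercept = fit_line(box_2000s, rapm_2000s)

# Seasons without play-by-play: only the box-score number exists.
box_1990s = {"Player X": 2.5, "Player Y": 0.5}  # hypothetical players

table = [
    {"player": name,
     "value": slope * box + intercept,
     "source": "estimated_from_box_score"}  # the label is the whole point
    for name, box in box_1990s.items()
]

for row in table:
    print(row)
```

Nothing about the estimate is wrong as long as that `source` column travels with it. Drop the label and merge the rows into a table headed "RAPM" and you get exactly the confusion described above.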
Now, you might point out, "Yeah, but he was THE guy. There was only one guy you were using, which made you vulnerable to these issues," and you're right. He was the only guy at the time giving us this data...because NBA teams hire these guys away and then they stop making the stat for the public. That makes it suck for the public, but it certainly doesn't make the stat look bad.
And as far as the fear of "We're just trusting these guys! They could totally make it all up!", well, there's truth in that, but to what end? Even with this guy, he's not trying to mess with people; he just doesn't care that much about people who find his website through the internet. He's putting stuff out there as proof of concept, puts no effort into making the site pretty, doesn't have any ads, and saves his explanations of how it works for the people he thinks are important.
And it should be noted, the people here who really use this sort of data were aware of the issues instantly. It's not like we were using the data for a year and then found out we were confused. He explained what he was doing on the APBRmetrics forum, and we immediately saw it and objected. He didn't change, and that's all there was to it.
The frustrating thing is really just that you've got people who aren't following things that closely who somehow make their way over to his site and come back thinking an apple was an orange. And I'll admit here that while I try to put the blame squarely on the guy running the confusing site, I do get frustrated with the RealGM posters sometimes too. Maybe it's totally unfair, but I've been giving the exact same message on these boards from the moment the confusion began around two years ago. The explanation is so simple, and I and others have been so vocal about it everywhere we can, but somehow people still often either (1) go to that site unfazed or (2) end up thinking it's all just super confusing and that they don't know who to trust. It's like trying to push against the tide when it doesn't seem like it should be that hard.