Reservoirdawgs wrote:I've recently been re-reading the Top 20 basketball threads since the project is now over, and what hits me the most is that the "Pro-RAPM crowd" were making arguments for certain players and backing them up using RAPM, advanced stats, box score stats, AND qualitative analysis. I may not have always agreed with some of their conclusions, but they made compelling arguments that were backed up by stats AND by descriptions of the players' games. I can't really say the same for most of the "Anti-RAPM crowd", who spent most of the time A) complaining about a stat that they clearly didn't understand simply because it didn't show their favorite player in as favorable a light and B) putting forward the most bare-bones analysis that focused on narratives and the most basic of stats. Even when I occasionally agreed with some of their conclusions, how they got there was intellectually dishonest and fairly closed-minded. I certainly didn't learn anything from them, while I can say that the "Pro-RAPM" crowd displayed MUCH more understanding of the game and how stats are compiled and used.
I think we should have some constructive conversation on how each side approaches RAPM or any of these all-encompassing impact stats.
One thing I really wish the pro-RAPM guys would do is not be so condescending about the stat and how "no one understands the math or understands the stat." I don't see one person on here doing any mathematical computations with this stat. I see lists posted, numbers posted, but no computations, as if it takes some high level of computational ability to understand this stat.
The same way no one is here for history lessons or English composition lessons, no one is here to be judged on mathematical understanding. We are here to discuss basketball. Even if you are pro-RAPM, all you are doing is regurgitating what someone else has already computed, which is a list of numbers.
Now let's all be honest about RAPM: the reason for its origin was likely that someone did not like how their favorite player was not being seen in a favorable light (probably Kevin Garnett), and the goal was to create a way to put that player in a more favorable light.
What I am going to do is eliminate the condescension that is rampant amongst the pro-RAPM clique. I would like to note that if you truly understand something, you can teach it to others so that they understand it. I learned that during five years as a podium instructor giving lots of PowerPoint presentations; if all you are doing is repeating what is on the slide and not providing any insight into the material, you may not truly understand what you are teaching.
TL;DR - don't just repeat what is on the slide, we can all read.
I have researched the stat and I have not come to any satisfying conclusions, but for those who may not have read about the stat and want it broken down in layman's terms, I have found some pretty good links:
http://regressing.deadspin.com/just-wha ... 1560361469
What the hell is this stat?
The short answer is we have no idea what's here. The long answer is we know exactly what a large but undefined portion of this is, since it already exists, but don't know what's new here. The sell is that Real Plus-Minus (RPM) tells you how much better a team played on offense and defense when a given player was on the floor, and how much of that improvement was that individual player's doing. From the introductory post:
[The] metric isolates the unique plus-minus impact of each NBA player by adjusting for the effects of each teammate, opposing player and coach. ... The RPM model sifts through more than 230,000 possessions each NBA season to tease apart the "real" plus-minus effects attributable to each player, employing techniques similar to those used by scientific researchers when they need to model the effects of numerous variables at the same time.
One part infomercial, two parts bull; it's an easy elision to miss as you're racing past to see what the stat is for, but what it does isn't easy to glean from the post itself, or the presentation of the RPM stats on ESPN. For that, you really need to understand the stuff it's built on.
So essentially it attempts to be an all-encompassing stat that says how valuable/impactful/good each player is while accounting for multiple variables, e.g. teammates, coaching, opposition. Conclusion: still need to see more, because it does sound infomercial-ish.
Here's Wayne Winston explaining what Advanced Plus-Minus does:
It reflects the impact of each player on his team's scoring margin after controlling for the strength of every teammate and every opponent during each minute he's on the court.
Adjusted +/- ratings indicate how many additional points are contributed to a team's scoring margin by a given player in comparison to the league-average player whose adjusted +/- value is zero over the span of a typical game. It is assumed that in a typical game a team has 100 offensive and 100 defensive possessions. For example, if a +6.5 "adjusted +/-" player is on the floor with 4 average teammates, his team will average about 6.5 points better per 100 possessions than 5 average players would.
So, taking the standard per-100-possessions model and assuming a player is playing with four average teammates, you can determine how many points he creates (or doesn't create) over the course of a game. So you have to assign a "control element" for his teammates and for the player's replacement. Moving on,
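Since I just said the pro-RAPM guys never show any work, let me put my money where my mouth is with a toy version of what "adjusted" actually means. This is my own illustration in Python with made-up numbers, not Winston's or ESPN's actual model: every row is a stint with no substitutions, every column is a player (+1 if he is on the floor for the team whose margin we're tracking, -1 if he is on the floor for the opponent, 0 if he is sitting), and the target is the scoring margin per 100 possessions for that stint.
[code]
# Toy adjusted plus-minus: solve one regression over all stints.
# Made-up data; real APM uses tens of thousands of stints per season.
import numpy as np

players = ["A", "B", "C", "D"]              # toy league of four players
X = np.array([
    [ 1,  1, -1, -1],                        # A and B vs. C and D
    [ 1, -1,  1, -1],                        # A and C vs. B and D
    [-1,  1,  1, -1],                        # B and C vs. A and D
])
y = np.array([8.0, 2.0, -4.0])               # margin per 100 possessions per stint

# Only DIFFERENCES between players are identified (every row sums to zero),
# so lstsq's minimum-norm answer centers the ratings around zero -- which is
# why APM/RAPM numbers are always relative to a league-average player.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, rating in zip(players, beta):
    print(f"{name}: {rating:+.2f} per 100 possessions")
[/code]
That is the entire conceptual content of "adjusted plus-minus": one big regression over every stint of the season, with every other player on the floor acting as the "control element."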
The work on that front goes on, though. Here's Joe Sill on RAPM:
In "Regularized Adjusted Plus-Minus" (RAPM), the goal is to provide more accurate results by employing a special technique called "ridge regression" (a.k.a. regularization). It significantly reduces standard errors in Adjusted Plus-Minus (APM).
Conventional adjusted plus-minus is shown to do a poor job of predicting the outcome of future games, particularly when fit on less than one season of data. Adding regularization greatly improves accuracy, and some player ratings change dramatically. The enhancement with the RAPM is a Bayesian technique in which the data is combined with a priori beliefs regarding reasonable ranges for the parameters in order to produce more accurate models.
Ahhhh right here, this is the phrase where the pro-RAPM squad attempts to gain the upper hand with technical bravado. Let's break down ridge regression/regularization so we can eliminate this go-to argument.
Ridge regression is used when predictor variables are correlated with one another; for example, if a company launches a media campaign using television, radio, and a website all simultaneously. When they begin analyzing the data from those three sources to make predictions, moderate collinearity may not add much "noise," but severe collinearity can inflate the variance of the estimates. That makes the estimates sensitive to small changes in the model, and thus unstable and unreliable for prediction.
So collinearity may or may not be a problem, but what we do know is:
It can make choosing what to use as a predictor difficult
It interferes with determining the exact effect of each predictor
Depending on your goals, it may not affect the model or produce "bad" predictions
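And since I just complained about people not showing their work, here is a small made-up simulation of exactly that collinearity problem, written in Python. Two predictors that move almost in lockstep get fit 200 times with fresh noise, once with ordinary least squares and once with ridge, and we look at how much the estimates bounce around.
[code]
# Toy demo: severe collinearity inflates the variance of OLS estimates;
# ridge regression stabilizes them at the cost of a little bias.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
ols_coefs, ridge_coefs = [], []
for _ in range(200):
    x1 = rng.normal(size=300)
    x2 = x1 + rng.normal(scale=0.05, size=300)   # nearly identical to x1
    X = np.column_stack([x1, x2])
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=300)
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=10.0).fit(X, y).coef_)

print("OLS coefficient spread across samples:  ", np.std(ols_coefs, axis=0))
print("Ridge coefficient spread across samples:", np.std(ridge_coefs, axis=0))
[/code]
The OLS estimates swing wildly from sample to sample; the ridge estimates barely move. That trade (less variance, a little more bias) is the entire sales pitch for the "R" in RAPM. No mysticism required.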
All of this, by mission more than method, is what RPM is supposed to be doing. So what is new here? ESPN again:
RPM reflects enhancements to RAPM by Engelmann, among them the use of Bayesian priors, aging curves, score of the game and extensive out-of-sample testing to improve RPM's predictive accuracy.
Which sounds an awful lot like slapping a binding on someone else's science fair project and selling it as a textbook. RAPM is already used by hardcore NBA heads, so it's more than a little odd to see ESPN roll this out without explaining what it's doing differently. We will presumably hear a little more about what's gone into RPM at some point, maybe at next year's Sloan, maybe as the playoffs ramp up. But as NBA analysis gets more observational and, therefore, contextual, the need for this sort of reverse-engineered testing should fall away, at least a little bit. For now, though, this isn't a bad way at all to judge how important (or harmful) a player is to his team.
So there is an agreement that this is an acceptable way to judge a player's value to a team. No harm, no foul.
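One more sketch before moving on, because "Bayesian priors" is the other phrase that gets waved around like a trump card. One plausible reading of it (this is my illustration, not Engelmann's published recipe, since no full recipe has been published) is ridge shrinkage toward a non-zero prior guess, say from box score stats or last season's rating, instead of toward zero.
[code]
# Toy ridge-with-a-prior: minimize ||y - X b||^2 + lam * ||b - prior||^2.
# All data here is invented; the point is only the mechanism.
import numpy as np

def ridge_with_prior(X, y, prior, lam):
    """Closed-form solution: (X'X + lam*I) b = X'y + lam*prior."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * prior)

rng = np.random.default_rng(2)
X = rng.choice([-1, 0, 1], size=(200, 20)).astype(float)    # fake on/off columns
true_b = rng.normal(0, 2, 20)
y = X @ true_b + rng.normal(0, 10, 200)
box_score_guess = true_b + rng.normal(0, 1, 20)              # imperfect prior info

b_zero = ridge_with_prior(X, y, np.zeros(20), lam=50.0)      # plain RAPM-style
b_prior = ridge_with_prior(X, y, box_score_guess, lam=50.0)  # "informed prior" style
print("error shrinking toward zero: ", round(np.mean((b_zero - true_b) ** 2), 3))
print("error shrinking toward prior:", round(np.mean((b_prior - true_b) ** 2), 3))
[/code]
If that is roughly what RPM is doing, fine, but notice that everything interesting is hiding inside "where does the prior come from," and that part is exactly what has not been shown.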
OK, it's the newest version of an old thing. What exactly is it telling us?
The example that ESPN used was of Taj Gibson and Jamal Crawford, two of the best bench players in the league. They are fine examples, with Gibson having much more impact defensively than Crawford does. Once you adjust for DeAndre, Barnes, and company handling the defense, Jamal doesn't look as great. RPM is no different conceptually than the other stats that try to do this; it just claims to do it better.
This, the thinking goes, should show you something like how valuable a player is to a team. So Reggie Jackson, the Thunder's backup point guard who often plays with the starters, can be contextualized as less important than them, but better or worse than others in a similar role. That relationship does, though, bump against an issue with this sort of analysis.
Just at a glance, the aforementioned player pairing problems—multicollinearity issues, to be specific—do seem to affect the ratings. Take Nick Collison as an example. He's sixth overall in RPM, which could lead you to believe that he is a secret cog in the Thunder machine. And yes, Nick Collison is great for what he is, but sliding in ahead of Steph Curry and Tim Duncan does raise a few questions. Just hazarding a guess, could Perk and his literally-worst-in-the-NBA -6.19 Offensive RPM (-3.19 overall) have anything to do with that? This type of analysis expressly tries to isolate a player from his context, but in extreme cases like this, things can get messy. One surmises that Collison often replacing the single most disastrous offensive player in the known cosmos has some kind of effect on his rating, which is the sort of thing that would show up the limitations of this sort of analysis. Then again, J Crossover is currently backing up Willie Green.
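Because that Collison example is the one that will come up in every thread, here is a toy Python version of the pairing problem. All numbers are invented: pretend Collison is "really" +2.0 and Perkins is "really" -5.0 per 100 possessions, and that exactly one of the two is on the floor in every stint.
[code]
# Toy pairing problem: two teammates who split minutes and never overlap.
# The data identifies the GAP between them well; how that gap is split into
# two individual ratings is largely decided by the regularization.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_stints = 2000
collison_on = rng.random(n_stints) < 0.5
perkins_on = ~collison_on                             # exactly one of them is on
others = rng.choice([-1, 0, 1], size=(n_stints, 8))   # rest of the rotation

X = np.column_stack([collison_on, perkins_on, others]).astype(float)
true_b = np.concatenate([[2.0, -5.0], rng.normal(0, 2, 8)])
y = X @ true_b + rng.normal(0, 10, n_stints)

b = Ridge(alpha=10.0).fit(X, y).coef_
print(f"Collison estimate: {b[0]:+.2f}  (true +2.00)")
print(f"Perkins  estimate: {b[1]:+.2f}  (true -5.00)")
# The fit tends to split the ~7-point gap symmetrically (roughly +3.5 / -3.5):
# the guy who always replaces the disaster comes out looking better than his
# "true" value, which is the flavor of artifact the article is pointing at.
[/code]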
http://www.boxscoregeeks.com/articles/r ... nced-stats
This is where I am challenging the pro-RAPM contingent on how well they understand it and its inner workings.
Fun times! Now, an important part of this process is being able to replicate it. In short, if I have a theory (guess) and I make a model (compute the consequences), then you should be able to do the same thing and verify if I'm right or not (we both compare it to reality). ESPN hopped on the RPM bandwagon quickly. Here's the original post introducing RPM to the world. And here's about as in-depth as it gets to how RPM is calculated:
RPM stats are provided by Jeremias Engelmann in consultation with Steve Ilardi. RPM is based on Engelmann's xRAPM (Regularized Adjusted Plus-Minus). Play-by-play data provided by Basketball-Reference.com.
Right after this post came out, Kevin Pelton followed up with a post – and because this is ESPN, it's hidden behind a paywall – showing the RPM All-Stars. Kostya Medvedovsky had a good question about this:
[tweet]https://twitter.com/kpelton/status/453238357951127552[/tweet]
In short, ESPN is using a model their own analysts don't understand, which is based on very complicated math by some people who have done iffy analysis before. Steve Ilardi was behind APM, which Arturo deconstructed here. And as an outsider, it's even harder to understand. On Twitter after this went up, we were asked about writing a piece on it. OK, well how does it work? I was told it was similar to RAPM or xRAPM, but even Pelton, who works for ESPN, doesn't know!
The Calculating Wins Produced page reads like it was written by a college professor.
It doesn't have the effusive explanations of how it handles the things we know matter in basketball. It doesn't use an example of a player to prove how right it is. It does, however, provide the means to redo the work. RPM does not.
For those who want the "background" on RAPM: initially this work was started in a paper, written by Joe Sill, that was presented at Sloan. The paper was called "Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing", and it won the grand prize.
You may notice that the Sloan site no longer has a copy of this up. There is a site up that has RAPM data --
http://stats-for-the-nba.appspot.com -- and a site with players' cumulative xRAPM. Trying to look for a site with explicit how-to instructions was difficult, but I got some Twitter feedback. Here's a description of RAPM from someone who made a boxscore variant of it. Here's an ABPR discussion about it.
So there is a lot of confusion on the background of RAPM, which is particularly disconcerting when so much faith is put into it. I looked up the origin of box plus/minus and found some interesting information.
http://godismyjudgeok.com/DStats/aspm-and-vorp/
In order to create a box-score-based player evaluation composite statistic, some basis for the weights given to each statistic must be chosen. A number of different "box-score" stats have been developed over the years; I will not go over them in this space. Some of the more intricate and well known include John Hollinger's PER (at ESPN), Justin Kubatko's Win Shares, and Dave Berri's Wins Produced.
The different composite statistics use a variety of approaches, from pure empirical to pure theory and a mix of the two.
I bristle at anyone who determines the weights for each statistic when developing a composite stat. This already puts bias into the stat.
I have differed from earlier public SPM work in a couple of regards:
Longer-term APM to regress onto (much less random error)
Ridge-Regressed APM (RAPM) (less random error, slightly more bias)
Advanced Box Score measures rather than simple points and FGAs (more accurate and less skewed by context)
Some nonlinear interactions modeled–only what makes sense theoretically, however (more accuracy)
Some notes about box plus/minus: less random error is good; I'm not sure how I feel about more bias and less context; and more accuracy is preferred. However, since many arguments devolve into "context" or "era bias," it ends up less empirical and more of just another tool in the toolbox.
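For anyone who has not seen how these box-score composites are actually assembled, here is a rough Python sketch of the general approach the quote is describing: take per-100 box score stats, regress a long-run RAPM-style target onto them, and use the fitted coefficients as the weights applied to every player. This is my illustration with made-up numbers, not Myers' actual BPM recipe (his uses more terms and some nonlinear interactions, per the list above).
[code]
# Toy "statistical plus-minus" weight fitting: regress long-run RAPM onto
# box score stats; the coefficients become the metric's weights.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n_players = 300
# fake per-100 box score lines: PTS, REB, AST, STL, BLK, TOV
box = rng.normal([20, 8, 4, 1, 0.8, 2.5], [6, 3, 2.5, 0.5, 0.6, 1], (n_players, 6))
# fake long-run RAPM target (in reality this comes from years of play-by-play)
long_run_rapm = (box @ np.array([0.15, 0.2, 0.4, 1.0, 0.8, -0.7]) - 8.0
                 + rng.normal(0, 1.5, n_players))

model = LinearRegression().fit(box, long_run_rapm)
for stat, w in zip(["PTS", "REB", "AST", "STL", "BLK", "TOV"], model.coef_):
    print(f"{stat}: {w:+.2f}")
# Apply these weights to any player's box score line and you get a box-score
# estimate of his plus-minus -- no play-by-play needed for that player.
[/code]
Note that the weights are not hand-picked, they fall out of the regression; but the choice of which stats to include and which RAPM run to regress onto is still a judgment call, which is the bias concern from above in different clothes.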
It was interesting to look at the BPM 1.1 vs. RAPM comparison that Daniel Myers put together:
http://public.tableau.com/profile/dsmok ... 14YearRAPM
The top 10 for RAPM over a 14-year period was:
Garnett
LeBron
Duncan
Paul
Nowitzki
Ginobili
Nash
Pierce
Amir Johnson
Aldridge
Top 10 for BPM was:
LeBron
Paul
Wade
Garnett
Ginobili
Duncan
Kirilenko
McGrady
Kidd
Bryant
Some of the bigger discrepancies between the two lists were:
Nash: 7th in RAPM, 85th in BPM
Noah: 137th in RAPM, 11th in BPM
Durant: 121st in RAPM, 12th in BPM
K. Malone: 149th in RAPM, 13th in BPM
Also, reading through many posts where these regression models are being debated, there isn't even agreement amongst the people who are "pro-RAPM." I don't think you can push something when there is no complete agreement on the model and it is constantly being "updated."
I remember when PER was the holy grail of advanced stats for judging a player, and now if you use it in a comparison it's like Rick Carlisle looking at Rajon Rondo. Now we are going through several iterations of these models as they figure out how much bias can be allowed to achieve better estimates.....
I'm so tired of the typical......