WarriorGM wrote:I beg to differ. For example let us take RPM which uses something like height as a prior. I may "understand" why that might increase the accuracy of the body of predictions as a whole but I also understand it is an artificial distortion that may unjustly penalize shorter players with outlier skills and I do not understand it enough to correct for it easily. Better not to use it at all in my view. There is understanding and then there is understanding. Understanding the limits of the statistical model and what it is is very important. For example you look at both RAPM and wins as indicators and say RAPM gives you a more precise picture than looking at team wins. In my view you are already making a fundamental mistake. While you may be very correct in one sense you completely miss the point in another. Wins are not just another indicator. It is the ultimate indicator. To value RAPM over wins the way you have done is like saying a drug is successful because it lowered cholesterol levels even though it didn't reduce mortality outcomes.
Height is a prior, as you said. Its influence shrinks as the sample size of the data set grows. In a whole season of PI RAPM the prior doesn't influence the scores heavily. At worst you see slight over- or underrating, which you can account for when a score strikes you as unreliable. (Note that height is only used for the split between O and D and not for the score on defense itself, which means RAPM as a whole isn't overrating him. DRAPM may overrate him while ORAPM would underrate him to the same degree.)
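To make the "prior fades with sample size" point concrete, here is a minimal sketch of ridge-style shrinkage for a single rating. It is not the actual RPM or RAPM code: the function names and the penalty value `lam` are made up for illustration, and the real models fit all players jointly.

```python
def prior_weight(n_obs, lam):
    """Share of the final estimate that comes from the prior.

    For one coefficient, minimizing sum((y_i - b)^2) + lam * (b - prior)^2
    gives b = (n * mean + lam * prior) / (n + lam), so the prior's
    weight is lam / (n + lam).
    """
    return lam / (n_obs + lam)


def shrunk_estimate(sample_mean, n_obs, prior, lam):
    """Blend the observed mean with the prior according to sample size."""
    w = prior_weight(n_obs, lam)
    return w * prior + (1 - w) * sample_mean


# Hypothetical penalty strength; real values are tuned by cross-validation.
lam = 50
for n in (0, 50, 500, 5000):
    print(n, round(prior_weight(n, lam), 3))
```

With no data the estimate is the prior alone; as observations pile up, the prior's share keeps dropping toward zero, which is why a full season leaves it with little influence.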
WarriorGM wrote:I'm unconvinced you've addressed my point. I'm unsure if it's the proper term/idea but I think you are assuming homoskedasticity in the data but what if it's heteroskedastic or it doesn't follow a bell curve probability distribution? Sample size reliability also seems largely limited to the year of measurement. Yes using multiple years gives affirmation that the general conclusion is correct (KG is a very good player) but it loses precision. I therefore disagree the probability of KG being overrated by the data is close to 0.
Perhaps it would be useful to have a common set of data we can refer to so I can point out specific concerns with RAPM I have. If you are willing to look though I notice from one source that LeBron's RAPM dips in the years he is on the Heat while it rises when he is with the Cavaliers? Can you explain this? LeBron's and KG's situations also would appear similar in their early years as is their style of play. Maybe being a big fish point forward in a small pond is just naturally conducive to high RAPM numbers?
What I read out of your previous post and this one is that you are concerned RAPM overrates players on bad teams. I addressed that. It could also be that RAPM overrates players on good teams because of their high point differentials. I addressed that too. RAPM adjusts for these things. What it is meant to say is roughly 'how much more than the average NBA player in that year did the player in question impact the point differential, playing on a random team against a random team' (somebody can correct me if something is missing or misrepresented). So my answer is: it may be easier to have higher raw on/off numbers on weak teams, but not higher RAPM.
The LeBron thing I can't quite confirm. I always look at multiple data sets, most of them from JE. Looking at the scores from RS + PO data, 12 and 09 look close enough to call it roughly a draw, given the difficulties already explained about comparing between years (it is not exact). One set has 12 in front, some others have 09 slightly ahead. Compared to the competition, both are far and away the best in RAPM in their respective years. RS-only data has 12 a little behind, which I don't think is all that contrary to the eye test. One NPI set has LeBron at 6th or so, but again NPI RS-only is a small sample with more room for error. 13 falls a bit behind these too. 16, on the other hand, is again very comparable.
So I don't think what you are saying is confirmed looking at JE's data sets.
As for the probability that KG is overrated: I worded it strongly, but KG is great in single years (meaning he was impactful in those years themselves). Multi-year sets, which are more precise because of the bigger sample but lose year-by-year information, confirm that he was that great. Increasing the sample even more loses more of the year-by-year detail but confirms what we already saw in the year-by-year data itself. The probability that all these different data sets and sample sizes had big errors is just really, really low. (Note that RAPM has KG on LeBron's level, yet nobody is arguing him at LeBron's level, knowing about the possibility that it overrates him at least a little. But overrating him so much that you could say he isn't comparable to Shaq or Duncan? In my eyes, nearly no way.)
WarriorGM wrote:I've stated above why I give special import to wins. Your claim that I use it more heavily than RAPM users use RAPM strikes me as false. I use a lot of supporting evidence that verifies the wins. Using wins as the starting point has that advantage though: there will be supporting evidence. Like the accidental discovery of penicillin you have a useful cure already so determining how it works is easier than first coming up with a hypothesis then trying to find a cure the way they have tried to find one for Alzheimer's and gotten nowhere.
It also has a disadvantage. You dismiss great players simply for having bad luck over an extended period of time. Players who were unfortunate are thought of badly right from the start because your starting point is wins. If you concede the possibility that players are simply unfortunate, you should account for that and not dismiss players at your starting point (especially when all the individual player data designed to show impact on team wins points to a player impacting it in a big way). But I have to say I don't want to discuss the 'win approach' anymore. We have discussed it enough. We have different views on sports and that is OK. Agree to disagree.
WarriorGM wrote:Raw data contains the information derivative data does; it actually contains more data. It's like mining for gold. You might come up with a lot of worthless rocks but there may be platinum and even more valuable minerals in there than the gold you initially sought to find. A moving average might at one glance provide an easier to understand picture of what is happening in a data series and you might even find great value in the slope of the moving average another derivative. But there is information in the raw values themselves that using other forms of analysis might yet unlock. In regards to how I use raw +/- specifically I think you also vastly overestimate my use of it. I've only referred to it in exceptional cases. But its transparency and ease of understanding I think make it just as if not more reliable in those particular instances than RAPM. I'm perfectly willing to refer to RAPM too.
The raw data is lineup data. KG's +/- is actually the +/- of KG plus his teammates. So no, it is not just showing you RAPM results in raw form. Your brain and mine aren't able to make the adjustments RAPM makes for teammates, opponents and so on. We can't look at the data and extract the impact of a single player in our heads (which should be the goal, since you want to evaluate individuals and not lineups). Don't get me wrong, I use raw data too for small samples like the PO, to at least see trends and indicators. But when we have a big sample, no human being can extract the impact of a single player from lineup data as well as the ridge regression used for RAPM.
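For anyone who wants to see what "ridge regression on lineup data" looks like mechanically, here is a rough sketch on simulated stints. Everything in it is an assumption for illustration: the player count, the noise level, the penalty `lam`, and the helper name `ridge_ratings`. Real RAPM uses actual play-by-play data, possession weighting and a tuned penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

n_players, n_stints = 20, 2000
# Hypothetical "true" per-player impact that the regression tries to recover.
true_impact = rng.normal(0.0, 2.0, size=n_players)

# Design matrix: one row per stint, +1 for the five players on one side,
# -1 for the five on the other side, 0 for everyone on the bench.
X = np.zeros((n_stints, n_players))
for row in X:
    on_court = rng.choice(n_players, size=10, replace=False)
    row[on_court[:5]] = 1.0
    row[on_court[5:]] = -1.0

# Observed margin per stint = summed impact of the ten players + noise.
y = X @ true_impact + rng.normal(0.0, 5.0, size=n_stints)


def ridge_ratings(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^-1 X'y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)


ratings = ridge_ratings(X, y, lam=10.0)
corr = np.corrcoef(ratings, true_impact)[0, 1]
print(f"correlation with true impact: {corr:.3f}")
```

Each row only records who was on the floor and what the margin was, which is exactly the "KG plus teammates" information raw +/- gives you. The regression solves for all players at once, which is the adjustment for teammates and opponents that nobody can do by eye. In this setup the ratings also sum to zero, which matches the "relative to the average player" reading.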
Some of your points are valid questions when you are new to RAPM, but in the end it might be better to just read up on it on the internet, since I am not an expert and most of the discussion really is about the nature of the model itself. So it can be answered by simply looking at sources on the internet. I know what it does and I know how to use it. But to get the best information possible, you should go to the main sources.