mysticbb wrote:Doctor MJ wrote:Hmm I basically take it as a given that in anything technical here you're at least as knowledgeable as me, so I'm not sure why I'm talking past you here. Perhaps I'm just not using the standard vocabulary?
Oh, I fully understand what you want to say; the issue is that it seems as if you don't really understand the technical aspect here. There is a mathematical theorem which states: there is always a lambda for which the MSE of the ridge regression is smaller than the MSE of the OLS.
That is PROVEN! There is no discussion about it.
If you understand what I'm saying, then I don't understand why you bring up MSE. I'm talking about the certainty that any particular player's RAPM result is better than his corresponding APM result. Taking a mean isn't going to give you particulars.
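Here's a quick toy simulation of the distinction I'm drawing (completely made-up numbers and sizes, nothing to do with actual lineup data). In runs like this you should see lambdas where ridge wins on total coefficient MSE, which is the theorem you're citing, while a handful of individual coefficients still land farther from the truth than their OLS counterparts. That gap between "better on average" and "better for this particular player" is all I'm pointing at.

Code:

# Toy check: ridge vs. OLS on total coefficient MSE, and on individual coefficients.
# Everything here is invented for illustration; it is not real lineup data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 30                                   # "possessions" and "players"
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)    # two nearly collinear columns
beta = rng.normal(scale=2.0, size=p)             # true impacts
y = X @ beta + rng.normal(scale=5.0, size=n)

def fit(lam):
    # ridge solution; lam = 0 gives plain OLS
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = fit(0.0)
mse_ols = np.mean((b_ols - beta) ** 2)
for lam in (0.1, 1.0, 10.0, 100.0):
    b_r = fit(lam)
    mse_r = np.mean((b_r - beta) ** 2)
    worse = np.sum(np.abs(b_r - beta) > np.abs(b_ols - beta))
    print(f"lambda={lam:6.1f}  MSE(OLS)={mse_ols:5.2f}  MSE(ridge)={mse_r:5.2f}  "
          f"coefficients where ridge is farther from truth: {worse}/{p}")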
mysticbb wrote:We have an ill-posed problem and thus it is a given that ridge regression will produce a better result. There is NOTHING in the math which would indicate that in "some" cases the OLS would produce better results.
Is there anything in the math that proves the alternative you're assuming? It seems to me that when it comes to general superiority versus total dominance, the existence of any error at all leaves us uncertain at the level of proofs, which would seem to make the less bold interpretation the reasonable one to use.
mysticbb wrote:AND we can ONLY take the WHOLE set of coefficients and NOT randomly pick some of them. It makes ZERO sense to select a few values from APM over the RAPM values only because you FEEL they are more accurate.
If you're talking about literally selecting one of the two values to input into some other formula, I agree with you. However, what I'm primarily talking about is my level of confidence in the stats in question. I'm cautious with outliers that don't have perfect precision. That isn't me refusing to use them, just being careful about how much I let them sway my opinion.
mysticbb wrote: In fact, if we calculate the "standard error" for RAPM (which doesn't make much sense due to the introduced bias, but whatever) via bootstrap, we see an average value of about 2.5 after half a season of data, about 2 after 2/3 of a season, and about 1.5 after a full season played. Check out the average SE for 2-yr APM and you will realise how much better ridge regression really is. After TWO full seasons played, the SE for the players with the lowest SE is at about 2.3. On average we end up with an SE of about 3 for a dataset consisting of two full seasons. That is something we can see for RAPM after 25 to 30 games played. What else do you want to have?
I'll just say this is a good response. I need to think more on it.
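While I think on it, let me sketch how I currently picture the bootstrap SE you're describing, so you can tell me if I have the procedure wrong. The resampling unit (possession-level rows), the lambda, and all of the numbers below are my assumptions for illustration, not your actual setup.

Code:

# Sketch of a bootstrap standard error for ridge (RAPM-style) coefficients.
# The design matrix, lambda, and resampling scheme are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 500, 40, 50.0
X = rng.integers(-1, 2, size=(n, p)).astype(float)   # crude stand-in for lineup indicators (-1/0/+1)
beta = rng.normal(scale=2.0, size=p)                  # "true" player impacts
y = X @ beta + rng.normal(scale=8.0, size=n)          # noisy possession outcomes

def ridge(Xb, yb, lam):
    k = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(k), Xb.T @ yb)

B = 200
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)    # resample rows with replacement
    boot[b] = ridge(X[idx], y[idx], lam)

se = boot.std(axis=0, ddof=1)           # bootstrap SE per "player" coefficient
print("average bootstrap SE across players:", round(float(se.mean()), 2))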
mysticbb wrote:I can imagine that using an APM value for a player instead of a RAPM value can even improve the prediction, but on average it will get worse. So, if you make an analysis in which you replace the RAPM value with the APM value for a certain player, I would be surprised if you found many players for whom the result of an out-of-sample test would be better.
Just pausing here because I'm a little taken aback that after all this exchange you said that. I feel like we could have skipped a lot of the dialogue if you'd just said this from the beginning. Clearly you see this as something that's not very important or useful.
mysticbb wrote:For sure sample size is an issue, just not in the way it is an issue with APM. APM needs a big sample in order not to come up with insane results due to overfitting; that issue is eliminated with RAPM. Essentially: while APM is trying to separate player performances by all means, RAPM just says that if there is not enough data to separate them, they might as well be equal in terms of value.
What we have to overcome within the sample is the normal variance of the player performances, and on average we see enough full cycles for each player within 25 games or so. A lot of players have big game-to-game variance, playing 3 good games and 1 bad or whatever, but we hardly ever see a player string together 25 good games in a row followed by 25 bad games in a row.
Hmm. Okay, so I realize that I don't have as firm a grasp on the math here as you do. I put in enough time that I thought I had a pretty decent grasp of how it worked, but your putting it in these terms isn't clicking for me. Can you give me links to what you see as the easiest way for me to grasp the math in basketball terms? I'll say up front that I imagine I've already read some of them, but if I did, I clearly need to re-read, and I'd rather not make the process any more cumbersome than it has to be.
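In the meantime, here is how I currently picture the "might as well be equal" part, so you can correct me if I have it backwards: two invented "players" who almost always share the floor, where OLS tries to split them by any means and produces wild estimates, while ridge pulls them toward each other (and toward zero) as lambda grows. The numbers are made up for illustration.

Code:

# Two "players" whose stints are nearly identical, so the data can barely separate them.
# OLS (lambda = 0) splits them anyway and blows up; ridge shrinks them together.
import numpy as np

rng = np.random.default_rng(2)
n = 300
shared = rng.normal(size=n)
x1 = shared + 0.02 * rng.normal(size=n)     # player 1 on-court pattern
x2 = shared + 0.02 * rng.normal(size=n)     # player 2, nearly the same pattern
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 1.0 * x2 + rng.normal(scale=4.0, size=n)   # true impacts: 3 and 1

for lam in (0.0, 1.0, 10.0, 100.0):
    b = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    print(f"lambda={lam:6.1f}  player1={b[0]:7.2f}  player2={b[1]:7.2f}")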
mysticbb wrote:Doctor MJ wrote:Well here's what I say: It's possible the resistance is based on wanting to believe that the weird values I see in RAPM (or APM, for that matter) are sample-size based, because the alternative means that this data is farther from my perception of who the best players are, and this discourages me.
Why do you want to use those values to justify your opinion about a player? Why aren't you using the lamppost to enlighten you?
Well, I'm not justifying the existence of bias; I'm acknowledging it may exist.
If you want to know how I think generally, I always use an approach that combines my basketball common sense with the numbers. This leaves room for some bias, but I really have no doubt that it makes for better analysis than having total faith in either side of the coin. This isn't meant as a criticism of you, btw, just saying what works for me, and what I've seen in others that I want to avoid.







