"Best" stat to rank individual defense?

Moderator: Doctor MJ

Doctor MJ
Senior Mod
Posts: 53,882
And1: 22,820
Joined: Mar 10, 2005
Location: Cali

Re: "Best" stat to rank individual defense? 

Post#21 » by Doctor MJ » Sat May 26, 2012 8:55 pm

mysticbb wrote:
Doctor MJ wrote:Hmm I basically take it as a given that in anything technical here you're at least as knowledgeable as me, so I'm not sure why I'm talking past you here. Perhaps I'm just not using the standard vocabulary?


Oh, I fully understand what you want to say; the issue is that it seems as if you don't really understand the technical aspect here. There is a mathematical theorem which states: there always exists a lambda for which the MSE of the ridge regression is smaller than the MSE of the OLS.
That is PROVEN! There is no discussion about it.


If you understand what I'm saying then I don't understand why you bring up MSE. I'm talking about the certainty that any particular RAPM result for a player is better than the corresponding APM result. Taking a mean isn't going to give you particulars.
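As a side note for anyone following the math: the theorem is a statement about the expected error of the whole coefficient vector. A toy simulation (entirely made-up data, just to show the mechanism) illustrates why a near-collinear design, like lineup data where teammates share most of their minutes, makes OLS coefficients blow up while ridge keeps them on a sane scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 10 nearly identical columns mimic the collinearity of
# lineup data (players who are almost always on the floor together).
n, p = 60, 10
base = rng.normal(size=(n, 1))
X = base + 0.05 * rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

def fit(X, y, lam):
    """Closed-form ridge solution; lam=0 reduces to plain OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = fit(X, y, 0.0)
beta_ridge = fit(X, y, 5.0)

# OLS coefficients explode on near-collinear data; ridge shrinks them.
print(np.abs(beta_ols).max(), np.abs(beta_ridge).max())
```

The theorem guarantees that some lambda improves total MSE; it says nothing about which individual coefficients improve, which is exactly the distinction being argued over here.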

mysticbb wrote:We have an ill-posed problem, and thus it is a given that ridge regression will produce a better result. There is NOTHING in the math which would indicate that in "some" cases the OLS would produce better results.


Is there anything in the math that proves your assumed alternative? It seems to me that when it comes to general superiority vs. total dominance, the existence of any error at all leaves us uncertain where proofs are concerned, which makes it reasonable to use the less bold interpretation.

mysticbb wrote:AND we can ONLY take the WHOLE set of coefficients and NOT randomly pick some. It makes ZERO sense to select a few values from APM over the RAPM values only because you FEEL they are more accurate.


If you're talking about literally selecting one of the two values to input in some other formula, I agree with you. However, what I'm primarily talking about is my level of confidence in the stats in question. I'm cautious when using outliers that don't have perfect precision. This is not me refusing to use them, just being cautious about how much I let them sway my opinion.

mysticbb wrote: In fact if we calculate the "standard error" for RAPM (which makes not much sense due to the introduced bias, but whatever) via bootstrap, we will see an average value of about 2.5 after half a season of data, about 2 after 2/3 of a season, and about 1.5 after a full season played. Check out the average SE for 2yr APM and you will realise how much better ridge regression really is. After TWO full seasons played, the SE for the players with the lowest SE is at about 2.3. On average we end up with an SE of about 3 for a dataset consisting of two full seasons. That is something we can see for RAPM after 25 to 30 games played. What else do you want to have?


I'll just say this is a good response. I need to think more on it.
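For concreteness, the bootstrap mysticbb describes can be sketched like this (purely synthetic "stint" data; the 2.5 / 2 / 1.5 figures he quotes are his empirical results and are not reproduced by this toy):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stint data: rows are stints, columns are players (+1 home,
# -1 away, 0 off the floor); y is the scoring margin. All made up.
n, p = 500, 20
X = rng.choice([-1.0, 0.0, 1.0], size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=5.0, size=n)

def ridge(X, y, lam=100.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Bootstrap: refit on resampled stints; the spread of each player's
# coefficient across refits serves as its "standard error".
B = 200
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = ridge(X[idx], y[idx])

se = boot.std(axis=0)
print(se.mean())  # average bootstrap SE across players
```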

mysticbb wrote:I can imagine that using an APM value for a player instead of a RAPM value can even improve the prediction, but on average it will get worse. So, if you make an analysis in which you replace the RAPM value with the APM value for a certain player, I would be surprised if you find many players for which the result of an out-of-sample test would be better.


Just pausing here because I'm a little taken aback that after all this exchange you said that. I feel like we could have skipped a lot of the dialogue if you'd just said this from the beginning. Clearly you see this as something that's not very important or useful. :lol: "Well yes, I guess technically dropping a giant anvil on a sniper would kill him but..."

mysticbb wrote:For sure sample size is an issue, just not in the way it is an issue with APM. APM needs a big sample in order not to come up with insane results due to overfitting; that issue is eliminated with RAPM. Essentially: while APM is trying to separate player performances by all means, RAPM just says that if there is not enough data to separate them, they might as well be equal in terms of value.
What we have to overcome within the sample is the normal variance of the player performances, and on average we see enough full cycles for each player within 25 games or so. A lot of players have a big game-to-game variance, some play 3 good games and 1 bad or whatever, but we hardly see players having 25 good games in a row and then 25 bad games in a row.


Hmm. Okay, so I do realize that I don't have as firm a grasp on the math here as you do. I put in enough time that I thought I had a pretty decent grasp on how it worked, but your putting it in these terms isn't making sense to me. Can you give me links to what you see as the easiest way for me to grasp the math in basketball terms? I'll say up front that I imagine I've already read some of them, but if I did, I clearly need to re-read, and I'd rather not make the process any more cumbersome than it has to be.

mysticbb wrote:
Doctor MJ wrote:Well here's what I say: It's possible the resistance is based on wanting to believe that the weird values I see in RAPM (or APM for that matter) are sample-size based, because the alternative means that this data is farther from my perceptions of who the best players are, and this discourages me.


Why do you want to use those values to justify your opinion about a player? Why aren't you using the lamppost to enlighten you? ;)


Well, I'm not justifying the existence of bias, I'm acknowledging it may exist.

If you're wanting to know how I think generally, I always use an approach that combines my basketball common sense with the numbers. This leaves room for some bias, but I really have no doubt that it makes for better analysis than having total faith in either side of the coin. This is not meant as a criticism toward you btw, just saying what works for me, and what I've seen in others that I want to avoid.
EvanZ
RealGM
Posts: 15,050
And1: 4,247
Joined: Apr 06, 2011

Re: "Best" stat to rank individual defense? 

Post#22 » by EvanZ » Sat May 26, 2012 9:29 pm

Search for "bias vs. variance tradeoff". That, in essence, is what regularization is about.
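A minimal sketch of that tradeoff, assuming nothing beyond simulated data: as the ridge penalty lambda grows, bias rises and variance falls, and out-of-sample error is typically smallest at some intermediate lambda rather than at lambda = 0 (OLS).

```python
import numpy as np

rng = np.random.default_rng(2)

p = 15
beta_true = rng.normal(size=p)

def draw(n):
    # Near-collinear design, as in lineup data; noise on top.
    base = rng.normal(size=(n, 1))
    X = base + 0.1 * rng.normal(size=(n, p))
    return X, X @ beta_true + rng.normal(size=n)

X_tr, y_tr = draw(40)      # small training sample
X_te, y_te = draw(4000)    # large test sample to estimate true error

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Sweep lambda: lam=0 is OLS; moderate lam trades a little bias for a
# large reduction in variance, lowering out-of-sample error.
lams = [0.0, 0.1, 1.0, 10.0, 100.0]
test_mse = [np.mean((y_te - X_te @ ridge(X_tr, y_tr, lam)) ** 2)
            for lam in lams]
print(dict(zip(lams, test_mse)))
```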
mysticbb
Banned User
Posts: 8,205
And1: 713
Joined: May 28, 2007

Re: "Best" stat to rank individual defense? 

Post#23 » by mysticbb » Sat May 26, 2012 11:08 pm

Doctor MJ wrote:If you understand what I'm saying then I don't understand why you bring up MSE. I'm talking about the certainty that any particular RAPM result for a player is better than the corresponding APM result. Taking a mean isn't going to give you particulars.


Well, I thought that theorem pretty much tells us everything we need to know about it. I understand that you want to look at the particular coefficients for the respective players. But then again, how do you want to determine which of those values is more "valid"? I don't see any reasonable ansatz which would lead to a reasonable number of players for which the APM value would consistently lead to better predictions.

Doctor MJ wrote:Is there anything in the math that proves your assumed alternative?


Ah, I see, my wording could be interpreted as if I meant that every single coefficient generated by RAPM is per se better than the corresponding coefficient from OLS. Well, I actually just meant the existing theorem I mentioned before. That would be the proof for that.

Doctor MJ wrote:If you're talking about literally selecting one of the two values to input in some other formula, I agree with you. However, what I'm primarily talking about is my level of confidence in the stats in question. I'm cautious when using outliers that don't have perfect precision. This is not me refusing to use them, just being cautious about how much I let them sway my opinion.


The question I have is: how do you want to determine "confidence"? When you say "confidence" I think of the confidence interval, which is defined, but you are actually talking about a "feeling", not so much about evidence. To be honest, it is hard for me to follow, because I'm under the impression that we are talking about the theory here, not so much about the interpretation. Maybe that is my fault due to a misunderstanding on my part.

Doctor MJ wrote:Just pausing here because I'm a little taken aback that after all this exchange you said that. I feel like we could have skipped a lot of the dialogue if you'd just said this from the beginning. Clearly you see this as something that's not very important or useful. :lol: "Well yes, I guess technically dropping a giant anvil on a sniper would kill him but..."


Well, I realise that I didn't make it clear what my problem with your approach is. Mainly, I don't understand how you would determine which value is more trustworthy and which is more of an outlier. See, I'm just going by the math, and it becomes really clear that RAPM is more trustworthy, not just because of the theory, but because it can actually be shown with the data we have for a couple of seasons. So, as you might understand, I'm not looking at the APM values of players and saying, well, they make more sense to me than the RAPM values, thus I believe more in those specific APM values. Given the fact that each coefficient is also influenced by the coefficients of the other players, I have a hard time seeing that for a large enough number of players the APM result would actually produce better predictions. For some players? Maybe, but not for that many. And obviously I don't see how you would be able to pick the "right" value between APM and RAPM with any kind of consistency.

Doctor MJ wrote:Hmm. Okay, so I do realize that I don't have as firm a grasp on the math here as you do. I put in enough time that I thought I had a pretty decent grasp on how it worked, but your putting it in these terms isn't making sense to me. Can you give me links to what you see as the easiest way for me to grasp the math in basketball terms? I'll say up front that I imagine I've already read some of them, but if I did, I clearly need to re-read, and I'd rather not make the process any more cumbersome than it has to be.


As Evan pointed out, the bias-variance tradeoff would be a good starting point. Other than that I only have a few math and time series analysis lecture notes, which are mainly in German, so I doubt they would really be that helpful.

To say it a bit differently (although not strictly correct): the bias is helping us to improve the signal-to-noise ratio. The variance is basically the noise here, and the coefficients are our measured signal.

Well, for players I always imagine their performances as part of a wave, with an amplitude representing the max and min performance level, while most games fall between that max and min, like a sine curve for example. The wavelength contains all sorts of games, which just repeat themselves (more or less). I once tested that idea on a couple of players and in the end I came up with an estimate of 25 to 30 games needed to complete one full wavelength. Obviously, that is an approximation, but in the end the conclusion is that the variance isn't a big deal after 25 to 30 games anymore, if the appropriate methods are used. Well, and ridge regression is actually such a method. Engelmann tested the dependence of predictive power on sample size and found that one year of data with a prior gives the best predictions.
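The wave analogy can be sketched in a few lines (a hypothetical player; period, amplitude, and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical player: true level of +2 per game, with a sine-shaped
# form cycle of ~28 games plus game-to-game noise.
period, true_mean = 28, 2.0
games = np.arange(200)
per_game = (true_mean
            + 1.5 * np.sin(2 * np.pi * games / period)
            + rng.normal(scale=0.5, size=games.size))

# Averaging over one full cycle cancels the wave almost exactly, so
# ~28 games already recover the underlying level; a half cycle is
# biased because it only covers the "hot" part of the wave.
est_half_cycle = per_game[:period // 2].mean()
est_one_cycle = per_game[:period].mean()
print(est_half_cycle, est_one_cycle)
```

Averaging over any window that covers a full cycle cancels the periodic part, which is the intuition behind "the variance isn't a big deal after 25 to 30 games".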

Doctor MJ wrote:If you're wanting to know how I think generally, I always use an approach that uses both my basketball common sense combined with the numbers.


Well, I most certainly agree that we need basketball common sense in order to interpret the numbers; without the basketball knowledge those numbers are useless (imho). What I just want to see is that the best available data is used in order to come up with consistent conclusions about players.

To go a bit away from that now, because it somewhat bothers me: I constantly see people talking about RAPM or APM who think that "adjusted plus-minus" means that we just take the raw +/- numbers and then apply some "adjustment factors" to them. I didn't notice that so much before, but it was probably always there.
It is also the case that people like to use sortable numbers to rank players, while my understanding of the method and my basketball knowledge tell me that those are not player-ranking tools. I tried to avoid that on my blog; the SPM numbers are just presented in sorted order, and overall I would not declare a player with a higher SPM to be per se the better player. SPM just says that the player with the higher value has a better combination of production and efficiency based on boxscore numbers. The SPM value quantifies that per 100 possessions based on the assumption that an average player is at 0. For RAPM it is similar.
And last but not least, people try to make the point that RAPM/APM can't tell you how a player plays. Well, it is obviously not designed for that, so why do those people even bring it up?
