Statistics Backed by Regression Studies

Moderator: Doctor MJ

jvuc

Statistics Backed by Regression Studies 

Post#1 » by jvuc » Thu Jul 25, 2013 12:11 pm

Which stats are backed by regression studies showing a correlation to winning, points produced, etc.? That is, there are plenty of stats that measure something, but without a regression we don't know whether it is a relevant or important factor contributing to a team winning, or to a player's offensive and defensive contribution.
Chicago76

Re: Statistics Backed by Regression Studies 

Post#2 » by Chicago76 » Fri Jul 26, 2013 7:18 am

Just about any form of statistical plus-minus uses regression. Individual SPM numbers, weighted by minutes or plays (depending on which is used to derive the plus-minus estimate), will correlate fairly well with Ortg and Drtg, and therefore with margin and Pythagorean wins. Ridge-regressed plus-minus is another commonly used form of regression. Wages of Wins uses regression too. That's about all I'll say on that one, because there's no need to derail the thread with WoW side comments.
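A minimal Python sketch of the minutes weighting described above, assuming each player's SPM is a per-100-possession margin contribution and that team floor time is 48 minutes per game (overtime ignored). The names `PlayerSeason`, `team_rating` and `pythagorean_win_pct` are hypothetical, and the Pythagorean exponent of 14 is an assumed, commonly quoted NBA value rather than something stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    rating: float   # per-100-possession plus-minus estimate, e.g. an SPM value
    minutes: float  # minutes played over the season


def team_rating(players: list[PlayerSeason], games: int) -> float:
    """Minutes-weighted sum of player ratings (estimated team margin per 100).

    Each rating is weighted by the share of the team's game time the player
    was on the floor (48 minutes per game, overtime ignored). Because five
    players are always on the floor, the weights sum to roughly 5.
    """
    game_minutes = 48.0 * games
    return sum(p.rating * p.minutes / game_minutes for p in players)


def pythagorean_win_pct(ortg: float, drtg: float, exponent: float = 14.0) -> float:
    """Expected win% from offensive and defensive ratings (points per 100 possessions)."""
    return ortg ** exponent / (ortg ** exponent + drtg ** exponent)
```

For example, five players rated +1.0 who each play every minute of an 82-game season give a team rating of 5.0, and equal Ortg and Drtg give an expected win% of exactly 0.5.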

Non-regressed stats would include PER, individual Drtg, Ortg and win shares derived from those, etc.
mysticbb

Re: Statistics Backed by Regression Studies 

Post#3 » by mysticbb » Fri Jul 26, 2013 7:40 am

Chicago76 wrote: Non-regressed stats would include PER, individual Drtg, Ortg and win shares derived from those, etc.


But you can use regression to show that they correlate with winning, which is kind of the question, right? And yes, tests show that all of those correlate with winning, just to different degrees.

What the OP is asking about is the validity and reliability of those different metrics. But we can also use retrodiction for that, not just regression.
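Retrodiction, in the sense used in this thread, combines player values from the previous season with the minutes actually played in the current season and checks how well the result matches the team's real results. A minimal sketch, assuming the same minutes weighting as a plain SPM team sum; the function name is hypothetical, and treating players without a prior-season value as league average (`missing=0.0`) is an assumption, not something stated in the thread.

```python
def retrodict_team_margin(prior_ratings: dict[str, float],
                          current_minutes: dict[str, float],
                          games: int,
                          missing: float = 0.0) -> float:
    """Predict a team's margin in season t from season t-1 player ratings.

    prior_ratings: player -> rating from the previous season
    current_minutes: player -> minutes actually played in season t
    missing: rating assumed for players with no previous-season value
    """
    game_minutes = 48.0 * games  # floor time per player slot, overtime ignored
    return sum(prior_ratings.get(name, missing) * mins / game_minutes
               for name, mins in current_minutes.items())
```

The predicted margins for all teams can then be compared with actual MOV or SRS using an error measure such as RMSE.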
Chicago76

Re: Statistics Backed by Regression Studies 

Post#4 » by Chicago76 » Fri Jul 26, 2013 8:36 am

mysticbb wrote:
Chicago76 wrote: Non-regressed stats would include PER, individual Drtg, Ortg and win shares derived from those, etc.


But you can use regression to show that they correlate with winning, which is kind of the question, right? And yes, tests show that all of those correlate with winning, just to different degrees.

What the OP is asking about is the validity and reliability of those different metrics. But we can also use retrodiction for that, not just regression.


Oops, I missed the "backed by" part. The other thing to throw into the mix for the OP: when you're looking for support for which stats correlate with winning, be careful not to assume that the relative importance of a stat is similar at the player and team levels.

Re: correlation, my machine is in a box waiting to be moved to our new place, so I don't have my numbers or any retrodictive indicators handy.

Mystic: when you do your retrodictive analysis, do you apply any sort of aging curve for players, and do you take SoS into consideration (on both the expected pythag win and actual pythag win side)? I'm not sure it makes a huge difference, but with certain teams (those getting long in the tooth, or playing in a year where the conferences weren't particularly balanced), I could see it having a significant impact.
mysticbb

Re: Statistics Backed by Regression Studies 

Post#5 » by mysticbb » Fri Jul 26, 2013 9:20 am

Chicago76 wrote: Re: correlation, my machine is in a box waiting to be moved to our new place, so I don't have my numbers or any retrodictive indicators handy.


Two days ago I tried to find an old thread from the APBR board in which someone posted the correlation coefficients to winning for various metrics like PER or NBA Eff, but I couldn't find it. I know that PER was somewhere in the 0.8x range and NBA Eff in the 0.6x to 0.7x range. So, after all, the correlation to winning is there.

Chicago76 wrote: Mystic: when you do your retrodictive analysis, do you apply any sort of aging curve for players, and do you take SoS into consideration (on both the expected pythag win and actual pythag win side)? I'm not sure it makes a huge difference, but with certain teams (those getting long in the tooth, or playing in a year where the conferences weren't particularly balanced), I could see it having a significant impact.


When I tested my metric and the merged RAPM+SPM metric, I didn't use any kind of aging curve. I expect that such a curve can help improve the predictive power, and J.E.'s latest work on that is greatly appreciated. I looked into creating development curves with age, draft pick number, height and the rookie value in my SPM as variables, but so far I haven't applied that in a test.
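The development-curve idea can be sketched as a multiple linear regression of a player's season-to-season change in SPM on age, draft pick number, height and rookie SPM. This is a hypothetical setup, not the author's model: the field names in `fit_development_curve` are assumptions, and the `ols` helper solves the normal equations with Gauss-Jordan elimination in plain Python so the example has no dependencies.

```python
def ols(features: list[list[float]], target: list[float]) -> list[float]:
    """Ordinary least squares with an intercept, via the normal equations.

    Returns [intercept, coef_1, ..., coef_k]. Solves (X^T X) b = X^T y with
    Gauss-Jordan elimination and partial pivoting.
    """
    rows = [[1.0, *x] for x in features]  # prepend a column of ones
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, target))]
           for i in range(k)]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("features are collinear; X^T X is singular")
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for i in range(k):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical feature order: age, draft pick number, height (inches), rookie SPM.
# Target: change in SPM from one season to the next.
def fit_development_curve(players: list[dict[str, float]]) -> list[float]:
    features = [[p["age"], p["pick"], p["height"], p["rookie_spm"]] for p in players]
    target = [p["spm_change"] for p in players]
    return ols(features, target)
```

A predicted change from such a fit could then be added to each player's prior-season value before running a retrodiction test.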

I tested how well my metric predicts MOV and SRS, and it is indeed better at predicting SRS. Well, my metric actually uses SOS to adjust the player values, which might explain that. I never tested the metric without the SOS adjustment, so I can't confirm that the metric in itself would predict SRS better than MOV; as it stands, the player values depend on SOS. Other tests with players' GameScore values in games against above- and below-average teams suggest that players show clearly better GameScore values against worse teams than against good teams. Well, that's something many people would expect, I guess.
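For readers unfamiliar with the MOV/SRS distinction: SRS is a team's average margin of victory plus its strength of schedule, where SOS is the average SRS of the opponents it played. A minimal sketch that solves this by damped fixed-point iteration, ignoring home court and re-centring ratings to average zero; the damping, the function name and the input format are assumptions of this sketch, and Basketball-Reference's exact implementation may differ.

```python
from collections import defaultdict
from statistics import fmean


def simple_rating_system(games: list[tuple[str, str, float, float]],
                         damping: float = 0.5,
                         max_iter: int = 10_000,
                         tol: float = 1e-10) -> dict[str, float]:
    """games: (team_a, team_b, points_a, points_b). Returns SRS per team."""
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team_a, team_b, pts_a, pts_b in games:
        margins[team_a].append(pts_a - pts_b)
        margins[team_b].append(pts_b - pts_a)
        opponents[team_a].append(team_b)
        opponents[team_b].append(team_a)
    mov = {t: fmean(m) for t, m in margins.items()}
    ratings = dict(mov)
    for _ in range(max_iter):
        # SRS = MOV + SOS, where SOS is the average rating of opponents faced
        target = {t: mov[t] + fmean(ratings[o] for o in opponents[t]) for t in mov}
        # Damped update avoids oscillation on (near-)bipartite schedules
        new = {t: (1 - damping) * ratings[t] + damping * target[t] for t in mov}
        shift = fmean(new.values())  # re-centre so ratings average zero
        new = {t: v - shift for t, v in new.items()}
        converged = max(abs(new[t] - ratings[t]) for t in mov) < tol
        ratings = new
        if converged:
            break
    return ratings
```

With three teams where A beats B by 10, B beats C by 10 and A beats C by 20, the MOVs are 15, 0 and -15, but the SRS values come out as 10, 0 and -10 once each team's schedule is taken into account.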

In regard to win%, I tested the real win% and the expected Pythagorean win%, with and without SOS adjustment. The highest correlation, with R²=0.9964 from 1978 to 2012, is against pyth win% with SOS; without SOS adjustment it is 0.9926, and for real win% it is 0.9469. So the correlation is not much affected by the SOS adjustment. But when I check the scoring margin on retrodiction (only SPM, not the merged rating, which I have only done from 2000 to 2013), I get a standard error (RMSE) of 2.61 for MOV (average error 2.08) and 2.4 for SRS (average error 1.93). For comparison: http://sportskeptic.wordpress.com/2012/ ... the-goods/
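The two error measures quoted above can be computed as follows. RMSE squares the errors before averaging, so it penalises large misses more and is never smaller than the mean absolute error (the "average error"). A minimal sketch with hypothetical function names:

```python
from math import sqrt
from statistics import fmean


def _check(pred: list[float], actual: list[float]) -> None:
    if len(pred) != len(actual) or not pred:
        raise ValueError("pred and actual must be non-empty and the same length")


def rmse(pred: list[float], actual: list[float]) -> float:
    """Root mean squared error ("standard error" in the post)."""
    _check(pred, actual)
    return sqrt(fmean((p - a) ** 2 for p, a in zip(pred, actual)))


def mean_absolute_error(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error ("average error" in the post)."""
    _check(pred, actual)
    return fmean(abs(p - a) for p, a in zip(pred, actual))
```

Here `pred` would be the retrodicted team margins and `actual` either the teams' MOV or their SRS for the same season.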
My previous SPM version (the numbers still posted on my blog) had an average error of 2.21 vs. MOV and 2.07 vs. SRS in the same test. A set of random values for each team between -7.5 and +7.5 (normal distribution, mean 0) returned an average error of 5.2. The previous season's scoring margin had an average error of 3.05 (RMSE 3.84) against MOV for that sample, but 3.0 (RMSE 3.78) against SRS. That indicates that even the unadjusted MOV of the previous season is a better indicator of SRS than of MOV.
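A sketch of the baseline comparison described above, assuming random team ratings drawn from a normal distribution with mean 0 and clipped to ±7.5. The post does not give the standard deviation, so `sd=4.0` is an assumption, and averaging the error over many random draws is an extra step added here to reduce noise; the function names are hypothetical.

```python
import random
from statistics import fmean


def mean_absolute_error(pred: list[float], actual: list[float]) -> float:
    if len(pred) != len(actual) or not pred:
        raise ValueError("pred and actual must be non-empty and the same length")
    return fmean(abs(p - a) for p, a in zip(pred, actual))


def random_baseline_mae(actual: list[float], sd: float = 4.0, limit: float = 7.5,
                        trials: int = 1000, seed: int = 0) -> float:
    """Average MAE of random team ratings drawn from N(0, sd), clipped to ±limit."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    errors = []
    for _ in range(trials):
        guesses = [max(-limit, min(limit, rng.gauss(0.0, sd))) for _ in actual]
        errors.append(mean_absolute_error(guesses, actual))
    return fmean(errors)


def previous_season_baseline(prev_mov: list[float], cur_mov: list[float],
                             cur_srs: list[float]) -> tuple[float, float]:
    """MAE of last season's MOV used as the forecast for this season's MOV and SRS."""
    return mean_absolute_error(prev_mov, cur_mov), mean_absolute_error(prev_mov, cur_srs)
```

A useful metric should beat both baselines by a clear margin; the random baseline only shows how large the error is with no information at all.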

The caveat here, obviously, is that for rookies the values from the season being tested were used.
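The rookie caveat can be made concrete by measuring how much of a team's playing time relies on same-season values, which let the season being tested leak into the "prediction". A minimal sketch with hypothetical names: it falls back to the same-season rating only when no prior-season rating exists, and reports the share of minutes affected.

```python
def retrodiction_ratings(current_minutes: dict[str, float],
                         prior_ratings: dict[str, float],
                         same_season_ratings: dict[str, float]
                         ) -> tuple[dict[str, float], float]:
    """Pick each player's rating for a retrodiction test.

    Uses the previous-season rating when one exists; otherwise (e.g. rookies)
    falls back to the rating from the season being tested. Also returns the
    share of the team's minutes that rely on such same-season values.
    """
    ratings: dict[str, float] = {}
    leaked_minutes = 0.0
    for name, mins in current_minutes.items():
        if name in prior_ratings:
            ratings[name] = prior_ratings[name]
        else:
            ratings[name] = same_season_ratings[name]  # in-sample value
            leaked_minutes += mins
    total = sum(current_minutes.values())
    return ratings, (leaked_minutes / total if total else 0.0)
```

Teams with a large leaked share (many rookie minutes) are the ones where the retrodiction error is likely to look optimistic.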
