Why I'm not a WP fan

sp6r=underrated · Post #61 » by **sp6r=underrated** » Wed Feb 9, 2011 5:07 pm

Paydro70 wrote: The notion that 5 Tyson Chandlers (or his small-man equivalent) would win 80 games just reflects a complete lack of understanding of how basketball works.

PG: Kidd (8.4 wins)
SG: Fields (9.8 wins)
SF: Battier (5.1 wins)
PF: Humphries (9.5 wins)
C: Chandler (8.6 wins)

41.4 wins balling

mysticbb · Post #62 » by **mysticbb** » Wed Feb 9, 2011 5:07 pm

Sleepy51, even if he accounted for defense better, he wouldn't improve his model enough. As I pointed out, the team adjustment brings his correlation coefficient to 0.94, and without it the coefficient is 0.71. He is justifying his model with that high correlation to winning. The defensive adjustment is basically justified via a residual. Well, Berri is not directly using a residual, but overall it is just that. It is also not accurate to say he got his whole model via regression, because he only did two regressions. The first regression he did was to determine A, B and C for the following formula:

Win% = A*ORTG+B*DRTG+C

A, B and C are presented on his page as the coefficients next to offensive efficiency, defensive efficiency and constant term.

What he is now proposing are those two formulas for PA and PE. At this point he is not using a regression to determine the marginal values, he is using league average values. The differentiation gives us:

Marginal value PTS = A/PE
Marginal value PE = A*(-PTS/PE^2)
Marginal value DPTS = B/PA
Marginal value PA = B*(-DPTS/PA^2)

The values for PTS, PE, DPTS and PA are the league average values in his data set. If you use a different data set you will get slightly different values.
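The four marginal values above are just the partial derivatives of Win% = A*PTS/PE + B*DPTS/PA + C. A minimal sketch that checks them against a numerical derivative; A = 3.442 is the figure quoted elsewhere in this thread, while B, C and the league-average inputs are made-up placeholders, not Berri's numbers:

```python
def win_pct(pts, pe, dpts, pa, a, b, c):
    """Win% = A*ORTG + B*DRTG + C with ORTG = PTS/PE and DRTG = DPTS/PA."""
    return a * pts / pe + b * dpts / pa + c


def marginal_values(pts, pe, dpts, pa, a, b):
    """Analytic partial derivatives, evaluated at the league averages."""
    return [
        a / pe,                # marginal value of PTS
        -a * pts / pe ** 2,    # marginal value of PE
        b / pa,                # marginal value of DPTS
        -b * dpts / pa ** 2,   # marginal value of PA
    ]


def numeric_partials(inputs, a, b, c, h=1e-4):
    """Central-difference derivative of win_pct for each of the four inputs."""
    result = []
    for i in range(4):
        up, down = list(inputs), list(inputs)
        up[i] += h
        down[i] -= h
        result.append((win_pct(*up, a, b, c) - win_pct(*down, a, b, c)) / (2 * h))
    return result


# A comes from the thread; B, C and the four averages are hypothetical.
A, B, C = 3.442, -3.442, 0.5
league_avg = [100.0, 96.7, 100.0, 96.7]  # PTS, PE, DPTS, PA per game (placeholders)

analytic = marginal_values(*league_avg, A, B)
numeric = numeric_partials(league_avg, A, B, C)
for name, x, y in zip(["PTS", "PE", "DPTS", "PA"], analytic, numeric):
    print(f"{name:5s} analytic={x:+.5f} numeric={y:+.5f}")
```

The point of the check is only that no regression is needed for this step: once A, B and the league averages are fixed, the marginal values follow mechanically.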

In the next step he takes the marginal values and assigns them to the boxscore stats that are related to PTS, PE, PA and DPTS. You can easily see that in the numbers for free throws: (1-0.47)*0.032 = 0.017 and 0.47*-0.032 = -0.015. Well, by choosing 0.47 as the factor for free throws he set the break-even point for free throw shooting at 47 FT%. Nicely done. That just means a player can shoot 48 FT% on his free throws and he will help his team win more than a guy who is shooting 49 FG% on his two point field goals. In fact the latter will hurt his team while the former will help.

At the end he comes to the conclusion that a player has to shoot over 50% from the field on his two point shots to help his team win games. That is above the league average.
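The break-even points can be reproduced with a few lines. This is a rough sketch, assuming a point is worth +0.032 and a used possession costs 0.032, which is what the free throw arithmetic in the post implies; the function names are mine:

```python
POINT_VALUE = 0.032       # value of one point scored (from the post)
POSSESSION_COST = 0.032   # assumed cost of one possession employed
FT_WEIGHT = 0.47          # possession weight of a free throw attempt


def value_per_attempt(make_rate, points, possession_weight):
    """Expected net value of one attempt at a given make rate."""
    return make_rate * points * POINT_VALUE - possession_weight * POSSESSION_COST


def break_even_rate(points, possession_weight):
    """Make rate at which an attempt neither helps nor hurts."""
    return possession_weight * POSSESSION_COST / (points * POINT_VALUE)


print("FT break-even: ", break_even_rate(1, FT_WEIGHT))
print("2PT break-even:", break_even_rate(2, 1.0))
print("48% FT shooter, per attempt: ", value_per_attempt(0.48, 1, FT_WEIGHT))
print("49% 2PT shooter, per attempt:", value_per_attempt(0.49, 2, 1.0))
```

Under these assumptions the 48% free throw shooter comes out slightly positive per attempt and the 49% two-point shooter slightly negative, which is the oddity described above.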

Where does the problem come from? The formula for possessions employed works for the overall team, but not for the individual player. Adding up FGA+0.47*FTA+TO-ORB will not give us a good estimate of the possessions a player used. To understand that: if we apply this formula to a player like Love, we get a 152.8 ORtg for him right now. He scored 1111 points with 777 FGA, 351 FTA, 120 TO and 247 ORB. According to Oliver he has a 123 ORtg. Now, if we look at a guy like Dirk Nowitzki, who is scoring at a higher rate and with higher efficiency while turning the ball over less, we get 114.4 ORtg, while according to Oliver he has 118. Or my new favorite upcoming franchise player Kris Humphries has a 141.7 ORtg if we apply Berri's formula for offensive efficiency.

As you can see, the problem is not only defense; the problem is that the formulas Berri is using can't be applied to players. They will give you absurd numbers for guys who are getting offensive rebounds. Well, if you add those numbers up for each player on each team, it will for sure give you exactly the team's overall number, but that doesn't mean that the distribution among players is correct.

Btw: If we use the same methodology as they used in the last GSW blog entry, we get 23.0 wins via WS/48 and 22.7 wins via my PRA as the prediction. Both are way closer to reality than WP48. Not surprising at all. Both are also more consistent from year to year; the same goes for PER. I could calculate the average PER for the teams over time and run a regression to estimate the linear formula Win% = A*PER+B, but well, I have the feeling it is also closer to reality anyway.

Post #63 » by **Sleepy51** » Wed Feb 9, 2011 6:27 pm

mysticbb wrote: At this point he is not using a regression to determine the marginal values, he is using league average values. The differentiation gives us:

Well then he is a chowderhead.

Overall, I think I was more focusing on the point that ANY regression analysis model for understanding/predicting individual player impacts on team results would benefit from including differential defensive stats rather than boxscore team defensive stats. I would argue that there is a qualitative difference between how those two data sets can function in any model due to "how basketball works" issues.

mysticbb · Post #64 » by **mysticbb** » Wed Feb 9, 2011 7:37 pm

Sleepy51 wrote:Well then he is a chowderhead.

Well, here is how you can calculate the marginal value for points scored (MVP) by using this season's data:

MVP = A/PE

where A is 3.442 and PE = FGA + 0.47* FTA + TO - ORB

The values for FGA, FTA, TO and ORB are per game numbers for the league average, but we can use the total average numbers right now and multiply that with the amount of games. Currently an average team has 4160 FGA, 1267 FTA, 738 TO and 561 ORB. They played 51 games. That gives us:

MVP = A/PE*51

MVP = 3.442 / (4160 + 0.47*1267 + 738 - 561) *51

MVP = 0.036

He gets 0.032, because he is using a different data set. That's all.
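The same arithmetic as a small script, for anyone who wants to plug in a different data set. The inputs are the league-average team totals quoted above; the function and variable names are mine:

```python
A = 3.442          # coefficient on offensive efficiency, as quoted above
FT_WEIGHT = 0.47   # possession weight of a free throw attempt
GAMES = 51         # games played by the average team so far


def possessions_employed(fga, fta, to, orb):
    """PE = FGA + 0.47*FTA + TO - ORB."""
    return fga + FT_WEIGHT * fta + to - orb


# Season totals for an average team, from the post above.
season_pe = possessions_employed(fga=4160, fta=1267, to=738, orb=561)
pe_per_game = season_pe / GAMES

# Marginal value of a point scored: A divided by per-game possessions.
mvp = A / pe_per_game
print(round(mvp, 3))  # 0.036
```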

What Berri NEVER showed is that the formulas can be used to evaluate players. He showed that the model does a good job at "predicting" wins in hindsight, but that's all. And it does not do a better job at this than scoring margin. Which isn't a surprise at all, because that's what he is trying to reproduce.

Post #65 » by **floppymoose** » Wed Feb 9, 2011 8:11 pm

Paydro70 wrote: The notion that 5 Tyson Chandlers (or his small-man equivalent) would win 80 games just reflects a complete lack of understanding of how basketball works.

In fairness to WP, this will be true of any stat whose metric is some form of win shares. These win shares are situational.

If you look at the 5 best centers in the league they should be generating tons of wins. That doesn't mean that playing all 5 of them on a team at the same time is a winning strategy, and that fact doesn't in itself make the metric bad. The metric is about how much are these players helping you *in the context of their team, minutes, player combinations, etc*.

If we could somehow fix WP to assign credit more accurately, it would still fail the test you apply above, because that's not really a legitimate test.

Post #66 » by **floppymoose** » Thu Feb 10, 2011 12:18 am

Thought experiment for WP, part I:

take the box score and strip out all the info except minutes, points and FGA, and then do the same regressions and team adjustments that WP does to tune the results.

How well can this be made to correlate with wins in past seasons?

part II:

Now do the same thing again, but take out FGA as well.

Mostly I'm wondering if you can get an impressive correlation with wins this way. It would no longer amount to counting up possessions and then factoring in defensive and offensive efficiency, because we've stripped too much data out to achieve that. But it might correlate pretty well anyway, which would be interesting if true.

I'm thinking that counting up the points on the team, plus having the "team defensive rating" would basically be like having scoring differential, which is known to correlate well with wins.

mysticbb · Post #67 » by **mysticbb** » Thu Feb 10, 2011 12:34 am

Used data set from databasebasketball.com from 1979/80 to 2009/10

Part I: Points and FGA pace and minutes adjusted + defensive adjustment:

Code: Select all

               Model Summary(b)
                     Change Statistics
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1   ,970a     ,941        ,941       ,0377461   
a. Predictors: (Constant), DEF_A, FGA_P, PTS_P
b. Dependent Variable: Win%

Win% = -2.77-0.002*FGA_P+0.155*PTS_P+0.031*DEF_A

Part II: Only points adjusted for pace and minutes + defensive adjustment term:

Code: Select all

               Model Summary(b)
                     Change Statistics
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1     ,970a      ,941      ,941           ,0377361   
a. Predictors: (Constant), DEF_A, PTS_P
b. Dependent Variable: Win%

Win% = -2.805+0.155*PTS_P+0.031*DEF_A

Well, the DEF_A is a small adjustment that will not change the ranking of the players much ....

Took me 10 minutes to build TWO models which have a higher correlation to winning than WP.

Post #68 » by **floppymoose** » Thu Feb 10, 2011 12:38 am

nice. do you know how that compares to regular WP?

And the other part of this I wonder about is how the DEF_A is calculated. If it was tuned in some way to make WP the best it could be, perhaps it needs to be retuned in both of the above examples.

mysticbb · Post #69 » by **mysticbb** » Thu Feb 10, 2011 12:41 am

floppymoose wrote:I'm thinking that counting up the points on the team, plus having the "team defensive rating" would basically be like having scoring differential, which is known to correlate well with wins.

Correct.

floppymoose wrote:nice. do you know how that compares to regular WP?

Uh, I had a spreadsheet for WP, but I deleted it, because it wasn't worth much anyway. I probably could make a new one and compare the results for players. But well, I doubt I make it today. If anyone has the desire, the model is there.

floppymoose wrote:And the other part of this I wonder about is how the DEF_A is calculated. If it was tuned in some way to make WP the best it could be, perhaps it needs to be retuned in both of the above examples.

Well, I took 106.5-DRTG = DEF_A, which means I just compared the team defensive rating to the average offensive/defensive rating from 1979 to 2010.
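For anyone who wants to play with it, the Part II model fits in a few lines. This is only a sketch built from the coefficients posted above. One assumption on my side: the coefficients imply that a PTS_P of about 21.3 maps to a .500 team, which is 106.5/5, so PTS_P looks like a per-player share of team points per 100 possessions. That reading is not stated in the posts.

```python
LEAGUE_AVG_RATING = 106.5  # average ORtg/DRtg from 1979 to 2010, per the post


def def_adjustment(drtg):
    """DEF_A = 106.5 - DRTG: positive for better-than-average defenses."""
    return LEAGUE_AVG_RATING - drtg


def fm_win_pct(pts_p, drtg):
    """Part II model: Win% = -2.805 + 0.155*PTS_P + 0.031*DEF_A."""
    return -2.805 + 0.155 * pts_p + 0.031 * def_adjustment(drtg)


# Hypothetical teams; PTS_P is taken as ORtg / 5 (my assumption, see above).
average_team = fm_win_pct(pts_p=106.5 / 5, drtg=106.5)
good_team = fm_win_pct(pts_p=110.0 / 5, drtg=104.0)
print(f"average team: {average_team:.3f}")
print(f"+6 net rating team: {good_team:.3f}")
```

If that reading of PTS_P is right, 0.155 per PTS_P unit is the same as 0.031 per point of ORtg, identical to the DEF_A coefficient, so the model is effectively a function of ORtg minus DRtg. That is the "basically like having scoring differential" point.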

Post #70 » by **floppymoose** » Thu Feb 10, 2011 12:49 am

mysticbb wrote:Took me 10 minutes to build TWO models which have a higher correlation to winning than WP.

So this part makes me think that while you have deleted your WP spreadsheet, you remember the results well enough to claim that this correlates better?

Post #71 » by **floppymoose** » Thu Feb 10, 2011 12:51 am

And it also doesn't look like you used minutes? I ask because I'm wondering about an apples to apples comparison with WP48.

mysticbb · Post #72 » by **mysticbb** » Thu Feb 10, 2011 1:01 am

floppymoose wrote:And it also doesn't look like you used minutes? I ask because I'm wondering about an apples to apples comparison with WP48.

Well, I didn't see the minutes part, but it will not change much anyway, because I control the model basically via ORtg and DRtg, so the minutes and FGA get skipped. And if you want to go by minutes: you calculate the win%, which means it is per game (or basically per 48 minutes anyway).

But here we go with minutes:

Code: Select all

      Model Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1   ,970a   ,941   ,941   ,0377689
a. Predictors: (Constant), FGA_P, MIN, PTS_P, DEF_A

Code: Select all

                       Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients
Model         B          Std. Error         Beta     t        Sig.
1 (Constant)  -2,775     ,394                        -7,042   ,000
  MIN         2,634E-7   ,000               ,000     ,013     ,989
  PTS_P       ,155       ,002               ,751     88,512   ,000
  DEF_A       ,031       ,000               ,695     81,687   ,000
  FGA_P       -,002      ,003               -,006    -,749    ,454
a. Dependent Variable: Win%

I have the correlation coefficients in my memory. Berri reported 0.95 (I got 0.93 with his model), and for the model without the team adjustment it was 0.72.
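A side note on reading these figures: the numbers compared here are correlation coefficients (R), while the SPSS summaries also list R Square, which is the share of variance in Win% the model accounts for. A quick conversion of the R values quoted in this thread, as a sketch (the labels are mine):

```python
# Correlation coefficients (R) as quoted in the thread.
r_values = {
    "points + DEF_A model": 0.970,
    "WP as reported by Berri": 0.95,
    "WP as recomputed here": 0.93,
    "WP without team adjustment": 0.72,
}

# R Square is simply R squared: the fraction of variance explained.
r_squared = {name: r ** 2 for name, r in r_values.items()}

for name, r in r_values.items():
    print(f"{name:28s} R = {r:.3f}   R^2 = {r_squared[name]:.3f}")
```

So a model with R = 0.97 accounts for about 94% of the variance, matching the ,941 in the output above. "Correlates at 0.97" and "explains 97%" are not the same statement.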

Post #73 » by **floppymoose** » Thu Feb 10, 2011 1:15 am

thanks mystic

mysticbb · Post #74 » by **mysticbb** » Thu Feb 10, 2011 1:42 am

Here is the current TOP of the FM-guys:

Code: Select all

      Player       Age  Tm   Q FM48  FM
Kobe Bryant          32 LAL  1 0.676 24.2
LeBron James         26 MIA  1 0.594 22.7
Kevin Durant         22 OKC  1 0.592 22.6
Dwyane Wade          29 MIA  1 0.593 20.9
Derrick Rose         22 CHI  1 0.525 19.9
Kevin Martin         27 HOU  1 0.591 19.3
Amare Stoudemire     28 NYK  1 0.501 19.0
Dwight Howard        25 ORL  1 0.453 16.8
Dirk Nowitzki        32 DAL  1 0.567 16.7
Russell Westbrook    22 OKC  1 0.427 16.0
Monta Ellis          25 GSW  1 0.375 15.8
Carmelo Anthony      26 DEN  1 0.484 15.6
Blake Griffin        21 LAC  1 0.404 15.5
Eric Gordon          22 LAC  1 0.450 14.5
LaMarcus Aldridge    25 POR  1 0.343 14.3
Deron Williams       26 UTA  1 0.375 14.1
David West           30 NOH  1 0.368 13.6

I called it FM48 and FM in honor of floppymoose. :)

Post #75 » by **floppymoose** » Thu Feb 10, 2011 1:43 am

Kevin Martin means FM sucks as a player ranker. :-D

mysticbb · Post #76 » by **mysticbb** » Thu Feb 10, 2011 1:54 am

floppymoose wrote:Kevin Martin means FM sucks as a player ranker. :-D

Well, it is the ranking of the biggest volume scorers in the NBA. It is doing pretty much a perfect job at this. :)

But overall it just shows that you can control the correlation via a defensive adjustment and can even get a "good" ranking out of this without using more information than minutes, pace, points and DRtg. 0.97 correlation coefficient to winning, why should we question "our" model?

ElGee · Post #77 » by **ElGee** » Thu Feb 10, 2011 1:58 am

mysticbb wrote:Here is the current TOP of the FM-guys:

Code: Select all
      Player       Age  Tm   Q FM48  FM
Kobe Bryant          32 LAL  1 0.676 24.2
LeBron James         26 MIA  1 0.594 22.7
Kevin Durant         22 OKC  1 0.592 22.6
Dwyane Wade          29 MIA  1 0.593 20.9
Derrick Rose         22 CHI  1 0.525 19.9
Kevin Martin         27 HOU  1 0.591 19.3
Amare Stoudemire     28 NYK  1 0.501 19.0
Dwight Howard        25 ORL  1 0.453 16.8
Dirk Nowitzki        32 DAL  1 0.567 16.7
Russell Westbrook    22 OKC  1 0.427 16.0
Monta Ellis          25 GSW  1 0.375 15.8
Carmelo Anthony      26 DEN  1 0.484 15.6
Blake Griffin        21 LAC  1 0.404 15.5
Eric Gordon          22 LAC  1 0.450 14.5
LaMarcus Aldridge    25 POR  1 0.343 14.3
Deron Williams       26 UTA  1 0.375 14.1
David West           30 NOH  1 0.368 13.6

I called it FM48 and FM in honor of floppymoose.

LOL. FM48 destroys WP48.

Kobe, LBJ, Durant, Wade and Rose vs.
Love, Paul, Howard, LBJ, Randolph.

Kevin Martin is Floppy's biggest outlier. WP is, um, Kris Humphries.

Post #78 » by **floppymoose** » Thu Feb 10, 2011 1:58 am

mysticbb wrote:But overall it just shows that you can control the correlation via a defensive adjustment and can even get a "good" ranking out of this without using more information than minutes, pace, points and DRtg. 0.97 correlation coefficient to winning, why should we question "our" model?

Indeed. That was what I suspected, and why I proposed the "thought experiment".

Which you turned into an actual test, and verified my suspicions. You get the awesome award!

Post #79 » by **Doctor MJ** » Thu Feb 10, 2011 5:41 am

You guys rock, man. Every time I thought of WP it made me angry, but now I'm just going to think of the great FM index. If only Berri could get more Floppy, he might be able to come up with something good.

Idunkon1stdates · Post #80 » by **Idunkon1stdates** » Thu Feb 10, 2011 6:11 am

Did anyone notice that dberri replied? In his response, he took a shot at floppymoose's username, and then said correlation to wins isn't everything Wins Produced is about -- in fact, other factors went into producing it (which he doesn't mention, but he is referring to the regressions).

He has since deleted his reply, probably because he realized how passive-aggressive it sounded and also because most of his drones follow his model because, as he likes to repeat every other day, it explains 90 - 95% of wins. Clearly, FM48 explaining 97% of wins threatens his cred. Better to "take the high road" and ignore the riff-raff so the Wages of Wins myth can continue to be perpetuated.