Joel Anthony - MVP.

Post #21 » by **laika** » Thu May 19, 2011 8:09 pm

At the risk of sounding too confrontational, I think you are simply wrong. Postseason play is significantly different than regular season play.

-In the regular season the games don't count for much. Teams rarely give full effort for long stretches.
-The quality of opposition is significantly higher in the playoffs.
-You play multiple games against one team, which allows weaknesses to be exploited much more effectively.

So let's look at some numbers for the Heat this year.

Regular season adj +/-
James, LeBron___ 9.63
Bosh, Chris______ 8.93
Wade, Dwyane___ 7.47
Jones, James____ 1.23
Chalmers, Mario_ -0.55
House, Eddie____ -1.70
Ilgauskas, Zyd___ -7.13
Miller, Mike_____ -7.74
Anthony, Joel____ -8.12
Dampier, Erick__ -10.44

Postseason +/- (no one seems willing to post adjusted numbers)
Anthony, Joel___ 35.68
Jones, James____ 14.71
Chalmers, Mario_ 10.55
Wade, Dwyane____ 6.21
Bosh, Chris______ -0.16
James, LeBron___ -4.48
Bibby, Mike_____ -15.99
Ilgauskas, Zyd___ -48.91

According to your theory of "regular season predicts the postseason," these numbers should be completely impossible. The rational conclusion from these numbers is that LeBron is shockingly overrated or that +/- is essentially useless.

Post #22 » by **Doctor MJ** » Thu May 19, 2011 9:08 pm

laika wrote:At the risk of sounding too confrontational, I think you are simply wrong. Postseason play is significantly different than regular season play.

I've got no objection to being told I'm wrong, but I'd prefer you to really try to understand where I'm coming from before labeling me as such, and based on your current post, I don't think you're there yet.

Your reasons for why the post-season is different from the regular season are fine, but it's a question of scale. As I mentioned with Olajuwon, if someone goes from being a 24 PER guy to a 26 PER guy in the post-season, that's HUGE by playoff-improvement standards. And yet, we're still talking about a difference of only 2 PER points. To say that we can learn nothing about Olajuwon's post-season play from his regular season play is essentially to treat approximations with extremely high correlation as utterly worthless because they are not perfect. If one does that, then one might as well abandon statistics altogether: perfection through statistics is unattainable.

Your listing of the Heat's +/- numbers just seems indicative of convoluted analytical thinking. Let's take this one step at a time:

1) I believe that you can tell a lot about how someone plays in the post-season based on how they play in the regular season.

2) As proof I cite a stat with a high degree of reliability to show that even in the outlier cases, the change from regular to post-season is not that dramatic.

3) Therefore, if I have a stat that measures general performance (one mostly sampled from the regular season), I can use it as a baseline for understanding a particular player and expect it not to be too far off from what that player does in the post-season.

4) Of course, this doesn't mean that the general performance metric is my be-all and end-all assessment of a player. I call it a baseline because it's a starting point.

The fact that +/- itself becomes noisy with low sample size doesn't impact any of those steps in my logic.

I'll add two more notes:

1) The fact that +/- gets noisy with low sample size does not make it unique. The same thing is true for any stat, though granted, +/- is more prone to this than some other stats. Resistance to these issues is called reliability, and it is certainly something you want in a stat; however, it is just as certainly not everything.

2) In trying to get in your head to understand why you are arguing this point, it seems to me that maybe you're thinking that when I use a stat, it becomes my definitive statement about how good a player is. So I'd be sold on +/- generally, and thus insist on clinging to it above all other things at all times.

This is not how I work, nor should it be how anyone works. I look at all the tools available to me, and use each where it can be useful, weighting each based on how credible it is in those circumstances. My opinion at any point in time is thus formed by a combination of factors, and I very rarely will reject anything completely. In fact, the mindset where one looks for a reason why something should not ever be used is one that I think is flawed from the start.

Every tool is flawed, but all of them were created by smart people who saw that they did something useful. The only good reason not to use a tool, then, is if you have something else that does the same thing, only better. And I'll tell you right now: while we can debate whether particular all-in-one box score stats are inherently superior to others, there is nothing that can claim to be an inherently superior tool to +/-. It provides a perspective that you simply cannot get from box score stats, and thus any analyst who isn't using both types of metrics is missing insight.

Post #23 » by **mopper8** » Tue May 24, 2011 10:04 pm

Just to throw it out there, I think it's reasonable to suggest that in slow-it-down, grind-it-out defensive battles, Joel Anthony's value skyrockets. I think it's doubly reasonable to suggest that his style of play is much less valuable when the players around him are playing with less energy/urgency and more sloppily, as is often the case in the regular season. I'd also add that it's perfectly reasonable to suggest that Joel's value would improve even more given the specific adjustments to the defense Miami has made in the postseason.

To un-pack that a bit:

- Joel's greatest weakness is his trouble catching and finishing around the rim, so teams feel free to leave him defensively to double up on one of the big 3. In the slower-paced playoffs, though, when teams more routinely work deep into the shot-clock, the need to use Anthony as an outlet for the offense shrinks. The Heat are able to generate good looks with ball movement for other players even if the other team is leaving Anthony. What's more, given the extensive amount of film work that goes into playoff preparation, Miami is at times able to use that lack of defensive attention to their advantage. Against Boston, for example, when Glen Davis left Joel (who was on the weakside low block) to overload the strong side against a Wade drive, LeBron was able to cut baseline and receive a back-screen from Joel on Paul Pierce...and nobody was there to help Pierce, because he was all alone. The end result was an alley-oop for LeBron. The Heat never had anything like that in their arsenal in the regular season. Meanwhile, his quickness means he's able to work deep into the shot-clock defensively without ever falling behind in the rotations, and he's still able to meet defenders in time to prevent or bother a shot no matter how often teams get the ball going side-to-side.

- He leaves his own man a lot on defense. That's by design in the Heat's system, and it requires other players to cover his man on rotations. In the regular season, when he's going all out to come out on someone else's man, if the rest of the team isn't playing with the same energy, his man is more likely to come free, or someone else's man is more likely to come free on the next rotation. When everyone is playing all out "on a string," his help defense is no longer a liability.

- The Heat have been far more aggressive defensively in the playoffs. They never did more than put a strong show on PnR plays in the regular season and generally used base defense; in the playoffs, they've done more than hedge: they've trapped the ball-handler, switched, doubled players off-the-ball (Ray Allen in particular). A quick, mobile big who can recover to the lane in one long stride is more valuable playing that type of defense than he is in less aggressive defense.

Which is to say, nobody should be shocked that Joel's +/- increased in the playoffs. It's not a fluke, and it reflects less Joel "stepping up" or a flaw in the stat than a shift in context that makes his play more valuable.

Post #24 » by **rrravenred** » Tue May 24, 2011 10:26 pm

Out of interest, is there any source for shot-clock usage in the playoffs vs the regular season? I'd love to unpack that "deep into the shotclock" statement a little...

Post #25 » by **mopper8** » Tue May 24, 2011 10:58 pm

rrravenred wrote:Out of interest, is there any source for shot-clock usage in the playoffs vs the regular season? I'd love to unpack that "deep into the shotclock" statement a little...

That's a good question. It seems obvious on its face to me, but maybe not. Pace certainly drops significantly in the playoffs (3 of the remaining 4 teams have played at an average playoff pace below the slowest in the league in the regular season), but maybe there are just fewer transition opportunities while half-court offense stays roughly the same, and I'm seeing something that isn't quite there.

Average playoff pace this year is 87.8 so far, and league-low Portland played at 87.9 in the regular season. Heh.
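For the shot-clock question, pace can at least be turned into an average possession length. A minimal sketch, assuming pace means possessions per team per 48 minutes and that both teams get roughly the same number of possessions (overtime ignored); the function name is hypothetical. It only gives averages, so it cannot say whether the slowdown comes from fewer transition chances or from longer half-court possessions, which is the open point above.

```python
# Assumption: pace = possessions per team per 48 minutes, and both teams get
# about the same number of possessions, so the clock is shared by ~2 * pace
# possessions. Overtime is ignored.
GAME_SECONDS = 48 * 60


def seconds_per_possession(pace: float) -> float:
    """Average length of one possession, in seconds, at a given pace."""
    return GAME_SECONDS / (2 * pace)


for label, pace in [("2011 playoff average so far", 87.8),
                    ("Portland, regular season (league low)", 87.9)]:
    print(f"{label}: {seconds_per_possession(pace):.2f} s per possession")
```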

Post #26 » by **floppymoose** » Wed Jun 1, 2011 8:04 pm

mysticbb wrote:There are people who are using ridge regression to adjust for such problems. That should relieve you from all the concerns you have.

You shouldn't really feel relieved until you understand what ridge regression does and have an intuition for how it is working in adjusted plus minus. Which I don't, yet.

Post #27 » by **DSMok1** » Thu Jun 2, 2011 2:30 pm

floppymoose wrote:
mysticbb wrote:There are people who are using ridge regression to adjust for such problems. That should relieve you from all the concerns you have.

You shouldn't really feel relieved until you understand what ridge regression does and have an intuition for how it is working in adjusted plus minus. Which I don't, yet.

Ridge regression is basically just adding in a set number of possessions of league-average results for each player within the regression itself, having the effect of both regressing the results to the mean and disentangling the collinearity of regular APM.

Normally in APM, each equation looks like this:
P1 + P2 + P3 ... - P9 - P10 + HCA = Eff.Dif. (with weight = N Poss.)

Ridge regression keeps all of those equations, and adds a bunch like this:
P1 = 0 (with weight = X Poss.)
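The claim that the extra "P1 = 0" rows amount to ridge regression can be checked on a toy problem. This sketch (pure Python, made-up stints, hypothetical helper names `solve` and `normal_equations`, and an arbitrary X of 50 possessions) solves the augmented weighted least-squares problem and the textbook ridge system (X'WX + λD)β = X'Wy, where D penalises the players but not HCA, and gets the same ratings. In this toy data every stint has two players per side, so adding the same constant to all four ratings never changes a prediction; plain least squares has no unique answer, while the extra rows make the system solvable.

```python
# Toy check: "P_i = 0" rows with weight X reproduce ridge regression.
# Stints, differentials and weights are invented for illustration.


def solve(M, v):
    """Solve the square system M x = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def normal_equations(rows, ys, ws):
    """Return (X'WX, X'Wy) for weighted least squares."""
    p = len(rows[0])
    XtWX = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(p)]
            for i in range(p)]
    XtWy = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return XtWX, XtWy


# Columns: players 0-3 (+1 home side, -1 away side), then HCA (always 1).
rows = [[1, 1, -1, -1, 1], [1, -1, 1, -1, 1], [-1, 1, 1, -1, 1], [1, -1, -1, 1, 1]]
ys = [5.0, -2.0, 3.0, 1.0]       # efficiency differential of each stint
ws = [30.0, 20.0, 25.0, 15.0]    # possessions per stint ("N Poss.")
X_POSS = 50.0                    # hypothetical ridge weight ("X Poss.")
n_players = 4

# Route 1: append one "P_i = 0" row per player (HCA column 0), weight X_POSS.
aug_rows = rows + [[1 if j == i else 0 for j in range(5)] for i in range(n_players)]
aug_ys = ys + [0.0] * n_players
aug_ws = ws + [X_POSS] * n_players
beta_aug = solve(*normal_equations(aug_rows, aug_ys, aug_ws))

# Route 2: closed-form ridge, X_POSS added to the player diagonal only.
XtWX, XtWy = normal_equations(rows, ys, ws)
for i in range(n_players):
    XtWX[i][i] += X_POSS
beta_ridge = solve(XtWX, XtWy)

print("ratings + HCA:", [round(b, 3) for b in beta_ridge])
print("max difference between routes:", max(abs(a - b) for a, b in zip(beta_aug, beta_ridge)))
```

Because the constant shift is invisible to the data, the penalty alone decides it, which in this toy case forces the four player ratings to sum to zero.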

Then the regression is run, and the number X is tuned so that the best out-of-sample results are found (10-fold cross-validation is often used: 90% of the data is used in the regression, the remaining 10% is used to check accuracy, and this is repeated for each fold).

Basically, it's regression to the mean and disentangling of collinearity rolled together.
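Tuning X by 10-fold cross-validation, as described above, might look like the sketch below. Everything in it is an assumption for illustration: synthetic 2-on-2 stints with invented ratings and noise, no HCA term, folds formed by taking every 10th stint, an arbitrary grid of candidate X values, and hypothetical helper names (`fit_ridge`, `make_toy_stints`, `kfold_error`). Each candidate is scored by the possession-weighted squared error on the held-out folds, and the smallest error wins.

```python
import random


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def fit_ridge(rows, ys, ws, lam):
    """Weighted least squares with lam pseudo-possessions at 0 for every player."""
    p = len(rows[0])
    M = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    v = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return solve(M, v)


def make_toy_stints(n_stints=300, n_players=8, seed=7):
    """Synthetic 2-on-2 stints; true ratings and noise level are invented."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 3.0) for _ in range(n_players)]
    rows, ys, ws = [], [], []
    for _ in range(n_stints):
        picks = rng.sample(range(n_players), 4)
        row = [0.0] * n_players
        for p in picks[:2]:
            row[p] = 1.0
        for p in picks[2:]:
            row[p] = -1.0
        poss = rng.randint(2, 20)
        signal = sum(r * t for r, t in zip(row, true))
        # Per-100 efficiency is noisier over fewer possessions.
        ys.append(signal + rng.gauss(0.0, 100.0 / poss ** 0.5))
        rows.append(row)
        ws.append(float(poss))
    return rows, ys, ws


def kfold_error(rows, ys, ws, lam, k=10):
    """Possession-weighted squared error on held-out folds (fold f = every k-th stint)."""
    total = weight = 0.0
    for f in range(k):
        test = set(range(f, len(rows), k))
        train = [i for i in range(len(rows)) if i not in test]
        beta = fit_ridge([rows[i] for i in train], [ys[i] for i in train],
                         [ws[i] for i in train], lam)
        for i in test:
            pred = sum(a * b for a, b in zip(rows[i], beta))
            total += ws[i] * (ys[i] - pred) ** 2
            weight += ws[i]
    return total / weight


rows, ys, ws = make_toy_stints()
grid = [1.0, 10.0, 100.0, 300.0, 1000.0, 3000.0, 10000.0]
errors = {lam: kfold_error(rows, ys, ws, lam) for lam in grid}
best = min(errors, key=errors.get)
for lam in grid:
    print(f"X = {lam:>7.0f}: CV error {errors[lam]:.1f}")
print("chosen X:", best)
```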

Post #28 » by **Doctor MJ** » Thu Jun 2, 2011 3:53 pm

First I want to say, thanks DSMok for starting to break RAPM down.

I know about the cross-validation, and it's a clear-cut improvement.

With the ridge regression, though, I'm still thinking. I can believe that essentially reducing the extremity of the results reduces volatility with low sample size.

I don't really see, though, how this fundamentally addresses the collinearity problem. I'll grant that collinearity is part of the cause of the volatility, and so something that helps reduce volatility could be said to indirectly help with collinearity, but I've heard people talk as if the collinearity issue is solved with RAPM. While that always seemed to me like it had to be hyperbole, I did think there was more to it than adding in fake data.

Appreciate any additional clarification you can give on the subject.

Post #29 » by **Doctor MJ** » Thu Jun 2, 2011 4:08 pm

I'll append:

The glaring flaw in +/- is reliability. There's a lot of noise in the signal, which means you cannot use it the same way you would use more traditional box score stats when looking at small sample sizes. Regularization clearly helps with this.

Presumably though, with large enough sample size the reliability edge it gives becomes minimal, and the addition of fake data itself becomes noise, no? So when we talk about many year samples, do we have reason to believe RAPM is still an improvement over raw APM?

Post #30 » by **DSMok1** » Thu Jun 2, 2011 7:11 pm

Doctor MJ wrote:I don't really see though how this fundamentally addresses the collinearity problem. I'll grant that collinearity is part of the cause of the volatility, and so something that helps reduce volatility could be said to indirectly help with collinearity, but I've heard people talk as if the collinearity issue is solved with RAPM. While that always seemed like it had to be hyperbole to me, I did think there was more to it than adding in fake data.

Appreciate any additional clarification you can give on the subject.

Imagine this simple case:

Player A and player B are on the court together, almost always.

However, A is on the bench for 40 minutes during the year when B is on the court, and in that small sample size, the team does extremely well.

Mathematically:
A+B = +0 (sample of 1200 minutes)
B = +15 (sample of 40 minutes).

What does that mean to the APM regression? B=+15, A=-15. And there is no residual; it's an exact solution. So mathematically, it's robust--but it doesn't make sense. Logically, we know that if A+B = 0 for a ton of minutes, probably the two players are fairly similar in quality. And cross-validation shows us the way to determine how much of this "regression to the mean" we should apply.

In this instance, suppose we have determined that our lambda component, our regression to the mean, should be 120 possessions of league average per player. Then, adding that into the regression, the result comes out: A = -2, B = +2.25.

That result makes more sense intuitively and works out MUCH better in out of sample testing.
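DSMok1's two-player numbers can be reproduced approximately. The post gives the samples in minutes and λ in possessions; this sketch assumes, hypothetically, that all three are in the same units and uses the minutes directly as weights, with no HCA term. Under that assumption λ = 0 returns the exact A = -15, B = +15 split, and λ = 120 gives A ≈ -2.03 and B ≈ +2.23, close to the -2 and +2.25 quoted. The function name `fit_ab` is hypothetical.

```python
def fit_ab(lam, w_together=1200.0, w_b_alone=40.0, b_alone_diff=15.0):
    """Two-player example with lam units of league-average data per player.

    Minimises  w1*(A + B - 0)^2 + w2*(B - 15)^2 + lam*(A^2 + B^2),
    where w1 = 1200 and w2 = 40 are the sample sizes quoted in the post.
    """
    # Normal equations:  [a11 a12] [A]   [r1]
    #                    [a12 a22] [B] = [r2]
    a11 = w_together + lam
    a12 = w_together
    a22 = w_together + w_b_alone + lam
    r1 = 0.0                        # the A + B = 0 rows add nothing on the right
    r2 = w_b_alone * b_alone_diff
    det = a11 * a22 - a12 * a12     # Cramer's rule for a 2x2 system
    A = (r1 * a22 - a12 * r2) / det
    B = (a11 * r2 - a12 * r1) / det
    return A, B


for lam in (0.0, 120.0):
    A, B = fit_ab(lam)
    print(f"lambda = {lam:>5.0f}: A = {A:+.2f}, B = {B:+.2f}")
# lambda =     0: A = -15.00, B = +15.00
# lambda =   120: A = -2.03, B = +2.23
```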

Post #31 » by **DSMok1** » Thu Jun 2, 2011 7:15 pm

Doctor MJ wrote:I'll append:

The glaring flaw in +/- is reliability. There's a lot of noise in the signal, which means you cannot use it the same way you used more traditional box score stats when looking at small sample size. Regularization clearly helps this.

Presumably though, with large enough sample size the reliability edge it gives becomes minimal, and the addition of fake data itself becomes noise, no? So when we talk about many year samples, do we have reason to believe RAPM is still an improvement over raw APM?

As the sample size goes up, yes, the RAPM improvement decreases. Check out this chart: http://sonicscentral.com/apbrmetrics/vi ... 41141#p682
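Why the RAPM edge fades with more data shows up most simply in a one-player toy case: if a player's only data is N possessions at some raw differential, X pseudo-possessions at 0 pull the estimate to N·raw/(N + X), so the share pulled toward 0 is X/(N + X). The X of 2000 and the raw +5 below are hypothetical; with them, the pull is 80% at 500 possessions and about 6% at 32,000. In the full regression the shrinkage also interacts with collinearity, so this is only the intuition, not the whole effect.

```python
def shrunk_rating(raw_diff, n_poss, x_poss):
    """Minimiser of n_poss*(b - raw_diff)^2 + x_poss*b^2 (one player, nothing else)."""
    return n_poss * raw_diff / (n_poss + x_poss)


X_POSS = 2000.0   # hypothetical ridge weight
RAW = 5.0         # hypothetical raw per-100 differential
for n in (500, 2000, 8000, 32000):
    b = shrunk_rating(RAW, n, X_POSS)
    print(f"N = {n:>5}: estimate {b:.2f}, pulled {100 * (RAW - b) / RAW:.0f}% toward 0")
```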

Also, did you read my review of APM and the state-of-the-art? http://godismyjudgeok.com/DStats/2011/n ... ilization/

Post #32 » by **floppymoose** » Fri Jun 3, 2011 12:04 am

DSMok1 wrote:Ridge regression is basically just adding in a set number of possessions of league-average results for each player within the regression itself, having the effect of both regressing the results to the mean and disentangling the collinearity of regular APM.

Normally in APM, each equation looks like this:
P1 + P2 + P3 ... - P9 - P10 + HCA = Eff.Dif. (with weight = N Poss.)

Ridge regression keeps all of those equations, and adds a bunch like this:
P1 = 0 (with weight = X Poss.)

Great explanation. I understand it now.

DSMok1 wrote:Then the regression is run, and the number X is tuned so that the best out of sample results are found (10 fold cross-validation is often used, where 90% is used in the regression and compared to 10% remaining for accuracy, and repeated for each fold).

I was going to ask for further detail on this, since I've heard k-fold cross validation mentioned before without understanding what it meant. But now that I read your comment again, it's perfectly clear. Thanks.

So I've heard RAPM criticized before for assuming that all low-minute players are average. I'm guessing this is because the actual possessions "N" for these players are dwarfed by the ridge regression possessions, "X".

I wonder what happens if, instead of regressing to the mean, you make some estimate of the overall quality of low-minute players through some other method, and then regress everyone toward that?

Post #33 » by **Paydro70** » Fri Jun 3, 2011 4:33 am

Ideally you could have some estimate of replacement level... but that would be a project in itself, to try to determine the typical quality of a low-minute player or late-season replacement/D-league call-up type. That's sort of always been the challenge of adjusted +/- though, isn't it?

Post #34 » by **DSMok1** » Fri Jun 3, 2011 1:08 pm

floppymoose wrote:So I've heard RAPM criticized before for assuming that all low minute players are average. I'm guessing this is because actual possessions "N" for these players is dwarfed by the ridge regression possessions, "X".

I wonder what happens if instead you don't regress to the mean, but instead make some estimate of the overall quality of low minute players through some other method, and then regress everyone against that?

That's the next frontier. I'm working on it, as well as several other people. That would be a Bayesian APM with an informed prior, rather than an uninformed prior (which just regresses to mean). It's probably been done before, but not in the public domain as far as I have seen.
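In the augmented-row picture from earlier in the thread, an informed prior just changes the extra rows from "P_i = 0" to "P_i = prior_i". For a single player with no other complications, the estimate becomes (N·raw + X·prior)/(N + X). The sketch below is a one-player illustration with hypothetical numbers (30 possessions at +20, X = 2000, and a prior of -3.0 for a low-minute player), not DSMok1's implementation; `posterior_rating` is a hypothetical name.

```python
def posterior_rating(raw_diff, n_poss, x_poss, prior_mean=0.0):
    """Minimiser of n*(b - raw)^2 + x*(b - prior)^2: a 'P_i = prior' row of weight x."""
    return (n_poss * raw_diff + x_poss * prior_mean) / (n_poss + x_poss)


# Hypothetical low-minute player: +20 per 100 over only 30 possessions.
raw, n, x = 20.0, 30.0, 2000.0
print(f"uninformed prior (0):  {posterior_rating(raw, n, x):+.2f}")
print(f"informed prior (-3.0): {posterior_rating(raw, n, x, prior_mean=-3.0):+.2f}")
```

With so few possessions the data barely move the estimate either way, which is why the choice of prior matters so much for low-minute players.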

Post #35 » by **floppymoose** » Sat Jun 4, 2011 11:15 am

So why not just take some guesses at the "informed prior" without worrying about whether they're right, and then see how the regressions turn out with different values, as judged by k-fold validation? If you got pretty stable values across a few consecutive seasons, that would be a good sign you were on to something, ya?

Post #36 » by **DSMok1** » Sat Jun 4, 2011 2:21 pm

floppymoose wrote:So why not just take some guesses at the "informed prior", without worrying about whether it's right, and then seeing how the regressions turn out, as judged by k-fold validation, with different values? If you got pretty stable values across a few consecutive seasons, that would be a good sign you were on to something, ya?

What would you inform it with? My idea is to use team efficiency and (regressed) Minutes per Game. I already use a similar prior to regress my Advanced SPM to provide the best out-of-sample ASPM estimates.

The concept is this: a player who plays on a top team and plays 40 MPG would be better than a player on the Cavs who plays 40 MPG. And obviously, a player who plays 40 MPG would be better than one who played 2 MPG! I regress MPG because some players play 2 games at 20 MPG the whole season, while others may play 60 at 14 MPG (and the latter is probably better).

Thus far, for ASPM, using just ReMPG, EffDif, and EffDif*ReMPG seems to be the best prior (linear relationships, nothing second order).
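DSMok1 does not give his formula for regressing MPG, so the first function below is a hypothetical stand-in: it pads each player's record with 10 pseudo-games at a 5 MPG baseline (both numbers invented). With it, the post's two cases come out in the expected order: 2 games at 20 MPG gives a ReMPG of 7.5, while 60 games at 14 MPG gives about 12.7. The second function only encodes the linear form of the prior described above (ReMPG, EffDif, and EffDif*ReMPG); its coefficients would have to be fitted and are not given in the thread. Both function names are hypothetical.

```python
def regressed_mpg(total_minutes, games, pseudo_games=10.0, baseline_mpg=5.0):
    """Hypothetical MPG shrinkage: pad the record with pseudo-games at a low baseline."""
    return (total_minutes + pseudo_games * baseline_mpg) / (games + pseudo_games)


def prior_rating(re_mpg, eff_dif, c0, c1, c2, c3):
    """Linear prior in ReMPG, team EffDif and their product; c0..c3 must be fitted."""
    return c0 + c1 * re_mpg + c2 * eff_dif + c3 * eff_dif * re_mpg


# The two cases from the post: 2 games at 20 MPG vs 60 games at 14 MPG.
short_stint = regressed_mpg(2 * 20, 2)
regular = regressed_mpg(60 * 14, 60)
print(f"2 games at 20 MPG  -> ReMPG {short_stint:.1f}")
print(f"60 games at 14 MPG -> ReMPG {regular:.1f}")
```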

Post #37 » by **floppymoose** » Sat Jun 4, 2011 7:51 pm

DSMok1 wrote:What would you inform it with?

So what I was trying to say was that it might not need to be informed. What if we start trying values, and discover that there is a pretty smooth curve on how well the regressions fit cross validation plotted against the new "prior". And let's further assume that results across different seasons lead to similar results. Then you have, in essence, discovered the prior experimentally rather than theoretically.
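floppymoose's experiment could be set up roughly as follows: fix X, try several candidate prior values for the low-minute players, score each by k-fold cross-validation, and see whether the winner is stable across seasons. This sketch uses three synthetic "seasons" in which the bench players' true ratings are drawn around -3; the data, X = 300, the candidate grid, and the helper names are all hypothetical. As mysticbb notes below, the chosen value is itself fitted to the data, so stability across real seasons would be the actual test.

```python
import random


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def fit_with_prior(rows, ys, ws, lam, prior):
    """Weighted LS plus one 'P_i = prior[i]' row of weight lam per player."""
    p = len(prior)
    M = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    v = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) + lam * prior[i]
         for i in range(p)]
    return solve(M, v)


def make_toy_season(seed, n_stints=300):
    """Synthetic 2-on-2 stints: players 0-5 are regulars, 6-9 rarely-used bench."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 3.0) for _ in range(6)] + [rng.gauss(-3.0, 1.0) for _ in range(4)]
    rows, ys, ws = [], [], []
    for _ in range(n_stints):
        picks = rng.sample(range(6), 4)
        if rng.random() < 0.2:                    # occasional bench appearance
            picks[rng.randrange(4)] = rng.randrange(6, 10)
        row = [0.0] * 10
        for p in picks[:2]:
            row[p] = 1.0
        for p in picks[2:]:
            row[p] = -1.0
        poss = rng.randint(2, 20)
        signal = sum(r * t for r, t in zip(row, true))
        ys.append(signal + rng.gauss(0.0, 100.0 / poss ** 0.5))
        rows.append(row)
        ws.append(float(poss))
    return rows, ys, ws


def cv_error_for_bench_prior(rows, ys, ws, bench_mean, lam=300.0, k=10):
    """k-fold CV error when regulars are shrunk to 0 and bench players to bench_mean."""
    prior = [0.0] * 6 + [bench_mean] * 4
    total = weight = 0.0
    for f in range(k):
        test = set(range(f, len(rows), k))
        train = [i for i in range(len(rows)) if i not in test]
        beta = fit_with_prior([rows[i] for i in train], [ys[i] for i in train],
                              [ws[i] for i in train], lam, prior)
        for i in test:
            pred = sum(a * b for a, b in zip(rows[i], beta))
            total += ws[i] * (ys[i] - pred) ** 2
            weight += ws[i]
    return total / weight


grid = [-6.0, -4.0, -2.0, 0.0, 2.0]
for season_seed in (1, 2, 3):       # stand-ins for "a few consecutive seasons"
    rows, ys, ws = make_toy_season(season_seed)
    errs = {m: cv_error_for_bench_prior(rows, ys, ws, m) for m in grid}
    print(f"toy season {season_seed}: best bench prior {min(errs, key=errs.get):+.0f}")
```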

Post #38 » by **mysticbb** » Sun Jun 5, 2011 7:38 am

floppymoose, you would basically be adding a "residuum" to that; it would turn into an overfitting problem.

DSMok1 and Paydro are right here: we need information about the replacement level in order to get valid results for the rest of the players. So far, using the average value is a good first guess.

Post #39 » by **floppymoose** » Sun Jun 5, 2011 8:01 am

I certainly see how it could be overfitting. That's why I was suggesting trying different seasons and seeing how stable the answer was. If it was pretty stable, that's pretty suggestive.

Post #40 » by **mysticbb** » Sun Jun 5, 2011 12:08 pm

But for that you need a couple of seasons' worth of data to run such an experiment. And even then I would argue that the results will likely differ, or you might even get absurd, contradictory values due to the overfitting problem.

Thus the approach by DSMok1 seems to be the more reasonable one.

If I look at the players with fewer than 164 minutes during the season (2 min * 82 = 164), I get an average of -3.12. Comparing my SPM values with RAPM from 2006 to 2011, I get a correlation coefficient of 0.53 for players above 164 minutes, and only 0.08 for all players below. The thing is, the correlation coefficient doesn't get much better than 0.53 with increased minutes: it is 0.54 for more than 410 minutes, 0.55 for more than 820 minutes, and then it stays constant at 0.55.
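mysticbb's calculation (the average for sub-164-minute players, assumed here to be an average RAPM, and the SPM/RAPM correlation for players above several minute cutoffs) can be sketched as below. The data are synthetic and exist only to exercise the code; the real 2006 to 2011 SPM and RAPM values are not in the thread, so nothing this sketch prints should be compared with the numbers quoted. Function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def r_above_cutoffs(players, cutoffs):
    """players: (minutes, spm, rapm) tuples; r(SPM, RAPM) for minutes > each cutoff."""
    out = {}
    for cut in cutoffs:
        sub = [(spm, rapm) for minutes, spm, rapm in players if minutes > cut]
        out[cut] = pearson([s for s, _ in sub], [r for _, r in sub])
    return out


# 2, 5 and 10 MPG over 82 games give the 164, 410 and 820 minute cutoffs.
cutoffs = [mpg * 82 for mpg in (2, 5, 10)]

# Synthetic players only, to exercise the code; not real SPM/RAPM values.
rng = random.Random(0)
players = []
for _ in range(400):
    minutes = rng.uniform(0, 3000)
    spm = rng.gauss(-1.0, 3.0)
    rapm = 0.5 * spm + rng.gauss(0.0, 2.0)
    players.append((minutes, spm, rapm))

low = [rapm for minutes, _, rapm in players if minutes < 164]
print("mean RAPM under 164 minutes:", round(sum(low) / len(low), 2))
for cut, r in r_above_cutoffs(players, cutoffs).items():
    print(f"> {cut} minutes: r = {r:.2f}")
```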