A More Granular View of APM

Posted: Sun Apr 8, 2012 5:21 am
by Chicago76
I've been quite busy the past couple of months, so I've been browsing through some old posts. This thread in particular is a must-read when thinking about +/-, its limitations, and the nuts and bolts behind it:

viewtopic.php?f=64&t=1164119

That thread, IMO, is must-read material, because it has a bit of everything in it:

-discussion of roles
-player interaction
-RAPM and APM
-a cursory look at the nuts and bolts of APM from a matrix algebra, OLS, and ridge regression standpoint (thank you mysticbb).

There was one post in particular that summed something up that I've been thinking about for a while (from mopper8):

In that sense, RAPM is very limited, in that even if we accept its findings, it merely tells you to what extent a guy is effective, but not why he is that effective. And when it comes to building a team, the "why" is just as important as the "how" IMO. And I think in general the complaints in this thread from Bulls fans about the conclusions being drawn is that they ignore the "why," when understanding it would (in theory) shed light on why you might prefer the lower RAPM player even if we accept the idea that in actuality Deng's presence on the court has meant more for improving the Bulls' performance than Rose's has.

Now I believe mopper8's point was directed more toward the concept of role and impact on other players in a general sense, but from a statistical standpoint in the context of RAPM or APM, this is crucial. We may know that the model tells us that player X makes team Y +4 better per 100 on offense, and we know what that means (4 pts per 100 poss), but we don't know why. And the why is often more important than the how much.

My question is: if someone way more well versed in the mechanics of +/- and regression using a giant matrix can determine an estimate of player value in terms of points at the possession level, why can't we also use those same techniques to look at the underlying components that lead to points scored on both ends for players?

Is it an issue of sample size of those events?

A big one:

DRBs, the diminishing returns associated with them, and their relative value with positional adjustments have been debated over and over and over again (thank you Mr. Berri). If we have the PBP data to compute DRB rates (and ORB rates for the opposition), then we can measure the strength of both teams. We know the league avg. Why not compare the DRB of players X1 through X5 against the ORB of opposing players Xa through Xe to determine the value of the players going after those boards? Wouldn't this tell us how to think about the marginal production of a rebounder and possibly the positional aspects (if we notice some good rebounding Gs are having a better impact than interior players with higher DRB rates)?

You could do the same on the ORB side, although I think the redundancy will be lower and the results should be clearer.
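To make the idea concrete, here is a rough sketch of what a "DRB +/-" regression could look like. It is not anyone's actual model: the three player labels, the stints, and the ridge penalty `lam` are all made up, and the design matrix only codes the defending team's players (+1 when on court). A fuller version would presumably also code the five opposing players as -1, the way regular APM does, and weight each stint by its number of rebound chances.

```python
# Toy ridge regression on stint data where the target is the team's
# defensive rebound rate minus league average (percentage points).
# Players, stints, and lam are hypothetical, for illustration only.

def ridge_fit(X, y, lam):
    """Solve (X'X + lam*I) b = X'y by Gaussian elimination."""
    n = len(X[0])
    # Normal equations with the ridge penalty added on the diagonal.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (c[i] - tail) / A[i][i]
    return b

players = ["A", "B", "C"]          # hypothetical defenders
X = [[1, 1, 0],                    # stint 1: A and B on court
     [1, 0, 1],                    # stint 2: A and C on court
     [0, 1, 1]]                    # stint 3: B and C on court
y = [3.0, 1.0, -2.0]               # DRB% vs. league avg in each stint

coefs = ridge_fit(X, y, lam=1.0)
for name, b in zip(players, coefs):
    print(f"{name}: {b:+.2f}")
```

In this toy setup the stints were built so that an unpenalized fit would give A +3, B 0, C -2; with `lam=1.0` the ridge estimates are pulled toward zero (about +1.6, +0.1, -0.9) but keep the same ordering. The same function could be pointed at TOV, eFG, or FTA targets instead of DRB rate.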

Others:
+/- TOV. A high usage PG may have high TOV rates. But what if that reduces TOV rates on a team level?

+/-FG% or eFG. A high usage player (cough, Iverson) may not shoot particularly well, but what if the team eFG or FG increases, controlling for all else? Similarly, consider a great shooting spacer. His efficiency may be a large part of the increase, but what if that didn't explain the entire +/- eFG% variable? What if the residual is him making it that much easier for his teammates to score?

FTA/100: +/- here will in large part be driven by a player who can create, but what if you went a step further? What if you isolated the FT draw of his teammates and realized X% of the increase was due to the player going to the line, but Y% was actually due to him helping his teammates get to the line?

% baskets assisted: a good PG that doesn't create much for himself or post man should increase this figure.

With the correct prior-informed data related to FG+FGA/100, eFG%/100, TOV/100, etc. for players, the data could tell us a lot of things about a player and his impact on teammates. Example: w/ D Rose on the court, the percent of his teammates' baskets that are assisted increases from 50% to 70%. Their usage drops 10%, but they post a net +3 pt eFG gain.

I realize that a lot of this may very well just produce a lot of noise, but my general feeling is that if you take a good to very good player, he'll do a couple of things very well. And he'll do everything else okay. Some years, he'll be better than others at a few parts of that everything else. If you look at the individual components of what goes into that aggregate APM number, then things might also be clearer. If a great shooting perimeter player who does little else goes from +7 on the offensive end to +4 and then back up to +6, we might know why if things are broken down:

+7 thanks to FG efficiency on a team level, of which only +5 can be quantified from his scoring vs. a net zero player in that role, so +2 is coming from spacing resulting in better shots/shooting for teammates.
-1 on rebounding on the offensive end thanks to lower team ORB rates from his poor rebounding, staying out at the 3 pt line.
+1 thanks to ability to hit foul shots, space the floor for cutters to receive passes and then get fouled.

The next year, these things might be +5, -1.5, +0.5. We might see that a shooting decline explains the drop from +7 to +5 on that component, or we may see that he's shooting the same, as are his teammates, and conclude it's just a lot of noise.
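The bookkeeping in that example can be checked in a few lines. This is just the arithmetic from the hypothetical shooter above; the component names are labels I made up for the three pieces.

```python
# Component breakdown of offensive APM for the hypothetical shooter above.
# Values are the ones used in the post; the key names are illustrative.
year1 = {"team_fg_efficiency": 7.0, "offensive_rebounding": -1.0, "free_throws": 1.0}
year2 = {"team_fg_efficiency": 5.0, "offensive_rebounding": -1.5, "free_throws": 0.5}

total1 = sum(year1.values())   # aggregate offensive number, year 1
total2 = sum(year2.values())   # aggregate offensive number, year 2

# Year-over-year change in each component.
change = {k: year2[k] - year1[k] for k in year1}

print(f"year 1 total: {total1:+.1f}")
print(f"year 2 total: {total2:+.1f}")
for name, delta in change.items():
    print(f"  {name}: {delta:+.1f}")
```

The components sum to +7 and +4, matching the aggregate numbers, and two of the three points lost come from the team FG efficiency piece.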

Anyone who really gets the mechanics of this care to comment?

Re: A More Granular View of APM

Posted: Sun Apr 8, 2012 7:15 pm
by Doctor MJ
You should check out some of the work done by statistician (and RealGM poster) Evan Z:

http://thecity2.com/2012/02/21/new-play ... ctor-a4pm/

http://thecity2.com/2012/02/22/adjusted ... ard-index/

When I saw someone was doing this again (because there was someone doing it before an NBA team hired him and he took the site down), I was thrilled.

Some leaders among big minute (starter-ish) players:

On offense -
Effective FG%: Nash, Wade, Garnett
Turnovers: Paul, Jamison, Roy
Rebounds: Love, Griffin, Cousins
Free Throws: Howard, Harden, James

On defense -
Effective FG%: Garnett, Howard, Bogut
Turnovers: G. Wallace, Chalmers, J.R. Smith
Rebounds: Nene, Humphries, Bynum
Free Throws: A. Miller, Duncan, Parker

Re: A More Granular View of APM

Posted: Sun Apr 8, 2012 7:54 pm
by Dr Positivity
The place I'd really love to see analysis like this take off is rebounding. I'm becoming convinced RPG is a very misleading stat, because it measures how often you get the rebound but not how often you prevent your man from getting it. I think there could be a wide gap between players in the latter category that has nothing to do with RPG, just as man-to-man defense has nothing to do with steals per game, or a cornerback's defensive success in the NFL has little to do with how many passes he intercepts. Basically, it would seem far more valuable to have a center who "shuts down" the other C's offensive rebounding 10 times out of 10 but gets only 3 of those rebounds himself than one who grabs 6 of 10 rebounds but allows the opposing C to get 3 in the process.

Re: A More Granular View of APM

Posted: Mon Apr 9, 2012 5:33 am
by Chicago76
Doctor MJ wrote:You should check out some of the work done by statistician (and RealGM poster) Evan Z:

http://thecity2.com/2012/02/21/new-play ... ctor-a4pm/

http://thecity2.com/2012/02/22/adjusted ... ard-index/


Thanks. I was looking at this last night after I posted (like I said, trying to catch up a bit). One thing that concerned me on first read, which turned out not to have anything to do with the individual four factors APM (I think), was that the team point differential formula he began with had an intercept, which led to different coefficients for the two teams. This just seems like overfitting.

I realize that the model might "fit" better that way, but IMO, the intercept intuitively needs to be zero. Otherwise, what it's saying is the PD prediction of two teams w/ identical four factors variables comes down to which one you decide to call "own" vs. which one you decide to call "opp". More than likely, the regression is picking up noise due to the fact that possessions aren't exactly equal in a game.
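A quick way to see the problem, as a sketch only: the coefficients and the four factors values below are invented, not taken from the linked posts. If team A's point differential is by definition the negative of team B's, the two predictions should sum to zero no matter which team is labeled "own".

```python
# Hypothetical point-differential model on own and opponent four factors.
def predict_pd(own, opp, intercept, coef_own, coef_opp):
    """Predicted point differential for the team labeled 'own'."""
    own_part = sum(c * x for c, x in zip(coef_own, own))
    opp_part = sum(c * x for c, x in zip(coef_opp, opp))
    return intercept + own_part + opp_part

# Two teams with identical factors (eFG%, TOV%, ORB%, FT rate); made-up values.
factors = [0.50, 0.13, 0.27, 0.22]

# Fitted-with-intercept style model: own and opp coefficients are not mirrors.
coef_own = [150.0, -130.0, 40.0, 25.0]
coef_opp = [-140.0, 120.0, -45.0, -20.0]
pd_a = predict_pd(factors, factors, 1.5, coef_own, coef_opp)  # A labeled "own"
pd_b = predict_pd(factors, factors, 1.5, coef_own, coef_opp)  # B labeled "own"
asymmetric_sum = pd_a + pd_b

# Zero intercept with mirrored coefficients: labels no longer matter.
mirrored = [-c for c in coef_own]
sym_a = predict_pd(factors, factors, 0.0, coef_own, mirrored)
sym_b = predict_pd(factors, factors, 0.0, coef_own, mirrored)
symmetric_sum = sym_a + sym_b

print(f"with intercept:    A {pd_a:+.2f}, B {pd_b:+.2f}, sum {asymmetric_sum:+.2f}")
print(f"without intercept: A {sym_a:+.2f}, B {sym_b:+.2f}, sum {symmetric_sum:+.2f}")
```

With the asymmetric coefficients each identical team is predicted to beat the other, which can't both be true; forcing a zero intercept and mirrored coefficients makes the two predictions cancel.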

Once I blocked that out of my mind, the underlying individual stuff seemed very interesting. My favorite part was DRBs (addressed indirectly through opp ORB rate). I took the 350 players with the highest possession totals. From there I ranked them by position tendency (1 to 5) and split them into five groups of 70 players. I pulled their DRB rates off b-r...which isn't an exact match given the time difference, but it's close enough. From biggest (center = 5) to smallest (PG = 1), weighted by possessions:

Pos Range     +/-     DRB
4.6 to 5.0    +.06    20.89
3.9 to 4.5    +.02    18.98
2.7 to 3.8    +.01    14.47
1.9 to 2.7    -.04    10.92
1.0 to 1.9    -.02     9.48

Seems like as good a case as any that individual DRBs aren't correlated (or at least are really minimally correlated) to +/- and by extension, winning or losing.
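For anyone who wants to repeat the grouping with their own numbers, the possession-weighted averaging described above could be sketched roughly like this. The six players and their values are placeholders, not the 350 actual players, and the tuple layout is just an assumption for the example.

```python
# Possession-weighted group averages by position tendency.
# Each tuple: (name, position 1-5, possessions, opp-ORB +/-, DRB rate).
# All rows are made-up placeholders.
players = [
    ("P1", 5.0, 4000, 0.10, 22.0),
    ("P2", 4.7, 2000, -0.02, 19.0),
    ("P3", 3.0, 3000, 0.00, 14.0),
    ("P4", 2.8, 3000, 0.02, 15.0),
    ("P5", 1.5, 5000, -0.03, 10.0),
    ("P6", 1.0, 5000, -0.01, 9.0),
]

def weighted_mean(values, weights):
    """Average of values, weighted by possessions."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def group_summary(rows, group_size):
    """Rank by position (biggest first) and summarize equal-sized groups."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    out = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        poss = [r[2] for r in group]
        out.append({
            "pos_low": min(r[1] for r in group),
            "pos_high": max(r[1] for r in group),
            "plus_minus": weighted_mean([r[3] for r in group], poss),
            "drb": weighted_mean([r[4] for r in group], poss),
        })
    return out

summary = group_summary(players, group_size=2)
for g in summary:
    print(f"{g['pos_low']:.1f} to {g['pos_high']:.1f}  "
          f"{g['plus_minus']:+.2f}  {g['drb']:.2f}")
```

With the real data, `group_size` would be 70 and the rows would come from the +/- results and the b-r DRB rates.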