A More Granular View of APM
Posted: Sun Apr 8, 2012 5:21 am
I've been quite busy the past couple of months, so I've been browsing through some old posts. This thread in particular is a must-read when thinking about +/-, its limitations, and the nuts and bolts behind it:
viewtopic.php?f=64&t=1164119
That thread, IMO, is must-read material, because it has a bit of everything in it:
-discussion of roles
-player interaction
-RAPM and APM
-a cursory look at the nuts and bolts of APM from a matrix algebra, OLS, and ridge regression standpoint (thank you mysticbb).
There was one post in particular that summed something up that I've been thinking about for a while (from mopper8):
In that sense, RAPM is very limited, in that even if we accept its findings, it merely tells you to what extent a guy is effective, but not why he is that effective. And when it comes to building a team, the "why" is just as important as the "how" IMO. And I think in general the complaints in this thread from Bulls fans about the conclusions being drawn is that they ignore the "why," when understanding it would (in theory) shed light on why you might prefer the lower RAPM player even if we accept the idea that in actuality Deng's presence on the court has meant more for improving the Bulls' performance than Rose's has.
Now I believe mopper8's point was directed more toward the concept of role and impact on other players in a general sense, but from a statistical standpoint in the context of RAPM or APM, this is crucial. We may know that the model tells us that player X makes team Y +4 better per 100 on offense, and we know what that means (4 pts per 100 poss), but we don't know why. And the why is often more important than the how much.
My question is: if someone far better versed than I am in the mechanics of +/- and regression using a giant matrix can determine an estimate of player value in terms of points at the possession level, why can't we also use those same techniques to look at the underlying components that lead to points scored on both ends for players?
Is it an issue of sample size of those events?
A big one:
DRBs: the diminishing returns associated with DRBs, and the relative value of DRBs with positional adjustments, have been debated over and over and over again (thank you Mr. Berri). If we have the PBP data to compute DRB rates (and ORB rates for the opposition), then we can measure the strength of both teams. We know the league avg. Why not compare the DRB of the five defenders (X_1 through X_5) vs. the ORB of the five opponents (X_a through X_e) to determine the value of the players going after those boards? Wouldn't this tell us how to think about the marginal production of a rebounder and possibly the positional aspects (if we notice some good rebounding Gs are having a better impact than interior players with higher DRB rates)?
You could do the same on the ORB side, although I think the redundancy will be lower and the results should be clearer.
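For what it's worth, here is a rough sketch of how the rebounding version could be set up: one row per missed shot, with the five defenders and the five offensive players flagged in separate columns, and the same ridge machinery applied to "did the defense get the board" instead of points. Everything here is an assumption for illustration: the data is synthetic, the names ridge_fit and rebound_design, the penalty lam=50 and the 0.73 league DRB rate are placeholders, and treating the rebound as a plain linear outcome is a simplification.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def rebound_design(def_lineups, off_lineups, n_players):
    """One row per missed shot. Columns 0..n-1 flag the five defenders,
    columns n..2n-1 flag the five offensive players."""
    X = np.zeros((len(def_lineups), 2 * n_players))
    for i, (d, o) in enumerate(zip(def_lineups, off_lineups)):
        X[i, list(d)] = 1.0
        X[i, [n_players + p for p in o]] = 1.0
    return X

# Synthetic stand-in for play-by-play data (all numbers are made up).
rng = np.random.default_rng(0)
n_players, n_misses, league_drb = 20, 5000, 0.73
true_def = rng.normal(0, 0.03, n_players)  # hypothetical DRB effects
true_off = rng.normal(0, 0.03, n_players)  # hypothetical ORB effects

def_lineups, off_lineups = [], []
for _ in range(n_misses):
    ten = rng.choice(n_players, size=10, replace=False)
    def_lineups.append(ten[:5])
    off_lineups.append(ten[5:])

X = rebound_design(def_lineups, off_lineups, n_players)

# Chance the defense secures the board: league rate, plus the defenders'
# effects, minus the offensive rebounders' effects.
p_drb = np.clip(
    league_drb + X[:, :n_players] @ true_def - X[:, n_players:] @ true_off,
    0.01, 0.99,
)
# Outcome: 1 if the defense got the rebound, centered on the league average.
y = (rng.random(n_misses) < p_drb).astype(float) - league_drb

coef = ridge_fit(X, y, lam=50.0)
drb_effect = coef[:n_players]    # estimated change in team DRB% per defender
orb_effect = -coef[n_players:]   # sign flipped: a good offensive rebounder lowers DRB%
```

Comparing drb_effect against each player's individual DRB rate is where the diminishing-returns question would show up: a big gap between the two suggests a lot of his boards would have gone to a teammate anyway. Swapping y for a turnover or free-throw indicator gives the other components with the same design matrix.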
Others:
+/- TOV. A high usage PG may have high TOV rates. But what if that reduces TOV rates on a team level?
+/-FG% or eFG. A high usage player (cough, Iverson) may not shoot particularly well, but what if the team eFG or FG increases, controlling for all else? Similarly, consider a great shooting spacer. His efficiency may be a large part of the increase, but what if that didn't explain the entire +/- eFG% variable? What if the residual is him making it that much easier for his teammates to score?
FTA/100: +/- here will in large part be driven by a player who can create, but what if you went a step further? What if you isolated the FT draw of his teammates and realized X% of the increase was due to the player going to the line, but Y% was actually due to him helping his teammates get to the line?
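The X% / Y% split is just arithmetic once you have the four rates, so here is a small sketch of it. Note this is raw on/off bookkeeping with no lineup controls, the function name split_fta_gain and its inputs are hypothetical, and the numbers in the example call are made up.

```python
def split_fta_gain(own_on, mates_on, slot_off, mates_off):
    """Split a team FTA/100 on/off gain into the player's own trips and his teammates'.

    own_on    : player's own FTA/100 while he is on the court
    mates_on  : his four teammates' combined FTA/100 while he is on
    slot_off  : FTA/100 from whoever fills his spot while he sits
    mates_off : the same teammates' combined FTA/100 while he sits
    """
    total = (own_on + mates_on) - (slot_off + mates_off)
    if total == 0:
        raise ValueError("no on/off difference to split")
    own_share = (own_on - slot_off) / total        # the "X%" part
    mates_share = (mates_on - mates_off) / total   # the "Y%" part
    return total, own_share, mates_share

# Made-up numbers, purely for illustration.
total, own_share, mates_share = split_fta_gain(9.0, 19.0, 4.0, 17.0)
print(f"team gain {total:+.1f} FTA/100: {own_share:.0%} own, {mates_share:.0%} teammates")
```

The two shares always sum to 100%, so anything not explained by the player's own trips to the line lands in the teammate bucket, noise included.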
% baskets assisted: a good PG who doesn't create much for himself, or a post man, should increase this figure.
With the correct prior-informed data related to FG+FGA/100, efg%/100, TOV/100, etc. for players, the data could tell us a lot of things about a player and his impact on teammates. Hypothetical example: w/ D Rose on the court, the percentage of his teammates' baskets that are assisted increases from 50% to 70%. Their usage drops 10%, but they post a net +3 pt eFG gain.
I realize that a lot of this may very well just produce a lot of noise, but my general feeling is that if you take a good to very good player, he'll do a couple of things very well. And he'll do everything else okay. Some years, he'll be better than others at a few parts of that everything else. If you look at the individual components of what goes into that aggregate APM number, then things might also be clearer. If a great shooting perimeter player who does little else goes from +7 on the offensive end to +4 and then back up to +6, we might know why if things are broken down:
+7 thanks to FG efficiency on a team level, of which only +5 can be quantified from his scoring vs. a net zero player in that role, so +2 is coming from spacing resulting in better shots/shooting for teammates.
-1 on rebounding on the offensive end thanks to lower team ORB rates from his poor rebounding, staying out at the 3 pt line.
+1 thanks to ability to hit foul shots, space the floor for cutters to receive passes and then get fouled.
The next year, these things might be +5, -1.5, +0.5. We might see the resulting shooting decline to go from +7 to +5 on that point, or we may see that he's shooting the same, as are his teammates, and conclude it's just a lot of noise.
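To make the bookkeeping in that example explicit, here is a tiny sketch that adds up the components for each year and shows which one moved. The component values are the ones from the example above; the dictionary labels and helper names are just mine, and this assumes the components are additive, which a real model would have to justify.

```python
# Component values (points per 100 possessions) taken from the example above.
year1 = {"team shooting": 7.0, "offensive rebounding": -1.0, "free throws": 1.0}
year2 = {"team shooting": 5.0, "offensive rebounding": -1.5, "free throws": 0.5}

def total(components):
    """Aggregate offensive +/- as the sum of its components."""
    return sum(components.values())

def changes(before, after):
    """Year-over-year change in each component."""
    return {name: after[name] - before[name] for name in before}

delta = changes(year1, year2)
print(f"offense: {total(year1):+.1f} -> {total(year2):+.1f}")
# List the components from biggest drop to smallest.
for name, d in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"  {name}: {d:+.1f}")
```

The totals reproduce the +7 to +4 swing, and the per-component changes are what you would check against the player's and teammates' actual shooting before deciding whether it is a real decline or noise.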
Anyone who really gets the mechanics of this care to comment?