Getting APM from ORtg and DRtg?

Moderator: Doctor MJ

Ripp
General Manager
Posts: 9,269
And1: 324
Joined: Dec 27, 2009

Re: Getting APM from ORtg and DRtg? 

Post#21 » by Ripp » Fri Sep 17, 2010 9:31 am

^-- Alright, I'm not really sure off the top of my head how to adjust for team-specific weights. If you try the simple trick I suggested of just having a different weight vector for each team, you go from 10 boxscore weights to 300 (10 weights x 30 teams). Given that there are only 450 players in the league at a time, might be a bit of a dicey situation...you want your problem size (in this case, 300) to be much smaller than the number of observations (in this case, 450). Still, I guess somebody could try it out and see how well it works.

(For those reading this and not understanding what I'm saying, Chicago76 rightfully points out that weighted boxscore formulas like SPM and PER are not team specific. A defensive rebound on the Celtics might be more valuable than one on say the Bulls.)
A Tolkienesque strategy war game made by me: http://www.warlords.co
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Getting APM from ORtg and DRtg? 

Post#22 » by Chicago76 » Fri Sep 17, 2010 5:49 pm

Ripp wrote:Yep, you are right...there will be a large gap between the values of the PG and his backup (the SF).

But this doesn't necessarily mean that the PG will be overrated. If the PG is playing 24+ minutes a game, that means 50%+ of your dataset tells you that the PG is roughly average. The remaining minutes played by his bad backups don't boost his value...they'll just hurt the scores of whoever his backup is. So this sucks for the SF, but I'm not sure it helps the PG. So yes, players being played out of position are probably likely to be hurt by APM.

If you are still unsatisfied with this answer, there are several ways to deal with this issue. You could incorporate more years' worth of data, when presumably the SF was playing at his natural position. You could also introduce some sort of statistical regularization...there are a number of tricks you can use.


In a given year, I'm not sure that the credit would be assigned exactly in the manner you describe above. Let's assume again the PG plays 24 minutes a night and the team's ORtg is 120 when he's on the court and 80 when he's not. Let's also assume that there are only 6 other teammates on the roster. They all play at random with him and without him and they all evenly share the backup PG responsibility. The defense on the team is average no matter who is on the floor (avg = 100 for simplicity's sake). Net, the team's +/- is exactly zero through the first half of the year.

Whenever the true PG is on the floor, the team will outperform their opponent +20 pts / 100 possessions. Whenever a utility man is playing PG, the team will underperform their opponent -20 pts / 100. The other players' non-PG minutes are randomly distributed, so they wash out of the equation as the PG has been identified as the sole determinant of the +/-. The other players' non-PG +/- is therefore zero. All 6 players share the PG penalty of -20 evenly, however, so their position-weighted APM is -3.33, while the PG's is +20. The true PG is essentially getting all of the credit for a roster shortcoming, while all of the other players are carrying some of the penalty of playing PG.

Now let's assume at mid-season there is a Kevin Ollie/Sam Cassell/Anthony Johnson old PG who is willing to come out of retirement or is available off waivers and the team is now +20 pts / 100 possessions with the new backup playing 24 minutes a night as well. Now it doesn't matter who plays PG, and the 2-5 spots on the court are randomly distributed, so it doesn't appear that any of these guys is more important than the others either. Suddenly, everyone's +/- is +4. That's a huge difference, and it still is subject to roster limitations. What if the two PGs are really being carried by their teammates? After all, an old out-of-retirement PG somehow carrying a positive APM suggests he's much more than a temporary solution, based strictly on the results. It doesn't make sense. It's not ridiculous to now assume that either PG could have an APM of -10 if he had average teammates. This would mean that the other 4 guys on the court at a given time are somehow responsible for delivering a 30 pt per 100 poss advantage to get back to the actual team advantage of 20. This tells us it is entirely possible that even the players' non-PG APM of 0 in the first half of the year is carrying a penalty relative to their "true" ability.

I realize you don't think APM is the end-all of analysis, but I just thought I'd point out a deficiency using a ridiculously extreme example. There is no statistical way to tease any more information out of the example above. In reality, player time is not randomly distributed, so the deficiency described above would not be of this magnitude. Still, APM is dependent upon the team construct to some degree. It tells you who is important given the roster limitations of a given team for that year. It's really good at telling a GM, "Hey, idiot, you need to add another PG to the roster" in the example above. The quality of what it tells you slides a bit when comparing APMs of teammates to determine who is really better, and it slides further when comparing the APMs of two players on different teams. Introducing data from prior years may be useful when examining high degrees of player APM variance year over year, but then an additional level of complexity would need to be introduced via aging curves. Sorry for the rambling message.
Ripp
General Manager
Posts: 9,269
And1: 324
Joined: Dec 27, 2009

Re: Getting APM from ORtg and DRtg? 

Post#23 » by Ripp » Fri Sep 17, 2010 7:47 pm

Chicago76 wrote:
Ripp wrote:Yep, you are right...there will be a large gap between the values of the PG and his backup (the SF).

But this doesn't necessarily mean that the PG will be overrated. If the PG is playing 24+ minutes a game, that means 50%+ of your dataset tells you that the PG is roughly average. The remaining minutes played by his bad backups don't boost his value...they'll just hurt the scores of whoever his backup is. So this sucks for the SF, but I'm not sure it helps the PG. So yes, players being played out of position are probably likely to be hurt by APM.

If you are still unsatisfied with this answer, there are several ways to deal with this issue. You could incorporate more years' worth of data, when presumably the SF was playing at his natural position. You could also introduce some sort of statistical regularization...there are a number of tricks you can use.


In a given year, I'm not sure that the credit would be assigned exactly in the manner you describe above. Let's assume again the PG plays 24 minutes a night and the team's ORtg is 120 when he's on the court and 80 when he's not. Let's also assume that there are only 6 other teammates on the roster. They all play at random with him and without him and they all evenly share the backup PG responsibility. The defense on the team is average no matter who is on the floor (avg = 100 for simplicity's sake). Net, the team's +/- is exactly zero through the first half of the year.

Whenever the true PG is on the floor, the team will outperform their opponent +20 pts / 100 possessions. Whenever a utility man is playing PG, the team will underperform their opponent -20 pts / 100. The other players' non-PG minutes are randomly distributed, so they wash out of the equation as the PG has been identified as the sole determinant of the +/-. The other players' non-PG +/- is therefore zero. All 6 players share the PG penalty of -20 evenly, however, so their position-weighted APM is -3.33, while the PG's is +20. The true PG is essentially getting all of the credit for a roster shortcoming, while all of the other players are carrying some of the penalty of playing PG.

Interesting. So we've essentially boiled down the problem into the following two lineups:
+20 = TruePG + 4 guys
-20 = one of 5 guys + other 4 guys
By symmetry, the 5 non PGs all will have the same score. So this simplifies to:
+20 = TPG + 4 * C
-20 = 5*C
So C=-4, TPG=36, suggesting (probably falsely) that the TruePG is otherworldly.
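
For anyone who wants to check the arithmetic, here is a minimal sketch of that two-lineup system in Python. The function name and its parameters are made up for illustration, and it leans on the same symmetry assumption as above (every non-PG gets the same value C).

```python
# Two-lineup system from the post above, assuming all non-PGs share one value C:
#   +20 = TPG + 4*C   (true PG plus 4 interchangeable teammates)
#   -20 = 5*C         (5 interchangeable teammates, no true PG)
def solve_two_lineups(with_pg_net=20.0, without_pg_net=-20.0, others_on_court=4):
    # Lineup without the true PG is made of (others_on_court + 1) identical players.
    c = without_pg_net / (others_on_court + 1)
    # Lineup with the true PG: subtract the teammates' share to isolate TPG.
    tpg = with_pg_net - others_on_court * c
    return tpg, c


tpg, c = solve_two_lineups()
print(f"C = {c}, TPG = {tpg}")
```

With only two distinct lineups there are only two equations, so the split between TPG and C is driven entirely by the symmetry assumption rather than by the data.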


Chicago76 wrote:Now let's assume at mid-season there is a Kevin Ollie/Sam Cassell/Anthony Johnson old PG who is willing to come out of retirement or is available off waivers and the team is now +20 pts / 100 possessions with the new backup playing 24 minutes a night as well. Now it doesn't matter who plays PG, and the 2-5 spots on the court are randomly distributed, so it doesn't appear that any of these guys is more important than the others either. Suddenly, everyone's +/- is +4.

That's a huge difference, and it still is subject to roster limitations. What if the two PGs are really being carried by their teammates? After all, an old out-of-retirement PG somehow carrying a positive APM suggests he's much more than a temporary solution, based strictly on the results. It doesn't make sense. It's not ridiculous to now assume that either PG could have an APM of -10 if he had average teammates. This would mean that the other 4 guys on the court at a given time are somehow responsible for delivering a 30 pt per 100 poss advantage to get back to the actual team advantage of 20. This tells us it is entirely possible that even the players' non-PG APM of 0 in the first half of the year is carrying a penalty relative to their "true" ability.

Yep, this is a pretty convincing example.

Chicago76 wrote:I realize you don't think APM is the end-all of analysis, but I just thought I'd point out a deficiency using a ridiculously extreme example. There is no statistical way to tease any more information out of the example above. In reality, player time is not randomly distributed, so the deficiency described above would not be of this magnitude. Still, APM is dependent upon the team construct to some degree. It tells you who is important given the roster limitations of a given team for that year. It's really good at telling a GM, "Hey, idiot, you need to add another PG to the roster" in the example above. The quality of what it tells you slides a bit when comparing APMs of teammates to determine who is really better, and it slides further when comparing the APMs of two players on different teams. Introducing data from prior years may be useful when examining high degrees of player APM variance year over year, but then an additional level of complexity would need to be introduced via aging curves. Sorry for the rambling message.


No, this was a great post, I'm glad you jumped in. I think in situations like the one you describe above, there simply isn't enough information available. So you either need to incorporate prior information (e.g., some sort of statistical regularization technique) or find another data source that can be used to adjust player values.
Also, why are aging curves so important in this case? Wouldn't a simple scheme that downweights past years (exponentially, let's say) be sufficient?
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Getting APM from ORtg and DRtg? 

Post#24 » by Chicago76 » Fri Sep 17, 2010 8:54 pm

Ripp wrote:No, this was a great post, I'm glad you jumped in. I think in situations like the one you describe above, there simply isn't enough information available. So you either need to incorporate prior information (e.g., some sort of statistical regularization technique) or find another data source that can be used to adjust player values.
Also, why are aging curves so important in this case? Wouldn't a simple scheme that downweights past years (exponentially, let's say) be sufficient?


Thanks. This example obviously tries to isolate and dramatize the shortcoming, but even in cases where player rotations are more normal and there isn't an entirely obvious roster shortcoming, APM is subject to smaller (but still strange) swings like this.

Re: aging curves vs. downweighting prior years...it's really down to how much of the nitty gritty someone is willing to subject themselves to.

Downweighting has its merits when players are in their "normal" years. There are cases, though, where that may distort things further, such as when a franchise is in major transition. A decent example might be the mid to late 80s Celtics. Even if Bird never got hurt, he, DJ, and Parish were all still due for a major decline with McHale maybe holding steady for another couple of years and Reggie Lewis on a major upswing. The core really wouldn't change much, but the old guys would benefit at the expense of someone like a Reggie Lewis if prior years were given some sort of credit without aging adjustments.

I don't know if you're familiar with PECOTA used in baseball analysis. It is essentially a much more detailed version of the projections Hollinger puts out. The gist of it is that the analyst creates similarity scores for players based upon historical players of the same age compiling similar rate stats. Multiple years may also be used to track trends for better similarity. MPG, ORB rate, 3PA/2PA, height, weight (if reliable), TS%, usage, FTA/FGA, eFG%, and assist/FGA are some of the major variables I've used. Each of the variables listed above correlates to some extent to how quickly a player's skills decline. Examples:

-relatively inefficient short guys who shoot the ball a lot while amassing assists decline more quickly (Marbury, Francis, Iverson, Thomas) while their pass first counterparts don't (Stockton and Jackson)
-guys who shoot a lot of threes efficiently decline gradually (Allen and Miller)
-less efficient scoring wings who tend to rely upon getting to the line tend to age poorly (McGrady)
-medium sized guys who are awesome offensive rebounders but offer little else drop off a cliff at age X when their legs fail them.

One of the interesting things that is coming to light now that the straight from high school crowd is aging is the age vs. minutes debate. It looks like age matters more than minutes for young guys. So a straight from HS/1 and done with three years in the league at 23 is still due for the same jump in performance as a second year in the league/4 years in college guy (the Jermaine O'Neal effect). On the flip side, the early entrants tend to age based upon minutes down the road rather than age (O'Neal, McGrady, etc.)

If there is an adjusted APM anomaly, giving a small shoot first PG the benefit of the doubt at age 30 by simply giving credit for last year's APM might not be the way to go. But it might work perfectly well for a Mark Jackson type.
Ripp
General Manager
Posts: 9,269
And1: 324
Joined: Dec 27, 2009

Re: Getting APM from ORtg and DRtg? 

Post#25 » by Ripp » Fri Sep 17, 2010 9:57 pm

Chicago76 wrote:Re: aging curves vs. downweighting prior years...it's really down to how much of the nitty gritty someone is willing to subject themselves to.

Downweighting has its merits when players are in their "normal" years. There are cases, though, where that may distort things further, such as when a franchise is in major transition. A decent example might be the mid to late 80s Celtics. Even if Bird never got hurt, he, DJ, and Parish were all still due for a major decline with McHale maybe holding steady for another couple of years and Reggie Lewis on a major upswing. The core really wouldn't change much, but the old guys would benefit at the expense of someone like a Reggie Lewis if prior years were given some sort of credit without aging adjustments.

Hrm, I'm not sure. Suppose I weight the current year by say 1/2, last year by 1/4, two years ago by 1/8, etc. If I apply a scheme like that to current KG, and make the exponential decay sufficiently rapid, I'll regularize his current decline against his past brilliance. I'm not sure why "normal" year is so important...the main idea is to balance current behavior against career averages. Regressing Player X to past versions of himself, if you will.
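
A rough sketch of that 1/2, 1/4, 1/8 weighting in Python, in case it helps make the idea concrete. The function name and the sample ratings are invented for illustration (they are not real APM values), and the weights are renormalized because a finite number of seasons does not sum to 1.

```python
def decay_weighted_rating(ratings_newest_first, decay=0.5):
    """Blend a player's seasons with weights decay, decay**2, decay**3, ...

    ratings_newest_first[0] is the current season. Weights are renormalized
    so that a short career still produces a proper weighted average.
    """
    weights = [decay ** (i + 1) for i in range(len(ratings_newest_first))]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, ratings_newest_first)) / total


# Hypothetical declining star: current season first, best season last.
ratings = [2.0, 6.0, 8.0]
print(decay_weighted_rating(ratings, decay=0.5))   # 1/2, 1/4, 1/8 scheme
print(decay_weighted_rating(ratings, decay=0.25))  # faster decay, leans on the current year
```

The `decay` knob is the whole design choice here: a smaller value trusts the current season more, a larger one pulls harder toward the player's past.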


Chicago76 wrote:I don't know if you're familiar with PECOTA used in baseball analysis. It is essentially a much more detailed version of the projections Hollinger puts out. The gist of it is that the analyst creates similarity scores for players based upon historical players of the same age compiling similar rate stats. Multiple years may also be used to track trends for better similarity. MPG, ORB rate, 3PA/2PA, height, weight (if reliable), TS%, usage, FTA/FGA, eFG%, and assist/FGA are some of the major variables I've used. Each of the variables listed above correlates to some extent to how quickly a player's skills decline. Examples:

-relatively inefficient short guys who shoot the ball a lot while amassing assists decline more quickly (Marbury, Francis, Iverson, Thomas) while their pass first counterparts don't (Stockton and Jackson)
-guys who shoot a lot of threes efficiently decline gradually (Allen and Miller)
-less efficient scoring wings who tend to rely upon getting to the line tend to age poorly (McGrady)
-medium sized guys who are awesome offensive rebounders but offer little else drop off a cliff at age X when their legs fail them.

I'm not really familiar with it. Is it some sort of clustering or PCA-type procedure? I'll google around a bit.

Chicago76 wrote:One of the interesting things that is coming to light now that the straight from high school crowd is aging is the age vs. minutes debate. It looks like age matters more than minutes for young guys. So a straight from HS/1 and done with three years in the league at 23 is still due for the same jump in performance as a second year in the league/4 years in college guy (the Jermaine O'Neal effect). On the flip side, the early entrants tend to age based upon minutes down the road rather than age (O'Neal, McGrady, etc.)

I think I'm misreading your post, but I don't see the contrast here. Age matters for measuring jumps, yet minutes matter for measuring decline..? I'm a bit confused.

Chicago76 wrote:If there is an adjusted APM anomaly, giving a small shoot first PG the benefit of the doubt at age 30 by simply giving credit for last year's APM might not be the way to go. But it might work perfectly well for a Mark Jackson type.

Well, I wouldn't be giving him full credit, it would be partial credit...I'm just placing some weight on performance in past years.
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Getting APM from ORtg and DRtg? 

Post#26 » by Chicago76 » Sun Sep 19, 2010 7:25 am

On the aging curves/similarity score:

Clustering works, but I just use individual curves for each player based upon similarity score. My own method follows.

1-I have 15 years' worth of data for all seasons of players of a certain age (say 24).
2-For the variables I'm using to compute a score, I have calculated percentiles in decimal form to rank players of that age in past seasons, for example ORB rate.
3-I compress the percentiles to a range of .1 to .9 by adjusting as .8 x percentile + 0.1. I do this to keep a single variable from causing a 0 similarity score as shown later.
4-The similarity of that variable is calculated as follows for all other players in the age group: 1 - sqrt((subject player adj. percentile - player X adj. percentile)^2), i.e. 1 minus the absolute value of the difference of their adj. percentiles. More similar players will have a lot of values in the .85 to 1.0 range here.
5-After doing this for all variables, compute the geometric mean of the per-variable similarities. A player will have a 1.00 similarity score to himself.

Take all players with similarity scores above a selected level and examine their 25 year old performance as a % of their 24 year old performance and weight them to reflect their similarity to the player you are projecting. This should give you a pretty good idea of the season the subject player should have next year as a % of his most recent year/years as you can see the curve of similar players who have already been down this road.
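
The steps above can be sketched roughly as follows in Python. This is only an illustration, not my actual spreadsheet: the function names, the 0.85 cutoff, and the choice to weight each comparable in direct proportion to his similarity score are all assumptions, and the inputs are expected to be raw percentiles between 0 and 1 keyed by variable name.

```python
import math


def compress(percentile):
    # Step 3: squeeze a raw percentile in [0, 1] into [0.1, 0.9].
    return 0.8 * percentile + 0.1


def similarity(subject, other):
    # Step 4: per-variable similarity is 1 minus the absolute difference
    # of the compressed percentiles (so it can never reach 0).
    sims = [1.0 - abs(compress(subject[v]) - compress(other[v])) for v in subject]
    # Step 5: geometric mean across variables.
    return math.prod(sims) ** (1.0 / len(sims))


def projected_ratio(subject, comparables, threshold=0.85):
    """Similarity-weighted average of next-year/this-year performance ratios.

    comparables: list of (percentiles_dict, next_year_ratio) for past players
    of the same age. Returns None if nobody clears the (assumed) threshold.
    """
    num = den = 0.0
    for percentiles, ratio in comparables:
        s = similarity(subject, percentiles)
        if s >= threshold:
            num += s * ratio
            den += s
    return num / den if den else None
```

Because of the compression in step 3, the worst possible match on a single variable still scores 0.2 rather than 0, so one odd variable cannot zero out the geometric mean.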

You can also use this retrospectively to determine if this year is an outlier. Where this ties into aging and APM using multiple years is in figuring out how representative prior years should be. Take your Kevin Garnett example, using PER instead. His last three years, from ages 31 - 33, were 25.3, 21.2, 19.4. If you did a simple weighted average to give KG credit (3-2-1, weighting last year the most and two years ago the least), the weighted average PER would be 21.0, which he didn't come close to achieving with last year's result of 19.4.

If you knew that similar players experience a 10% decline from 31 to 32 and a 12% decline from 32 to 33, you could adjust prior years before taking the weighted average:

age 31 - PER of 25.3 x .9 x .88 = expected age 33 PER of 20.0
age 32 - PER of 21.2 x .88 = expected age 33 PER of 18.7
The weighted average of these two years w/ KG's actual age 33 PER of 19.4 (same 3-2-1 weighting) = 19.3. In other words, last year wasn't a fluke. You might also find that similar players at age 33 experience a further 15% decline at age 34, which would put KG at a 16.5 PER next season--a player clearly on his way down to simply "good starter" status.
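
Here is the same KG arithmetic as a small Python sketch, using the PER values and the hypothetical decline rates (10%, 12%, 15%) from the example above. The helper names are made up, and the 15% decline is applied to his actual age 33 PER of 19.4, which is an assumption about how the 16.5 figure was reached.

```python
def weighted_average(values_newest_first, weights=(3, 2, 1)):
    # 3-2-1 weighting: most recent season counts the most.
    return sum(w * v for w, v in zip(weights, values_newest_first)) / sum(weights)


per_by_age = {31: 25.3, 32: 21.2, 33: 19.4}
# Assumed decline of similar players going from each age to the next one.
decline_after_age = {31: 0.10, 32: 0.12, 33: 0.15}


def age_adjust(per, from_age, to_age):
    # Walk the rating forward one year at a time, applying each year's decline.
    for age in range(from_age, to_age):
        per *= 1.0 - decline_after_age[age]
    return per


naive = weighted_average([per_by_age[a] for a in (33, 32, 31)])
adjusted = weighted_average([age_adjust(per_by_age[a], a, 33) for a in (33, 32, 31)])
next_year = age_adjust(per_by_age[33], 33, 34)

print(round(naive, 1), round(adjusted, 1), round(next_year, 1))
```

The gap between the naive and age-adjusted averages is the whole point: once prior years are walked forward along the aging curve, last year stops looking like an outlier.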

The age/minutes thing:

The jury is still out on this one, but guys over at APBR have observed that a guy coming into the league at a young age may have the same jump from 23 to 24 as a 4-year college guy going into his second season in the league. At first glance, I would think that the guy just coming out of his rookie year would have more room to grow, but this doesn't appear to be the case.

On the other side of their careers, the straight from high school and one and doners don't appear to age like other 29 year olds, for example. They have extra minutes on their knees and seem to fade earlier. McGrady, KG, Jermaine O'Neal, and even Bryant seem to show signs of premature aging relative to other players in their late 20s/early 30s who stayed in school longer and played shorter college seasons, thus saving their bodies from additional wear and tear from 19-21.
Ripp
General Manager
Posts: 9,269
And1: 324
Joined: Dec 27, 2009

Re: Getting APM from ORtg and DRtg? 

Post#27 » by Ripp » Sun Sep 19, 2010 8:44 pm

^-- Your approach for (essentially) clustering players seems interesting. Have you seen how it compares to other approaches?

Didn't know about the additional jump, pretty interesting.
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Getting APM from ORtg and DRtg? 

Post#28 » by Chicago76 » Mon Sep 20, 2010 4:46 pm

Ripp wrote:^-- Your approach for (essentially) clustering players seems interesting. Have you seen how it compares to other approaches?


Clustering such as this link can provide a good two dimensional representation of multiple dimensions (one for each variable used): http://arbitrarian.files.wordpress.com/2008/03/nbaprimenet.pdf

The problem with visual clusters is that the more dimensions you add, the more the clustering is subject to distortion. A good example of this in converting only 3 dimensions to 2 is the old Mercator projection of the earth. The first thing that jumps out when looking at one of these is "Greenland is huge". Things get distorted and you might not be able to tell that the closest neighbor appears elsewhere on the cluster diagram due to how things bend. You might not realize Greenland is closer to Russia or Scandinavia for example. Or depending upon how the two dimensional representation is framed, Alaska might look like it's 20,000 miles from Russia even though Sarah Palin can see Russia from her house. The same thing can happen in cluster diagrams of players.

The other type of system is much better at finding the closest neighbors. It's kind of like a zoomed in map, where your subject player (or city) is in the middle and you can easily see distances of closer neighbors. Like NYC's distance from Boston, Buffalo, DC, Philadelphia, Baltimore, and Toronto without the distortion.
Ripp
General Manager
Posts: 9,269
And1: 324
Joined: Dec 27, 2009

Re: Getting APM from ORtg and DRtg? 

Post#29 » by Ripp » Tue Sep 21, 2010 4:39 am

I was thinking more like K-means clustering, for example...the picture above seems to be interested more in constructing a graph connecting players. These two tasks are similar, but not quite the same.

Anyway, thanks for the link to that blog.