RealGM

Posted: **Wed Apr 9, 2014 12:13 am**

This stat claims to separate a player's +/- from every possible lineup, so someone like Perkins can't boost his RAPM just by playing next to Durant. I'm intrigued by how that is even possible, so I looked for an explanation of how RAPM data is gathered, but I couldn't find one.

Can someone explain exactly how RAPM works, or link to a full explanation of how it separates a player's RAPM from rotations and so on? I'm really intrigued by this stat, and skeptical of it.

Posted: **Wed Apr 9, 2014 4:16 am**

bbms wrote:This stat claims to separate a player's +/- from every possible lineup, so someone like Perkins can't boost his RAPM just by playing next to Durant. I'm intrigued by how that is even possible, so I looked for an explanation of how RAPM data is gathered, but I couldn't find one.

Can someone explain exactly how RAPM works, or link to a full explanation of how it separates a player's RAPM from rotations and so on? I'm really intrigued by this stat, and skeptical of it.

I can't really explain it well myself, but I understand it somewhat. There's an explanation here that is better than anything I can do:
http://www.gotbuckets.com/what-is-apm/

RAPM is toward the bottom.

Posted: **Wed Apr 9, 2014 7:53 am**

bondom34 wrote:I can't really explain it well myself, but I understand it somewhat. There's an explanation here that is better than anything I can do:
http://www.gotbuckets.com/what-is-apm/

RAPM is toward the bottom.

The APM explanation looks good, but the site doesn't say much about RAPM or about the ridge regression method it applies.

The quick and dirty version: RAPM is like APM, but with a penalty applied in order to avoid overfitting and to handle multicollinearity better. For ill-posed problems there is a mathematical result (not just an empirical observation) that ridge regression with a suitable penalty gives a lower mean squared error than ordinary least squares. The basketball data presents such a problem. The issue is not a shortage of data: each game snippet with 5 vs. 5 in which a possession changes can be seen as one equation, which makes roughly 230,000 equations for only about 480 players (variables) in the league. The issue is that teammates share the floor so often that their columns in those equations are strongly correlated, so plain least squares cannot separate them reliably.
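A rough way to see why the penalty helps with multicollinearity is to look at how well conditioned the matrix is that the regression has to invert. The sketch below uses a made-up toy matrix (three hypothetical players, two of whom almost always play together) and an arbitrary penalty of 1.0; neither comes from real play-by-play data.

```python
import numpy as np

# Hypothetical toy data: 3 "players"; players 0 and 1 share the floor in
# every stint except one, so their columns are nearly collinear.
X = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],  # the only stint where player 0 plays without player 1
    [1.0, 1.0, 0.0],
])
lam = 1.0  # assumed penalty, chosen only for illustration

gram = X.T @ X
# Condition number of the matrix OLS has to invert ...
cond_ols = np.linalg.cond(gram)
# ... and of the matrix ridge inverts after adding lam to the diagonal.
cond_ridge = np.linalg.cond(gram + lam * np.eye(X.shape[1]))

print(cond_ols, cond_ridge)
```

The penalized matrix always has the smaller condition number, which is one way of saying the ridge solution reacts less violently to small changes in the data.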

Some other stuff I wrote about it:
viewtopic.php?p=36931175#p36931175

or here: viewtopic.php?p=33272794#p33272794 and the posts that follow (it gets messy, because I get frustrated easily when people are not accurate, so I hope you don't mind, bbms)
Also, regarding the RPM there is this thread: viewtopic.php?f=64&t=1314111
and this one: viewtopic.php?f=6&t=1314113

If you have specific questions, feel free to ask; I will try to answer them to the best of my abilities. (Just as a disclaimer, I'm not a native English speaker.)

Posted: **Thu Apr 10, 2014 5:28 pm**

A question I have is how much is the RAPM affected by the differences in opposition faced?

For example, how different is the opposition Wade faces versus LeBron?

How about LeBron versus Durant?

I understand a member of Miami will face easier opponents than someone from Philadelphia - but after adjusting for the team someone is on, how much variation is there in opponents faced?

Posted: **Thu Apr 10, 2014 6:50 pm**

DQuinn1575 wrote:A question I have is how much is the RAPM affected by the differences in opposition faced?

For example, how different is the opposition Wade faces versus LeBron?

How about LeBron versus Durant?

I understand a member of Miami will face easier opponents than someone from Philadelphia - but after adjusting for the team someone is on, how much variation is there in opponents faced?

It adjusts for both the team the player is on and opponent, so it isn't affected.

Posted: **Thu Apr 10, 2014 8:07 pm**

I understand that.

I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

Posted: **Thu Apr 10, 2014 8:10 pm**

DQuinn1575 wrote:I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

There is only one equation for each possession, in which all players on the court (teammates and opponents) are listed as independent variables. Therefore, the regression does an "adjustment for opponents" by itself; it is not something added later. Numbers without the "adjustment for opponents" do not exist in that fashion, although theoretically you could run the regression with only the teammates as independent variables. But what would you want to accomplish with that?

Posted: **Thu Apr 10, 2014 8:26 pm**

Also, just another article from Hickory High today on RPM:
http://www.hickory-high.com/is-espns-re ... -for-real/

Posted: **Thu Apr 10, 2014 8:54 pm**

bondom34 wrote:Also, just another article from Hickory High today on RPM:
http://www.hickory-high.com/is-espns-re ... -for-real/

While I understand the sentiment of the author, he unfortunately has less insight than he believes he has. That is not necessarily his fault, though; it probably comes down to ESPN's information policy. Anyway, to correct the biggest mistakes the author makes here:

"The box-score based prior which RPM uses is based on a regression of box-score stats against multiple-year* pure regularized adjusted plus minus (RAPM), which is a pure version of the above mentioned crazy big math equation with ridge regression applied."

That is wrong. The boxscore prior is based on a regression of boxscore values from the previous season (year-1) on the lineup results of the following season (year). That is done for multiple seasons. What the author is describing is the ASPM metric developed by DSmok_1, which is not the same as the thing Engelmann uses.

"The guys at Talking Practice have created Individual Player Value (IPV), which is an all in one metric that is very, very similar to RPM in methodology and results, but they report numbers that are not informed by prior years."

It is not "very, very similar" at all. The IPV-guys are using a boxscore-based prior for offense alone and for defense they use a prior based on the previous season data. They also use a machine learning algorithm rather than pure ridge regression like Engelmann. So, those values may give a prediction in the same ballpark, but they are different methods and should not be used to compare the effect of a previous season prior on the data.

" Under IPV, LeBron is third behind Kevin Durant and Stephen Curry. It’s clear that giving LeBron credit for last year gives him an upper hand under RPM"

There is no way to determine such a thing from those numbers. The main reason: James has the highest boxscore-based prior of any player in Engelmann's approach. That alone has a HUGE influence on the result, and it is not unlikely that James would be seen as the best even without the use of the xRAPM from the previous season. In fact, in my metric, using only this season's data, James ends up ahead of Chris Paul and Kevin Durant as the #1.

"RPM contains a height-based prior which boosts the defensive ratings of all taller players."

He probably took that from Pelton's statement in the podcast, but the height bias is not just in there for the defensive end, but is in fact also in there on offense. Engelmann used height and experience as additional independent variables in his boxscore-based regression. That also tells me overall that the author does not know what he is really talking about.

Also, the prior in itself does not "make up a large portion of the metric"; it is simply the point where the regression starts. The major influence is still the current season's play-by-play data. The defensive value is shifted by the boxscore-based metric, though, because in a boxscore a defensive rebound marks the end of a defensive possession and is therefore the only real evidence of the team's defensive success in the boxscore, unless a blocked shot happened in that very possession as well. That's what I criticized at great length at APBR when Engelmann developed his xRAPM, but it was ignored. Nonetheless, if someone wants to criticize something, it would be good if he were informed enough to make an accurate criticism.

Posted: **Fri Apr 11, 2014 4:01 am**

DQuinn1575 wrote:I understand that.

I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

There is no way to determine how much the adjustment for opponent quality affects the result, because the only way to allocate lineup +/- to players is to compare the quality of lineups that player was in to the quality of lineups he played against. Without comparing lineup quality and making those adjustments, we have no idea what average (0) is, so there would be no pluses or minuses. There would be no list of lineup equations to solve for every player, so no output would exist.

The big adjustment you are curious about (how much MIA's starters get docked for lesser competition, for example) isn't particularly significant. The SRS of Miami's opponents this year is -0.57, i.e. 0.57 points below average. If every player contributed equally, the adjustment they get in RAPM is only about 0.11 per player (0.57 / 5). For better players that might be a bit more, but it's not even half a point.

If the system could somehow be built ignoring lineup quality, the big difference would actually show up between good benches and bad starters. Example:

Borderline scrub bench player plays 90% of his time vs. opponent players 8-12. His lineups almost always outscore the other bench by 5 pts. Not adjusted, he might be a +1 player. Adjusted, he's actually a -1 player, because the deep bench lineups his team is beating when he's on the court are probably -10 in quality (or -2 per player).

#3 starter plays 90% of his time against other starters or at least 3 starters and players 6 and 7 on the other team. His lineups tend to lose to the opposition by 3 pts per 100 possessions. Un-adjusted he might be a -0.5 player. Adjusted, he might be +0.5.

Unadjusted, a borderline scrub on a good bench might look 1.5 pts better than an average starter on a subpar team when the reality is the bench scrub might be 1.5 pts worse.
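The arithmetic in this example can be written out in a few lines. This is only a back-of-the-envelope sketch: it assumes a lineup's value is split evenly across its five players, and the starter's -0.5 and +0.5 are taken straight from the post rather than derived.

```python
PLAYERS_PER_LINEUP = 5


def per_player(lineup_value):
    """Split a lineup-level value evenly across its five players (an assumption)."""
    return lineup_value / PLAYERS_PER_LINEUP


# Schedule effect: Miami's opponents are 0.57 points below average as lineups.
schedule_adjustment = per_player(0.57)

# Bench player: his lineups are +5, but the lineups they beat are about -10.
bench_unadjusted = per_player(5.0)
bench_adjusted = bench_unadjusted + per_player(-10.0)

# Starter: figures quoted in the post, not derived here.
starter_unadjusted, starter_adjusted = -0.5, 0.5

gap_unadjusted = bench_unadjusted - starter_unadjusted
gap_adjusted = bench_adjusted - starter_adjusted

print(schedule_adjustment, bench_unadjusted, bench_adjusted)
print(gap_unadjusted, gap_adjusted)
```

The sign of the gap flips once opponent quality is included, which is the point of the example: the substitution pattern matters far more than the team's schedule.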

Posted: **Fri Apr 11, 2014 4:08 am**

mysticbb wrote:
DQuinn1575 wrote:I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

There is only one equation for each possession, in which all players on the court (teammates and opponents) are listed as independent variables. Therefore, the regression does an "adjustment for opponents" by itself; it is not something added later. Numbers without the "adjustment for opponents" do not exist in that fashion, although theoretically you could run the regression with only the teammates as independent variables. But what would you want to accomplish with that?

I get there is only one equation, although I won't pretend to understand all the math that is in it.
A few things about opponent adjustment, although obviously it will give you a more accurate result:

1. Baseball basically ignores it and assumes things level out - they don't adjust for pitcher faced, defensive lineup faced, and only adjust park at a higher level.

2. It is easier to reverse test the results if opponents are held constant, as I can plug the numbers into a model and see how accurate it is.

3. I have to believe once you adjust for team schedules the impact of opponents can't be that great. Within the same team, 2 regulars will see virtually the same average of opponents.
Even comparing Durant to LeBron, there can't be that big a difference in average opponents - I'm assuming once you adjust for schedule the impact can only be .2 to .3

Remember, I said "obviously it will give you a more accurate result"

Posted: **Fri Apr 11, 2014 4:16 am**

DQuinn1575 wrote:
mysticbb wrote:
DQuinn1575 wrote:I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

There is only one equation for each possession, in which all players on the court (teammates and opponents) are listed as independent variables. Therefore, the regression does an "adjustment for opponents" by itself; it is not something added later. Numbers without the "adjustment for opponents" do not exist in that fashion, although theoretically you could run the regression with only the teammates as independent variables. But what would you want to accomplish with that?

I get there is only one equation, although I won't pretend to understand all the math that is in it.
A few things about opponent adjustment, although obviously it will give you a more accurate result:

1. Baseball basically ignores it and assumes things level out - they don't adjust for pitcher faced, defensive lineup faced, and only adjust park at a higher level.

2. It is easier to reverse test the results if opponents are held constant, as I can plug the numbers into a model and see how accurate it is.

3. I have to believe once you adjust for team schedules the impact of opponents can't be that great. Within the same team, 2 regulars will see virtually the same average of opponents.
Even comparing Durant to LeBron, there can't be that big a difference in average opponents - I'm assuming once you adjust for schedule the impact can only be .2 to .3

Remember, I said "obviously it will give you a more accurate result"

I'm not sure why, but this quoted me.....

To clarify maybe/hopefully, view this equation for the 2 teams for a given time on court:

A+B+C+D+E = F+G+H+I+J

for 2 teams of 5 players over a given time frame, where the lineups are represented by those groups of 5 players. This would be a stretch in which both teams score the same number of points while those lineups are on court. This holds ONLY for the time frame where those exact players are on court; when subs come in, the equation changes to reflect the new player. This is massively oversimplified, obviously, but it shows how the adjustment is made. This is run for each lineup, in each time frame during each game. There is no separate step that adjusts for opponents; in the end all players are rated as if they had faced equal opponents.

Posted: **Fri Apr 11, 2014 4:51 am**

DQuinn1575 wrote:I get there is only one equation, although I won't pretend to understand all the math that is in it.
A few things about opponent adjustment, although obviously it will give you a more accurate result:

1. Baseball basically ignores it and assumes things level out - they don't adjust for pitcher faced, defensive lineup faced, and only adjust park at a higher level.

2. It is easier to reverse test the results if opponents are held constant, as I can plug the numbers into a model and see how accurate it is.

3. I have to believe once you adjust for team schedules the impact of opponents can't be that great. Within the same team, 2 regulars will see virtually the same average of opponents.
Even comparing Durant to LeBron, there can't be that big a difference in average opponents - I'm assuming once you adjust for schedule the impact can only be .2 to .3

Remember, I said "obviously it will give you a more accurate result"

It really isn't one equation though. A team will play hundreds of lineups a year vs. hundreds of other lineups. If your team outscores another by 5 pts in 20 possessions, one equation would be T1+T2+T3+T4+T5 = O1+O2+O3+O4+O5 + 25. The 25 is just 5 pts per 20 possessions normalized to 100 possessions. Every single player is given a unique ID and you run an iterative calculation that best fits every conceivable lineup (thousands and thousands) to solve for every player in the league. You can't solve for T1 without having an estimate for every player he is playing with and against.
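To make the single-stint equation concrete, here is a small sketch that turns one stint into its equation. The helper names `margin_per_100` and `stint_equation` and the player IDs are made up for illustration; the only facts taken from the post are the 5-point margin, the 20 possessions, and the normalization to 100 possessions.

```python
def margin_per_100(point_diff, possessions):
    """Scale a point differential to a per-100-possessions rate."""
    return 100.0 * point_diff / possessions


def stint_equation(team_ids, opp_ids, point_diff, possessions):
    """Return (coefficients, target) for one stint.

    Coefficients are +1 for the team's players and -1 for the opponents,
    i.e. the equation T1+...+T5 - O1-...-O5 = target.
    """
    coeffs = {pid: 1 for pid in team_ids}
    coeffs.update({pid: -1 for pid in opp_ids})
    return coeffs, margin_per_100(point_diff, possessions)


coeffs, target = stint_equation(
    ["T1", "T2", "T3", "T4", "T5"],
    ["O1", "O2", "O3", "O4", "O5"],
    point_diff=5,
    possessions=20,
)
print(coeffs)
print(target)
```

Every stint in a season produces one such row, and the solver then looks for the player values that fit all rows at once.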

Two big differences in baseball:

1) There is no mass substitution effect, so you don't get scenarios where bench guys are playing against bench guys and starters vs. starters. Everyone gets a turn at the plate facing the same starting pitcher, middle reliever, closer, etc. There are strategic pitching and batting changes, but generally, pitchers face similar batting quality and batters face similar pitching quality. A utility infielder may get a start and face Verlander roughly as much as the superstar (allowing for maybe one fewer PA due to batting order). In basketball, player #13 on a team might get to play in a game vs. the Heat one night, but when he enters the game, it's a 20 pt blowout and the Big Three are sitting on the bench. He isn't playing the "Heat". He's playing the Heat's scrubs.

2) Baseball is pretty much a batter v. pitcher confrontation. Defense doesn't have that big of a hand to play. Your teammates can't help you when it's your turn to bat, with the exception of already having guys on base, which may dictate pitching strategy in a particular inning with particular scores. Playing with LeBron James will have a much greater effect on a player's ability to contribute than playing on a good or bad offensive team will have on a batter's OPS.

Posted: **Fri Apr 11, 2014 7:23 am**

DQuinn1575 wrote:I get there is only one equation;

It is one equation for each possession. That makes roughly 230k equations per season, where the players are the independent variables and the result is the dependent variable. One equation in ordinary least squares regression (APM) looks like this:

Result = Const+x1P1+x2P2+x3P3+x4P4+x5P5-x6P6-x7P7-x8P8-x9P9-x10P10

P1 to P5 are the home players and x1 to x5 their respective coefficients (APM values); P6 to P10 are the away players and x6 to x10 their APM values. This equation is only fulfilled if both groups of players are included, so an "adjustment" for the opponents happens automatically.
The const is the homecourt advantage; this "adjustment" also happens automatically.

We could solve that iteratively, as Chicago76 suggests, but I prefer a faster way: turning it into a matrix algebra problem. Then we can solve it rather quickly, where the players are just 1 (home) or -1 (away) in a design matrix (and 0 when they are not on the court), and the results of the possessions are listed in a response vector.
That would look like this:

Code:

β = (X^(T)X)^(-1)X^(T)y

with:

β is the coefficient vector (the APM values for each player)
X is the design matrix
X^T is the transposed design matrix
y is the response vector
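As a rough sketch of that setup, the design matrix and response vector can be built and solved with NumPy. The stints below are invented 2-on-2 toy data with four hypothetical player IDs, not real possessions. One caveat on the choice of solver: with +1/-1 coding every row's player entries sum to zero, so X^T X cannot be inverted exactly as written, and `np.linalg.lstsq` is used here because it returns the minimum-norm least-squares solution instead.

```python
import numpy as np

# Hypothetical toy stints: (home player ids, away player ids, home margin).
# Real data would have five players per side and one row per possession.
stints = [
    ((0, 1), (2, 3), 1.0),
    ((0, 2), (1, 3), 3.0),
    ((0, 3), (1, 2), -1.0),
    ((1, 2), (0, 3), 0.0),
    ((0, 1), (2, 3), 2.0),
    ((1, 3), (0, 2), -2.0),
]
n_players = 4

X = np.zeros((len(stints), n_players + 1))
y = np.zeros(len(stints))
for row, (home, away, margin) in enumerate(stints):
    X[row, 0] = 1.0                       # constant column: home-court advantage
    X[row, [p + 1 for p in home]] = 1.0   # home players enter as +1
    X[row, [p + 1 for p in away]] = -1.0  # away players enter as -1
    y[row] = margin

# Minimum-norm least-squares solution of X beta = y.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

home_court, player_values = beta[0], beta[1:]
print(home_court, player_values)
```

Because the minimum-norm solution is used, the unweighted player values in this toy sum to zero on their own; that is a property of this particular solver choice, not something the post claims.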

Now, the method actually used is ridge regression, which means a penalty term λ is added. That introduces a bias, but overall it helps keep the results in check (so to speak). It has been proven that for an ill-posed problem, ridge regression with a suitable λ produces a lower error than OLS. And we have an ill-posed problem here: even with roughly 230k equations for only about 480 variables/players, the player columns are strongly collinear, because the same players keep appearing on the court together.

Now the ridge looks like that:

Code:

β = (X^(T)X + λI)^(-1)X^(T)y

As you can see, there is just a term λI added (that equation is without a prior, which could be added as another vector with selected values for each player or as a term forcing a specific distribution). Anyway, I is the identity matrix (basically just a bunch of zeros with ones on the main diagonal).
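A minimal sketch of the ridge formula in NumPy follows. The data are random and purely illustrative, the helper name `ridge_solve` is made up, and the penalty of 50.0 is an arbitrary choice. The linear system is solved directly instead of forming the inverse, which gives the same β but is numerically safer.

```python
import numpy as np


def ridge_solve(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y without forming an explicit inverse."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Hypothetical random data: 200 "stints", 6 "players" coded as -1, 0, or 1.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 6))
true_beta = np.array([2.0, -1.0, 0.5, 0.0, 1.5, -3.0])
y = X @ true_beta + rng.normal(scale=5.0, size=200)

beta_ols = ridge_solve(X, y, 0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_solve(X, y, 50.0)  # assumed penalty, for illustration only

print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

The ridge coefficients are pulled toward zero compared with the OLS ones; how far depends entirely on the λ chosen, which in practice is usually tuned by out-of-sample testing.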

The introduced bias would look like this:

Code:

bias(β) = -λUβ

with U = (X^(T)X + λI)^(-1)
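The bias expression can be checked numerically. In the sketch below the response is generated without noise from a known β, so the difference between the ridge estimate and β is exactly the bias; the data, the coefficients, and λ = 10 are all made up for the check.

```python
import numpy as np

# Hypothetical noise-free data with known coefficients.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 0.0, 1.0], size=(100, 4))
beta = np.array([3.0, -2.0, 1.0, 0.5])
lam = 10.0  # assumed penalty, for illustration only

# U = (X^T X + lam * I)^(-1), as defined above.
U = np.linalg.inv(X.T @ X + lam * np.eye(4))

# With y = X beta exactly, the ridge estimate equals its expected value.
y_noise_free = X @ beta
beta_ridge = U @ X.T @ y_noise_free

bias_observed = beta_ridge - beta
bias_formula = -lam * U @ beta

print(bias_observed)
print(bias_formula)
```

Both vectors agree, and both shrink toward zero as λ goes to zero, which matches the formula.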

When we have the βs, we have the results. For OLS, the βs should average 0 overall (weighted average); for ridge, the result needs to be shifted so that the weighted average is 0. But there is no further "adjustment" made.
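The shift to a weighted average of 0 is a one-liner. The sketch below assumes the weights are possessions played, which the post does not spell out, and uses made-up ratings; `center_on_weighted_mean` is a hypothetical helper name.

```python
import numpy as np


def center_on_weighted_mean(ratings, weights):
    """Shift ratings so that their weighted average becomes 0."""
    ratings = np.asarray(ratings, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return ratings - np.average(ratings, weights=weights)


# Hypothetical ridge outputs and possessions played (the assumed weights).
ratings = [4.0, 1.0, -0.5, -2.0]
possessions = [6000, 5000, 3000, 1000]

centered = center_on_weighted_mean(ratings, possessions)
print(centered)
```

The differences between players are unchanged by the shift; only the zero point moves.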

DQuinn1575 wrote:1. Baseball basically ignores it and assumes things level out - they don't adjust for pitcher faced, defensive lineup faced, and only adjust park at a higher level.

2. It is easier to reverse test the results if opponents are held constant, as I can plug the numbers into a model and see how accurate it is.

3. I have to believe once you adjust for team schedules the impact of opponents can't be that great. Within the same team, 2 regulars will see virtually the same average of opponents.
Even comparing Durant to LeBron, there can't be that big a difference in average opponents - I'm assuming once you adjust for schedule the impact can only be .2 to .3

1. I have no clue about baseball and have no desire going into it. I simply don't care about that game, because it is probably the most boring thing I ever witnessed.

2. That makes zero sense. The results can be tested out-of-sample with or without such an "adjustment". But given that it does not make much sense to apply the regression to just one part of the players on the court instead of all of them, I still have no idea what you want to accomplish anyway.

3. Again, that makes zero sense. The "adjustments" happen within the regression itself; there are no terms added to the results later or anything like that. It seems you are not really grasping what is done in the first place.

Posted: **Fri Apr 11, 2014 7:34 am**

Chicago76 wrote:There is no way to determine how much the adjustment for opponent quality affects the result, because the only way to allocate lineup +/- to players is to compare the quality of lineups that player was in to the quality of lineups he played against. Without comparing lineup quality and making those adjustments, we have no idea what average (0) is, so there would be no pluses or minuses. There would be no list of lineup equations to solve for every player, so no output would exist.

Well, not really. Theoretically you could run the regression just on the equations for:

R = const + x1P1+x2P2+x3P3+x4P4+x5P5

In that way you would just have the lineups of one team in it. Similar to this: http://www.basketball-reference.com/tea ... 4/lineups/

You could do that for each team separately and would get coefficients for each player, which you could then compare to the results with all 10 players on the court. Not quite sure what purpose that would serve, but it is possible.
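Here is a toy sketch of such a one-team regression, to show what the coefficients can look like when opponents are left out. Everything in it is invented: the 2-on-2 stints, the player IDs, and the "true" ratings used to generate noise-free margins. The constant is omitted because every row contains the same number of 1s, which would make a constant column exactly collinear with the player columns.

```python
import numpy as np

# Hypothetical ratings used only to generate the toy margins.
true_rating = {
    "A0": 2.0, "A1": 2.0, "A2": -1.0, "A3": -1.0,  # team A: two starters, two bench
    "B0": 3.0, "B1": 3.0, "B2": -3.0, "B3": -3.0,  # team B: two starters, two bench
}

# 2-on-2 stints (team A lineup, team B lineup); substitutions mirror each other,
# so A's starters face B's starters and A's bench faces B's bench.
stints = [
    (("A0", "A1"), ("B0", "B1")),
    (("A2", "A3"), ("B2", "B3")),
    (("A0", "A2"), ("B0", "B2")),
    (("A1", "A3"), ("B1", "B3")),
    (("A0", "A3"), ("B0", "B3")),
    (("A1", "A2"), ("B1", "B2")),
]
team_a = ["A0", "A1", "A2", "A3"]

# Design matrix with team A's players only: 1 if on court, 0 otherwise.
X_team = np.array(
    [[1.0 if p in lineup_a else 0.0 for p in team_a] for lineup_a, _ in stints]
)
# Noise-free margin for team A in each stint.
y = np.array(
    [sum(true_rating[p] for p in a) - sum(true_rating[p] for p in b) for a, b in stints]
)

team_only, *_ = np.linalg.lstsq(X_team, y, rcond=None)
print(dict(zip(team_a, team_only)))
```

In this toy the one-team regression rates A's bench players above A's starters, even though the margins were generated with the starters as the better players, simply because the bench only ever beat up on a much weaker bench.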

Posted: **Fri Apr 11, 2014 12:26 pm**

Chicago76 wrote:
DQuinn1575 wrote:I understand that.

I'm curious how much the impact of opponents is. If we didn't adjust for opponents how much do the numbers change?

The big adjustment you are curious about (how much MIA's starters get docked for lesser competition, for example) isn't particularly significant. The SRS of Miami's opponents this year is -0.57, i.e. 0.57 points below average. If every player contributed equally, the adjustment they get in RAPM is only about 0.11 per player. For better players that might be a bit more, but it's not even half a point.

If the system could somehow be built ignoring lineup quality, the big difference would actually show up between good benches and bad starters. Unadjusted, a borderline scrub on a good bench might look 1.5 pts better than an average starter on a subpar team when the reality is the bench scrub might be 1.5 pts worse.

Thanks, this is by far the most helpful response to me.

Posted: **Fri Apr 11, 2014 3:17 pm**

mysticbb wrote:
Chicago76 wrote:There is no way to determine how much the adjustment for opponent quality affects the result, because the only way to allocate lineup +/- to players is to compare the quality of lineups that player was in to the quality of lineups he played against. Without comparing lineup quality and making those adjustments, we have no idea what average (0) is, so there would be no pluses or minuses. There would be no list of lineup equations to solve for every player, so no output would exist.

Well, not really. Theoretically you could run the regression just on the equations for:

R = const + x1P1+x2P2+x3P3+x4P4+x5P5

Don't know why I didn't think of this. This would basically convert raw +/- into a pace-adjusted figure allocated to players without respect to the quality of lineups faced. It would also be very susceptible to over-allocating value to bench players who never see time with or against more than a single starter.

DQuinn1575 wrote:Thanks, this is by far the most helpful response to me.

You're welcome. The fundamental difference between adjusting for opponent lineup quality (basketball) and not doing this (baseball) comes down to substitution patterns.

Barring major injuries, baseball lineups are almost always 80%+ starters with a player or two subbed in to give an older player a day off, to cover due to injury, or to take advantage of the opposing pitcher's handedness. Lineups are pretty fixed, so it is easy to see what a sub will do vs. starter quality opposition, because subs almost always face starter quality opposition.

In basketball, a decent rotation player will see a bit of time playing with and against lineups that are 80% starters, but on the low end, he'll also see some time playing with and against no starters. If you don't control for the lineup quality range, the +/- could get really wonky due to player substitution patterns.

Posted: **Sun Apr 13, 2014 7:35 am**

Thank you guys for the reading.

I've been tracking some RAPM stats from the last few seasons and the LeBron and Durant case are really bothering me.

LeBron in 2010 had a far better RAPM than his 2013 self, even though 2013 was arguably his best statistical season. Durant in 2010 had a RAPM almost 1.4 higher than his 2013 self, but he was definitely a much worse player then. That can only lead me to my aprioristic conclusion (corroborated by fellow posters around here) that the idea of RAPM, isolating one player's impact on the game across various lineups, is fundamentally flawed, because there's no such thing as isolating a player's impact.

I guess basketball rotations are a lot more dynamic than any stat can capture.

I don't like RAPM that much. It boosts the "individual value", or individual impact, or RAPM rating, of the player who is working under the superior lineup dynamics. The reason why LeBron and Durant fell after 2010 is possibly a fundamental change in those dynamics. LeBron went to a mess of a Heat team, and Durant started to play with Perkins and Ibaka instead of the far more versatile offensive players Green and Krstic. They kept having good offensive teams around because of bench play.

Posted: **Sun Apr 13, 2014 11:52 am**

bbms wrote:LeBron in 2010 had a far better RAPM than his 2013 self, even though 2013 was arguably his best statistical season. Durant in 2010 had a RAPM almost 1.4 higher than his 2013 self, but he was definitely a much worse player then.

I don't think you can compare RAPM numbers from two different years that way. I think you have to read the 2010 and 2013 RAPM numbers relative to their respective years, not as an absolute value that you can use to compare players across different years.

LeBron's RAPM in '10 being higher than in '13 isn't RAPM saying that he was better in '10 than in '13.

Posted: **Sun Apr 13, 2014 2:23 pm**

Knosh wrote:
bbms wrote:LeBron in 2010 had a far better RAPM than his 2013 self, even though 2013 was arguably his best statistical season. Durant in 2010 had a RAPM almost 1.4 higher than his 2013 self, but he was definitely a much worse player then.

I don't think you can compare RAPM numbers from two different years that way. I think you have to read the 2010 and 2013 RAPM numbers relative to their respective years, not as an absolute value that you can use to compare players across different years.

LeBron's RAPM in '10 being higher than in '13 isn't RAPM saying that he was better in '10 than in '13.

Of course not. It just says that his team did better with him on court, which gives some insight into how impactful he is and was, but it attributes the whole outcome of a team sport to his individual value. I guess I'm still a production + efficiency + skillset vs strategy (eye test) guy.

RAPM
