Regularized Scaled Plus Minus (RSPM) - 2020-2021 Season

Moderator: Doctor MJ

CptCrunch
Analyst
Posts: 3,057
And1: 2,944
Joined: Jun 30, 2016
 


Post#1 » by CptCrunch » Thu Apr 1, 2021 6:49 am

Just a thread to dump this stat. Copied description with a few minor edits.

RSPM on Google Sheets.

See: viewtopic.php?p=89917064#p89917064

CptCrunch wrote:
What is this?

A 'new' plus/minus-type stat in the BPM family, calculated from aggregate box-score stats. The metric is called:
Regularized Scaled Plus Minus (RSPM).

What is this not?

This is largely unrelated to pure RAPM, the bastardized versions of RAPM floating around, and ESPN's RPM. Those statistics are calculated on possession-level data. Why do I say largely? Because BPM is calibrated to RAPM, and this metric is calibrated to BPM, so in reality RSPM will be correlated with RAPM to a degree.

How is this calculated/how is it different from BPM?

You can read about BPM in detail on Basketball-Reference's site: https://www.basketball-reference.com/about/bpm2.html

You can think of BPM as basically a position-specific regression model in which the box-score stats are multiplied by their tuned coefficients to arrive at the final BPM output.

The first major difference is that instead of operating on raw box scores, RSPM operates on scaled (or normalized) box-score values for each position. Instead of using a player's raw assists per game, his score would be: scaled assists per game = ((assists per game) - (mean assists per game for his position)) / (standard deviation of assists per game for his position). Scaling reduces range, allows for intuitive comparisons via Z-scores, and helps to quantify how much of an outlier a player is. For example, Nikola Jokic is the biggest outlier passer for his position, with a Z-score of 5.9, which is astronomical. He is followed by Harden, Draymond, etc. Conversely, it also lets you look at players who are bad at certain things, such as point guards who can't rack up assists.
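To make the scaling concrete, here is a minimal sketch in plain Python. The player names and assist numbers are made up for illustration, and `zscore_by_position` is a hypothetical helper, not the code behind the spreadsheet.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up per-game assist numbers, for illustration only.
players = [
    {"name": "C1", "pos": "C", "ast": 2.0},
    {"name": "C2", "pos": "C", "ast": 3.0},
    {"name": "C3", "pos": "C", "ast": 10.0},
    {"name": "PG1", "pos": "PG", "ast": 6.0},
    {"name": "PG2", "pos": "PG", "ast": 8.0},
    {"name": "PG3", "pos": "PG", "ast": 10.0},
]

def zscore_by_position(rows, stat):
    """Return {name: z}, where z = (value - position mean) / position sample SD."""
    by_pos = defaultdict(list)
    for r in rows:
        by_pos[r["pos"]].append(r[stat])
    scaled = {}
    for r in rows:
        vals = by_pos[r["pos"]]
        scaled[r["name"]] = (r[stat] - mean(vals)) / stdev(vals)
    return scaled

z_ast = zscore_by_position(players, "ast")
for name, z in z_ast.items():
    print(f"{name}: {z:+.2f}")
```

In this toy data, the center and the point guard who both average 10 assists end up with different Z-scores, because each is measured against his own position's mean and spread.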

The second main difference is that the RSPM values are regularized. This form of regularization is similar in nature to the regularization of RAPM/RPM. The statistical intuition behind regularization is that you are increasing bias while reducing variance. What this means in practice is that we are stabilizing variance while making our estimates slightly biased (1). What stabilizing variance in this context means is that players who are extreme outliers in BPM are neither as good nor as bad as their BPM makes them out to be.
A high-BPM player who is probably actually good is Jokic (BPM = 12.1, RSPM = 11.7). A high-BPM player who is probably not that good is Rayjon Tucker (BPM = 30.1, RSPM = -0.842).
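The bias/variance trade can be seen in the simplest possible case. This is a sketch with one centered predictor, no intercept, and made-up numbers, not the actual RSPM fit. The ridge slope that minimizes sum((y - b*x)^2) + lambda*b^2 is b = sum(x*y) / (sum(x^2) + lambda), so a larger lambda pulls the coefficient toward zero.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge slope for one centered predictor with no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Made-up, already-centered data for illustration.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.8, 0.2, 2.1, 3.9]

# lam = 0.0 is ordinary least squares; larger values shrink the slope.
slopes = {lam: ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0)}
for lam, b in slopes.items():
    print(f"lambda={lam:>4}: slope={b:.3f}")
```

The estimate at lambda = 0 is the unbiased least-squares slope. The shrunken slopes are biased toward zero but move around less from sample to sample.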

Actual details of calculation

1. For each player, scale his {"orbPerGame", "drbPerGame", "stlPerGame","astPerGame", "blkPerGame", "tovPerGame", "pfPerGame", "ptsPerGame", "ftmPerGame", "ftaPerGame", "fgmPerGame", "fgaPerGame", "fg3aPerGame", "fg3mPerGame"} by subtracting the mean and dividing by the sample standard deviation for his position. These stats are minute-adjusted.

2. Fit a ridge regression to the benchmark BPM values, with the regularization lambda set to the cross-validated lambda.1se value. This is the largest lambda whose cross-validated error is within 1 standard error of the minimum error (2).

3. The regularized coefficients for each box-score stat are multiplied back to the Z-scores found in step 1. The RSPM is moderately correlated with BPM at r = 0.698.

Thus, through 3 simple steps, we have created a regularized version of the box-score plus-minus estimator that shrinks both extreme unexpected efficiencies and inefficiencies.
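Step 3 is just a weighted sum. In the sketch below the coefficients and Z-scores are hypothetical placeholders, not the fitted RSPM values, and only three of the stats are shown. The intercept defaults to 0, which is an assumption on my part that fits RSPM being centered around 0.

```python
# Hypothetical ridge coefficients, one per scaled stat (not the real fit).
coefs = {"astPerGame": 0.9, "ptsPerGame": 1.2, "tovPerGame": -0.7}

def rspm_from_z(z_scores, coefs, intercept=0.0):
    """Multiply each coefficient by the player's Z-score and add them up."""
    return intercept + sum(coefs[stat] * z_scores[stat] for stat in coefs)

# Hypothetical position-scaled Z-scores for one player.
player_z = {"astPerGame": 2.0, "ptsPerGame": 1.5, "tovPerGame": 1.0}

estimate = rspm_from_z(player_z, coefs)
print(round(estimate, 2))
```

A player who is average at everything for his position has all Z-scores at 0, so he lands exactly on the intercept.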

(1) The concept of bias is largely irrelevant outside of statistical textbooks, since we cannot reliably assume that there is a true parameter we are estimating. For our purposes, the increased bias can be safely ignored. If you have a bone to pick, go complain about RAPM/RPM first.

(2) Cost functions are estimated with error; lambda.1se errs on the side of parsimony while still reasonably minimizing the cost function.
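For anyone unfamiliar with lambda.1se, the selection rule as glmnet-style cross-validation defines it is: find the lambda with the lowest mean CV error, then take the largest lambda whose mean CV error is still within one standard error of that minimum. A rough sketch, with a made-up lambda grid and made-up error values, and `pick_lambda_1se` as a hypothetical helper:

```python
def pick_lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda CV error summaries."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Every lambda whose mean CV error is within one SE of the minimum.
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(eligible)

# Made-up cross-validation results for illustration.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_mean = [4.30, 4.10, 4.00, 4.15, 5.20]
cv_se = [0.20, 0.20, 0.20, 0.20, 0.25]

lam_min, lam_1se = pick_lambda_1se(lambdas, cv_mean, cv_se)
print(lam_min, lam_1se)
```

Here the error at lambda = 10 is statistically indistinguishable from the minimum at lambda = 1, so the rule takes the heavier penalty and the more conservative model.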

Leave some thoughts, but don't bother if you want to bring up the garbage that is known as PIPM. Also, I would caution that this should probably be interpreted like RPM: compare players at the same position, and try not to compare players across positions. Also, RSPM is centered around 0 while BPM has a mean of -2.1, so the RSPM scale sits ~2.1 units higher than BPM. That's why you may see many ~5 RSPM players.
 


Post#2 » by CptCrunch » Sun May 16, 2021 9:27 pm

5/16 update. No change to methodology.

https://docs.google.com/spreadsheets/d/11bstSR6R3T1m21j-4VYrocLyRFJgbb0x1TxxPmEAXqk/

Top 10 comparison as of 5/16

[Image: top 10 comparison]
jambalaya
Sixth Man
Posts: 1,662
And1: 283
Joined: Feb 01, 2005


Post#3 » by jambalaya » Wed Jun 30, 2021 8:41 pm

Positions based on... official NBA or what?
5 positions by default or consider less or more?
