Non-linear Stats (weighting)

fluffernutter · Post #1 » by **fluffernutter** » Thu Nov 25, 2010 12:24 am

I am new to all this so if this is being done, boring, stupid, whatever, let me know.

But, I was thinking, one big problem with box-score accumulation stats is that the simple ones are all weighted the same.

This is an odd way to do things.

For example, let's say that averaged over 82 games:

Player A gets 10 defensive rebounds, and 1 steal, and blocks 2 shots.

Player B gets 8 defensive rebounds, and .5 steals, and blocks 4 shots.

You might cook up some stat (call it STD or "stupid total defense") which is DefReb + Steals + 2*Blocked Shots. Whatever. Some stat which uses these values.

Player A would have a STD of 15.
Player B would have a STD of 16.5.

Player B is "better" according to the stat, but "worse" than Player C who manages 12 defensive rebounds, and 2 steals, and blocks 2 shots, for a STD of 18.
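For anyone who wants to check the arithmetic, here is a minimal Python sketch of the made-up formula. The function name `std_linear` and the `players` dictionary are illustrative names only; the weights and stat lines are the ones from the example above.

```python
# Linear "stupid total defense": DefReb + Steals + 2 * Blocks.
# std_linear and players are hypothetical names used only for this sketch.
def std_linear(def_reb, steals, blocks):
    return def_reb + steals + 2 * blocks

# Per-game averages from the example: (defensive rebounds, steals, blocks).
players = {
    "A": (10, 1, 2),
    "B": (8, 0.5, 4),
    "C": (12, 2, 2),
}

scores = {name: std_linear(*line) for name, line in players.items()}
for name, score in scores.items():
    print(name, score)
```

Every extra block is worth exactly the same 2 points here, whether it is a player's first or fifth of the night.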

Now obviously you see the problem. There are certain values that when they rise, their effect upon the game is disproportionately huge in a non-linear way. A guy who blocks 1 a game; OK. A guy that blocks 2 a game; OK. A guy that blocks 3 a game; this is suddenly indicating a high level of blocking skill. A guy that blocks 4 a game; a total stud, and is probably affecting many shots he is not directly blocking. A guy that blocks 5 a game is a Mark Eaton-type, a beast. Blocking 5 a game should be worth a lot more than twice as much as the guy blocking 2.5 a game. It's obscene.

So in theory you might have a table or some formula (more elegant) which you could call "block effect."

BE (1) is 1 (block 1 a game, the BE value is 1).
BE (2) is 2.
BE (2.5) is 3.
BE (3) is 4.5.
BE (4) is 6.
BE (5) is 8 or something.

Roughly speaking, blocking 5 a game is about 4 times as valuable as blocking 2. Whatever. The idea is that there is variation. Blocking 5 is not just 2.5 times as good as blocking 2. It's worth... way way more.

Where does the BE formula come from? Good question. Not sure. But even a slight adjustment will help a lot, as far as common sense goes.

Using this sort of scheme for BE (replacing the block value in the above formula) gives an STD for the blocking stud of 20.5 (8 + 0.5 + 2*6) - significantly better than the other two, which is as it should be, given elite shot blockers' historical dominance. But this difference is lost if you just have a straight multiplier.
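One way to turn the BE table into something usable is sketched below. It reads the table as a lookup with straight-line interpolation between the listed points. That is only an assumption: the post does not say how in-between values should be handled. The (0, 0) anchor and the behavior above 5 blocks are also guesses, and the names `BE_TABLE`, `block_effect`, and `std_nonlinear` are made up for the example.

```python
# Block-effect table from the post: blocks per game -> BE value.
# The (0, 0) anchor is an added assumption so values below 1 interpolate.
BE_TABLE = [(0, 0), (1, 1), (2, 2), (2.5, 3), (3, 4.5), (4, 6), (5, 8)]

def block_effect(blocks):
    """Piecewise-linear interpolation over BE_TABLE (an assumed scheme)."""
    if blocks <= BE_TABLE[0][0]:
        return float(BE_TABLE[0][1])
    for (x0, y0), (x1, y1) in zip(BE_TABLE, BE_TABLE[1:]):
        if blocks <= x1:
            return y0 + (y1 - y0) * (blocks - x0) / (x1 - x0)
    # Above the table, keep the last segment's slope (also an assumption).
    (x0, y0), (x1, y1) = BE_TABLE[-2], BE_TABLE[-1]
    return y1 + (y1 - y0) * (blocks - x1) / (x1 - x0)

def std_nonlinear(def_reb, steals, blocks):
    # Same formula as before, with BE(blocks) in place of raw blocks.
    return def_reb + steals + 2 * block_effect(blocks)

for name, line in {"A": (10, 1, 2), "B": (8, 0.5, 4), "C": (12, 2, 2)}.items():
    print(name, std_nonlinear(*line))
```

Players A and C both block 2 a game, where BE equals the raw count, so their totals do not move. Only the 4-block player gains, which is enough to put him in front.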

You can easily see how this would also have value for rebounds. If you average 10 rebounds a game as a big man, that's fine. Very respectable, solid, etc. You are doing what you should. It seems to me, however, that as you start going up the rebound scale, something strange happens. If you rebound 12 per game, that's excellent. If you rebound 14 a game, you are a stud. And think about what this means. Where are these extra rebounds coming from? You are likely stealing one from a teammate, who knows you are a rebound lover, but that can't explain all of it. Most of these extra rebounds have to come from rebounds that otherwise would have gone to the other team (you see this fairly often with Dwight H). In other words, the higher you go, the more valuable these extra rebounds become, since they are straight possessions. A freak like Rodman who pulls down 18 a game for 81 games is probably giving your team an additional 4-5 possessions a night. And that's huge, it's about 4-5 extra points per night!

That's very different from the center who pulls down 5 vs. the center who pulls down 10. The weak-rebounding center likely has somebody else on his own team who gets most of his obvious misses; the 10 per center probably gets all rebounds coming his way plus 1 extra effort rebound per game. Not the same as the difference between, say, 10 and 15...

You can easily do the same with assists, which as you get above 8 get increasingly valuable, and when you hit 12 or something, it should matter more: averaging 13 a game vs. 10, for example, is a LOT different than averaging 8 vs. 5. They are both studs (10 and 13), but one is a superstar, and the other is otherworldly, a Stockton/MJ type. You simply can't average 13 assists per game unless you are one of the top PG's in history. So you should obviously weight the top end differently, and value the high numbers more.

So yeah. This is probably being done already, like, all over, right?

DSMok1 · Post #2 » by **DSMok1** » Fri Nov 26, 2010 5:06 pm

Most composite advanced stats are linear, if I remember correctly. However, Advanced Statistical Plus/Minus is nonlinear on some stats, but not others. It was created as a regression of advanced stats onto adjusted plus/minus...

The problem with going non-linear is that most non-linear models break down at the extremes--in real life, 1 block is the same as another. What may be different is that a player with 4 blocks may also intimidate and alter shots more, which isn't captured by the linear model.

On the ASPM model, I experimented with nonlinear (cubic) rebounds, but ended up not getting better results than with a logarithmic model--basically, there is a big penalty for players that don't get any rebounds, but Dennis Rodman doesn't get a big boost for getting a ton--it behaves roughly linearly at near-average and above-average rebounding percentages.
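To make the shape of a logarithmic weighting concrete, here is a toy Python sketch. It is not the ASPM formula, which is not given in this thread. The function name `log_rebound_value`, the 10% "league average" figure, and the sample percentages are all invented for illustration.

```python
import math

# Toy concave weighting, NOT the actual ASPM term.
# league_avg=10.0 is a made-up rebounding percentage used only as a baseline.
def log_rebound_value(reb_pct, league_avg=10.0):
    """Zero at the baseline, sharply negative as reb_pct approaches zero."""
    return math.log(reb_pct / league_avg)

# The same 3-point improvement is worth far more at the bottom than at the top.
low_gain = log_rebound_value(5.0) - log_rebound_value(2.0)
high_gain = log_rebound_value(23.0) - log_rebound_value(20.0)

print(round(low_gain, 3), round(high_gain, 3))
```

This curve bends the opposite way from the BE table in the first post: it punishes the non-rebounder heavily and flattens out at the top, instead of rewarding the extreme high end.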