Dr Positivity wrote:
I find it frustrating how hard it is to find "honest" APM, meaning "+/- adjusted by everyone else's" and that's IT, let the results be what they may. Of course the reason people pushed past that is that the cleaner APM likely had crazy-looking or non-predictive results. So the response has been to try to push the ball past the goal line to predictive soundness with progressively sketchier fudging. And this was true before RPM or xRAPM. Tbh I never really trusted APM after reading this:
http://arturogalletti.wordpress.com/201 ... g-a-model/
And as importantly, no APBR person in the comment section found a way to debunk the claims or deny that Step 3 is happening and destroys the stat.
So you know, Got Buckets now delivers an APM (and mystic says good things about their source):
http://www.gotbuckets.com/statistics/apm/2014-apm/
I'm with you that it's been awfully frustrating seeing data disappear. Hence my hissy fits.
Re: Arturo & APM. The dude has some serious misunderstandings about the stat. I'll say a bit here; feel free to ask more:
When you refer to Step 3, I assume you mean this:
Step 3: Calculating Adjusted +/-
The final step is to take the Pure regression and the Stats model and add them up by player like so:
APM = x * Pure +/- + (1 - x) * Statistical +/-
And proceed to adjust x between 10% and 90% for each player to minimize the error. In essence he tweaks the rating to get a high R-square.
To summarize, the APM model calculates two variables with a low correlation to wins (R^2 < 5%) and adds them up to minimize the error and guarantee a 90%+ R-square for the overall model.
Funny that.
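For what it's worth, here's the procedure as Arturo describes it, in code - a minimal sketch on made-up numbers (the pool size, noise levels, and everything else are invented purely for illustration), showing why tuning x per player against the target mechanically inflates the fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # hypothetical player pool
target = rng.normal(0, 3, n)              # whatever the model gets scored against
pure = target + rng.normal(0, 6, n)       # noisy "Pure +/-" estimate
stat = target + rng.normal(0, 6, n)       # noisy "Statistical +/-" estimate

# The contested step: pick x in [0.1, 0.9] separately for each player so
# that that player's blended value best matches the target.
xs = np.linspace(0.1, 0.9, 81)
blends = xs[:, None] * pure + (1 - xs[:, None]) * stat
best = np.argmin((blends - target) ** 2, axis=0)
apm = blends[best, np.arange(n)]

def r2(pred, y):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2(pure, target), r2(stat, target), r2(apm, target))
# The blend scores far better than either input - not because it contains
# more information, but because x was chosen using the target itself.
```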
The biggest issue with this is that he's writing in 2011 using a source from 2004 as gospel, when in fact no one since that source did the thing Arturo is objecting to - and we were years into the RAPM era by 2011 anyway. I'm not going to sort through the comments on the page years after the fact, but I can tell you that people marveled at how clueless or disingenuous he was. Why?
Because there was a history between the Wages of Wins (WoW) people and the APBRmetrics people. The APBRmetrics people basically said "APBRmetrics is where basketball stat people talk about basketball stats online, come talk with us over there", and the WoW people didn't. The APBRmetrics people kept trying to facilitate the discussion by commenting on the WoW boards, and many got their comments deleted and their accounts banned. I witnessed it personally, and their comments in no way warranted that treatment - and I say this as a long-term mod of the strictest basketball board around.
So then, after all that, Arturo decided to look into this "enemy" stat, and he completely wasted his time precisely because he chose to try to figure it out on his own, taking every opportunity to criticize rather than ask the experts.
As for why the 2004 source (Rosenbaum) did what he did: the concern with +/- is always the noise; results are not as consistent with this stat as they are with box score metrics. Rosenbaum, as a guy just jumping in and playing around, had a couple of ideas in his head (APM & SPM), decided to do both, and then combined them, presumably thinking the combination would give more reliable data. There's nothing wrong with any of this really, but that was basically it for Rosenbaum. He never published any more results, and it wasn't until other guys (Ilardi, Sill, etc.) got involved that any of us started seriously using APM data.
So, what did it take for people to get "sold" on it? Speaking just for myself:
1) I had to figure out what the quirks were. If you take a one-year APM value at face value for every player in the league, you're going to have some crazy ideas in your head, so you would never do that. That doesn't mean, though, that the results are purely random or that they're all equally slippery.
Consider 2006, when the Pistons had extreme success playing their starting lineup huge minutes. This causes a massive dose of what's called multicollinearity, and it means the APM data for those guys, taken out of context, isn't necessarily useful (see the sketch below). But why would you paint a team with a more varied distribution of minutes with the same brush as the Pistons? You wouldn't, if you knew what you were doing.
When points like this get made in rebuttal to guys like Arturo, or to non-stats guys, people tend to respond with a two-pronged criticism that has always seemed contradictory to me: 1) you're using APM for everything!... 2) except when you just ignore it. When I see a person use a hammer to drive a nail and not to brush his teeth, I tend to conclude that the person knows what he's doing, but in these circumstances explanations tend to get dismissed as excuses.
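For anyone inclined to file the Pistons caveat under "excuses", here's a toy sketch of the multicollinearity problem (the stint data and the "true" impacts are entirely made up): two players who share the floor in all but one stint, plus a third whose minutes vary.

```python
import numpy as np

rng = np.random.default_rng(1)
stints = 100
# Players A and B share the floor in every stint but one (think 2006
# Pistons starters); player C's minutes vary independently.
a = np.ones(stints)
b = np.ones(stints); b[0] = 0.0
c = rng.integers(0, 2, stints).astype(float)
X = np.column_stack([a, b, c])            # 1 = player on the floor
true_beta = np.array([2.0, 1.0, -1.0])    # invented "true" impacts

for trial in range(3):
    y = X @ true_beta + rng.normal(0, 3, stints)   # noisy stint margins
    est, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.round(est, 1))
# A's and B's individual estimates swing wildly from sample to sample while
# their sum stays near 3.0: the pair's combined impact is well identified,
# the split between them is not. That's multicollinearity.
```

A team with a varied rotation gives the regression plenty of lineups that separate the players, and the individual estimates settle down - which is exactly why you treat the two cases differently.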
2) People started doing stuff to minimize the noise. They built multi-year models, they built collections of one-year models, they created "regularized" APM and varied the timespan for that too, and then they started building regression models for things like rebounding, turnovers, etc. With every step along the way we've gotten better and better data, and gotten better and better at using that data.
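And "regularized" is not hand-waving either - RAPM is at bottom ordinary ridge regression on the stint matrix. Here's a bare-bones sketch on the same toy setup as above (the lambda values are illustrative, not anyone's production settings):

```python
import numpy as np

rng = np.random.default_rng(2)
stints = 100
a = np.ones(stints)
b = np.ones(stints); b[0] = 0.0           # the near-collinear pair again
c = rng.integers(0, 2, stints).astype(float)
X = np.column_stack([a, b, c])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(0, 3, stints)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^(-1) X'y.
    # lam = 0 reduces to plain least squares, i.e. unregularized APM.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [0.0, 1.0, 25.0]:
    print(lam, np.round(ridge(X, y, lam), 1))
# Raising lam shrinks the wild A/B split toward something stable and
# repeatable, at the price of pulling every estimate a bit toward zero.
```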
Now, folks like Arturo will still cling to the fact that there's more noise in RAPM than in something like WP, but it's crucial to understand precisely what that means. It means they are criticizing a model for lacking reliability - which is fine - while giving no clear argument that the alternative they champion offers comparable validity.
Fundamentally, WP and RAPM literally do not measure the same thing. WP, like any box score metric, is biased toward the particular skills and foci associated with particular stats, and typically that means a preference for players on the offensive side of the ball. This means your ideal WP player is, by definition, not your ideal basketball player. That is indisputable, despite the fact that Berri and his guys have often tried to dispute it. The box score doesn't track everything, so you can only go so far with it, and no reweighting of box score stats can remove a bias that's baked into the box score itself.
So then, exactly how valuable is an extremely reliable - extremely consistent - stat that gives a biased representation of what it purports to measure? The answer is clearly "it depends". It can still be quite useful, but the reliability isn't necessarily a good thing in itself.
Here's the image I often use to explain the importance of +/-:
[image missing: the standard precision-vs-accuracy target diagram - a tight cluster off the bullseye on the left, a scattered cluster around the bullseye in the middle, a tight cluster on the bullseye on the right]
A stat like Wins Produced gives you results like the ones on the left. A stat like +/- gives you results like the ones in the middle. Which is better? What does "better" even mean here? Clearly neither is the result on the right, so you can't just use one or the other; you've got to use both. (Disclosure: of course we don't actually use WP typically - we use other stats that do what WP does better - but if those other stats didn't exist, I'd use WP.)
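And for anyone who prefers numbers to dartboards, here's the same idea as a throwaway simulation (all values invented): a WP-style estimator that's consistent but biased, versus a +/- style estimator that's noisy but centered on the truth.

```python
import numpy as np

rng = np.random.default_rng(3)
truth = 5.0                                      # a player's "true" impact
wp_like = truth + 2.0 + rng.normal(0, 0.3, 10)   # left target: tight, off-center
pm_like = truth + rng.normal(0, 3.0, 10)         # middle target: scattered around truth

print(wp_like.mean(), wp_like.std())   # reliable, but wrong on average
print(pm_like.mean(), pm_like.std())   # noisy, but right on average
# More seasons of data shrink the +/- spread in toward the truth; no amount
# of data moves the biased estimate off its offset. Hence: use both.
```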