The data consists of the lineup matchups from each game. Every time a new player checks in, a new line is created in the matchup file, so we have data for each stint of 5 home players vs. 5 away players. The basic formula is:
Margin = HCA + a_1 P_1 + a_2 P_2 + a_3 P_3 + a_4 P_4 + a_5 P_5 - a_6 P_6 - a_7 P_7 - a_8 P_8 - a_9 P_9 - a_10 P_10
where Margin is the scoring margin adjusted to 100 possessions, HCA is the intercept (home-court advantage), a_1 to a_5 are the coefficients for the five home players, and a_6 to a_10 those for the five away players. Since each P_i is an indicator for a specific player, the index actually runs from 1 to roughly 450 (one per player in the league). The regression finds the best-fitting coefficients to approximate Margin, which boils down to matrix algebra. For ridge regression, a penalty lambda is added, and the coefficients now depend on that lambda: each lambda yields a different set of coefficients. It has been shown that there is always a lambda for which RMSE(ridge) < RMSE(OLS). The best-fitting lambda can be found via cross-validation.
FWIW, I'm pretty damn impressed. I didn't think that kind of information was even available. I'm the guy who made the bad assumption.
It still doesn't match the eye test, and the only hypothesis I have concerns unit mixing and sample sizes, but I can't poke holes in the methodology as easily as I thought.