Predictor of NBA regular season results

Moderator: Doctor MJ

Borut
Ballboy
Posts: 28
And1: 0
Joined: Jul 16, 2013

Predictor of NBA regular season results 

Post#1 » by Borut » Wed Dec 11, 2013 4:16 pm

Hello guys.
I am building a predictor of results in the regular season. I am a student of computer science.

So far I have created a basic predictor in the Python programming language, which uses the naive Bayes algorithm. The results are not so great yet. The classification accuracy is around 65%, which is not very good, since you can predict the winner of a game with around 61% accuracy just by always predicting a home-team victory.
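A minimal sketch of how a classifier can be compared against the "home team always wins" baseline. The outcome and prediction lists are made-up data, and the encoding (1 = home win, 0 = away win) is an assumption for illustration.

```python
def home_baseline_accuracy(outcomes):
    """Accuracy of always predicting a home win (outcomes: 1 = home win, 0 = away win)."""
    return sum(outcomes) / len(outcomes)


def accuracy(predictions, outcomes):
    """Fraction of games where the predicted class matches the actual outcome."""
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return correct / len(outcomes)


# Made-up data for illustration only: 10 games, 6 of them home wins.
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
# Made-up classifier output for the same 10 games.
predictions = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

baseline = home_baseline_accuracy(outcomes)
model_accuracy = accuracy(predictions, outcomes)
print("baseline:", baseline, "classifier:", model_accuracy)
```

A classifier is only interesting to the extent that its accuracy exceeds the baseline on the same set of games.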

If anyone has any questions, ideas, or experience, please respond :D
blabla
Sophomore
Posts: 156
And1: 76
Joined: May 23, 2012

Re: Predictor of NBA regular season results 

Post#2 » by blabla » Wed Dec 11, 2013 6:34 pm

I'd be glad to help you, but you have to get into more detail of what you have done so far. Are you using SRS? ELO? Any kind of regression technique, if yes, what type etc
Borut
Ballboy
Posts: 28
And1: 0
Joined: Jul 16, 2013

Re: Predictor of NBA regular season results 

Post#3 » by Borut » Wed Dec 11, 2013 11:12 pm

blabla wrote:I'd be glad to help you, but you have to get into more detail of what you have done so far. Are you using SRS? ELO? Any kind of regression technique, if yes, what type etc


I don't use regression. I have a classifier that predicts whether the home or away team will win. I get the probabilities of these two events using a naive Bayes classifier (https://en.wikipedia.org/wiki/Naive_Bayes_classifier), which I have programmed myself in Python. I don't use ELO ratings or SRS (I don't even know it), but I'll check it out. I use a bunch of attributes (rebounds, previous matches between the teams, average points scored ...) which I get from previous games. If you want, you can download the file with my learning examples here: https://drive.google.com/file/d/0B3Zl3fIv28_ZeWlBOGR6cEVmQWM/edit?usp=sharing

Naive Bayes is not good at spotting correlations between attributes, so I am now working on finding such correlations, something like "a team with high pace and good defense wins a lot vs teams with low pace ...".
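For readers unfamiliar with the technique, here is a minimal Gaussian naive Bayes sketch. It is not the implementation discussed in this thread (which is not shown); it assumes numeric attributes modeled with a normal distribution per class, and the training rows are made-up values of [home scored average, away scored average].

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate the class prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids a zero variance
        model[label] = (prior, stats)
    return model


def predict_proba(model, row):
    """Return normalized class probabilities for one example."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # "Naive" step: attributes are treated as independent given the class,
        # so their log-likelihoods are simply added.
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Made-up training data: [home scored avg, away scored avg]; label 1 = home win.
rows = [[102, 95], [99, 101], [105, 97], [94, 100], [101, 96], [96, 103]]
labels = [1, 0, 1, 0, 1, 0]
model = train_gaussian_nb(rows, labels)
probs = predict_proba(model, [104, 95])
print(probs)
```

Because each attribute contributes independently, the model never sees how the home and away values relate to each other within one game, which is the limitation described above.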
blabla
Sophomore
Posts: 156
And1: 76
Joined: May 23, 2012

Re: Predictor of NBA regular season results 

Post#4 » by blabla » Thu Dec 12, 2013 5:08 pm

Your classifier should be better than 65% when just feeding it points scored and points given up (and removing rebounds etc.). You could compare your output with that of an existing implementation of NBC to see if there's any error in your code (maybe someone already wrote one in Python, Matlab or R?).

Is it handling homecourt advantage correctly?

It could also be that NBC is just not made for this kind of problem. I don't really know though
Borut
Ballboy
Posts: 28
And1: 0
Joined: Jul 16, 2013

Re: Predictor of NBA regular season results 

Post#5 » by Borut » Thu Dec 12, 2013 6:09 pm

I am also trying to learn how to program the algorithms myself, which is why I do the programming myself. I've used my data with some implementations of algorithms (NBC, decision trees) in the program Orange. The results were even worse, if I remember correctly. I'll try again tomorrow.

The home advantage is handled correctly: if you look at the examples, the class attribute has two values, home victory (1) and away victory (0), and NBC takes into account the a priori probability of the class attribute.

One of the weaknesses of naive Bayes is that it doesn't recognize the correlation between two attributes. For example, if one team has 100 points per game scored and the other has 95, it won't recognize that the first is better in this attribute. So I'll try a predictor where two attributes like home_team_scored_average and away_team_scored_average are replaced by just one attribute, the difference between the two.
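A small sketch of that transformation, assuming each learning example is a dict whose attribute names come in home_/away_ pairs. The "diff_" prefix is a made-up naming convention.

```python
def to_difference_features(example):
    """Collapse matching home_X / away_X attributes into one home-minus-away attribute."""
    diffs = {}
    for name, value in example.items():
        if name.startswith("home_"):
            stat = name[len("home_"):]
            away_name = "away_" + stat
            if away_name in example:
                diffs["diff_" + stat] = value - example[away_name]
    return diffs


# Example values taken from the post above: 100 vs 95 points scored on average.
example = {
    "home_team_scored_average": 100.0,
    "away_team_scored_average": 95.0,
}
features = to_difference_features(example)
print(features)
```

Attributes without a matching partner are dropped by this sketch; whether to keep them is a design choice.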

Why do you think that the result should be better? Are you familiar with any such predictors?
mysticbb
Banned User
Posts: 8,205
And1: 713
Joined: May 28, 2007

Re: Predictor of NBA regular season results 

Post#6 » by mysticbb » Thu Dec 12, 2013 7:45 pm

Borut wrote:I don't use regression.


No idea, but that sounds like you have it backwards here. You want to predict something where you don't even know which variables should go in. Try using regression first, that will help you a great deal in order to create a predictor. Well, let me give you a hint: on the team level rebounds, assists, points scored, etc. are pretty meaningless, because it comes down to one specific variable and the HCA (homecourt advantage). If you set up the regression correctly, you will get the HCA by default and the equation to have a predictor better than yours (easily). Also, in my dataset (from 1977 to 2013) the head-to-head matchups are not significant.
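One way to read this suggestion, sketched under assumptions: regress the home team's final margin on a single strength variable and treat the intercept as the HCA. The post does not name the variable, so the sketch assumes it is the difference in average point differential between the home and away team, which is only a guess. The data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data. x: home avg margin minus away avg margin (assumed variable).
# y: final margin of the game from the home team's point of view.
xs = [-6.0, -2.0, 0.0, 4.0, 8.0]
ys = [-3.0, 1.0, 3.0, 7.0, 11.0]

# The intercept is the expected home margin between evenly matched teams,
# i.e. an estimate of the homecourt advantage in points.
hca, slope = fit_line(xs, ys)


def predict_home_win(margin_diff):
    """Predict 1 (home win) when the fitted expected home margin is positive."""
    return 1 if hca + slope * margin_diff > 0 else 0


print("HCA:", hca, "slope:", slope, "prediction:", predict_home_win(-2.0))
```

With real data the fit would not be exact, and a logistic regression on the win/loss outcome would be an alternative to regressing on the margin.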
blabla
Sophomore
Posts: 156
And1: 76
Joined: May 23, 2012

Re: Predictor of NBA regular season results 

Post#7 » by blabla » Thu Dec 12, 2013 11:04 pm

This site
http://www.usatoday.com/sports/nba/sagarin/
shows retrodictive results, and it says it's getting it right 76% of the time. The person who does those has a lot of experience with that kind of thing, so you don't have to be as good as him, but you should probably shoot for >70%. If you can barely beat the "hometeam wins" prediction it's not a good sign
Borut
Ballboy
Posts: 28
And1: 0
Joined: Jul 16, 2013

Re: Predictor of NBA regular season results 

Post#8 » by Borut » Tue Dec 17, 2013 3:39 pm

blabla wrote:This site
http://www.usatoday.com/sports/nba/sagarin/
shows retrodictive results, and it says it's getting it right 76% of the time. The person who does those has a lot of experience with that kind of thing, so you don't have to be as good as him, but you should probably shoot for >70%. If you can barely beat the "hometeam wins" prediction it's not a good sign


Of course I am shooting for a better accuracy, but there is no mistake in the code. I did a funny thing today. I measured which attributes predict results better. Here are the results

away_team_form, home_team_form: correct 770, predicted 1209, accuracy 0.636889991729
away_team_free_days, home_team_free_days: correct 708, predicted 1209, accuracy 0.585607940447
away_pace, home_pace: correct 589, predicted 1209, accuracy 0.487179487179
away_rebounds, home_rebounds: correct 624, predicted 1209, accuracy 0.516129032258
away_scored_avg, home_scored_avg: correct 723, predicted 1209, accuracy 0.598014888337
away_conceded_avg, home_conceded_avg: correct 532, predicted 1209, accuracy 0.440033085194


Unsurprisingly, pace and rebounds are pretty irrelevant. Team form is the best, and days of rest is surprisingly good.
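One possible way to score a single attribute pair is sketched below. It may differ from how the numbers above were produced (that procedure is not shown). The rule assumed here is "predict the team with the better value"; the "home_win" key and the game records are made up.

```python
def pair_accuracy(games, home_key, away_key, higher_is_better=True):
    """Predict a home win when the home value is better; return (correct, predicted, accuracy)."""
    correct = 0
    for game in games:
        if higher_is_better:
            home_better = game[home_key] > game[away_key]
        else:
            home_better = game[home_key] < game[away_key]
        predicted = 1 if home_better else 0  # ties are predicted as away wins
        if predicted == game["home_win"]:
            correct += 1
    return correct, len(games), correct / len(games)


# Made-up games for illustration only.
games = [
    {"home_scored_avg": 101, "away_scored_avg": 97, "home_win": 1},
    {"home_scored_avg": 95, "away_scored_avg": 99, "home_win": 0},
    {"home_scored_avg": 103, "away_scored_avg": 100, "home_win": 0},
    {"home_scored_avg": 98, "away_scored_avg": 102, "home_win": 1},
]
result = pair_accuracy(games, "home_scored_avg", "away_scored_avg")
print("correct %d predicted %d Accuracy %s" % result)
```

For attributes where lower is better, such as points conceded, pass higher_is_better=False. With a two-class rule like this, an accuracy well below 0.5 usually means the direction is flipped.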
blabla
Sophomore
Posts: 156
And1: 76
Joined: May 23, 2012

Re: Predictor of NBA regular season results 

Post#9 » by blabla » Tue Dec 17, 2013 4:15 pm

Be careful with days of rest. I've found that 2 days is best; more is not better. It probably shows up as "good" in your analysis because teams aren't very good in back-to-backs.
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Predictor of NBA regular season results 

Post#10 » by Chicago76 » Tue Dec 17, 2013 6:23 pm

The conceded pts category is counterintuitive. It's probably getting tangled up a bit with pts scored, which might be more dominant. Rather than bifurcating MOV into pts scored and conceded, have you given any thought to just using MOV? Maybe breaking it into avg away Margin of Victory/Loss and avg home Margin of Victory/Loss. I think this might clean things up a bit.

Also, on the scheduling, as noted, 2 days rest is close to optimal. A day's rest can certainly be better than 4 days. The relationship isn't linear. The scheduling impact may need to be dealt with more specifically in a couple of ways:

1-days of rest (0-4), but also number of games over a 5 day stretch, this game included, ie, the game under prediction will be the nth game in the last 5 days for team A.
2-looking at days rest and n games in 5 days in relative terms (comparing both teams as well). Records for teams playing back-to-backs are poor, but they would be even worse if you could remove cases when both teams are playing a back-to-back (someone's gotta win) and deal with those separately.
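The two scheduling attributes could be computed roughly as follows. This is a sketch: the function name, the cap of 4 rest days, and the treatment of a team's first game as fully rested are assumptions.

```python
from datetime import date


def schedule_features(previous_game_dates, game_date, window_days=5, rest_cap=4):
    """Return (days of rest capped at rest_cap, n) where this is the nth game in the window."""
    earlier = [d for d in previous_game_dates if d < game_date]
    if earlier:
        # A game on the previous day is a back-to-back, i.e. 0 days of rest.
        rest = (game_date - max(earlier)).days - 1
        rest = min(rest, rest_cap)
    else:
        rest = rest_cap  # assumption: first game of the season counts as fully rested
    # Count earlier games inside the window that ends on game day.
    in_window = sum(1 for d in earlier if (game_date - d).days < window_days)
    return rest, in_window + 1  # +1 includes the game being predicted


# Made-up schedule: games on Dec 10, 12 and 13, predicting the game on Dec 14.
home_dates = [date(2013, 12, 10), date(2013, 12, 12), date(2013, 12, 13)]
away_dates = [date(2013, 12, 11)]
game_day = date(2013, 12, 14)

home_rest, home_nth = schedule_features(home_dates, game_day)
away_rest, away_nth = schedule_features(away_dates, game_day)

# Relative terms: compare the two teams rather than using raw values.
rest_diff = home_rest - away_rest
both_back_to_back = home_rest == 0 and away_rest == 0
print(home_rest, home_nth, away_rest, away_nth, rest_diff, both_back_to_back)
```

Since the relationship is not linear, the rest values are probably better treated as categories than as a numeric scale.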
Borut
Ballboy
Posts: 28
And1: 0
Joined: Jul 16, 2013

Re: Predictor of NBA regular season results 

Post#11 » by Borut » Wed Dec 18, 2013 1:16 pm

Chicago76 wrote:The conceded pts category is counterintuitive. It's probably getting tangled up a bit with pts scored, which might be more dominant. Rather than bifurcating MOV into pts scored and conceded, have you given any thought to just using MOV? Maybe breaking it into avg away Margin of Victory/Loss and avg home Margin of Victory/Loss. I think this might clean things up a bit.


Conceded points give an accuracy of 0.55997, if you were confused by my results. I might try just using MOV alone in the future.

Chicago76 wrote:Also, on the scheduling, as noted, 2 days rest is close to optimal. A day's rest can certainly be better than 4 days. The relationship isn't linear. The scheduling impact may need to be dealt with more specifically in a couple of ways:

1-days of rest (0-4), but also number of games over a 5 day stretch, this game included, ie, the game under prediction will be the nth game in the last 5 days for team A.
2-looking at days rest and n games in 5 days in relative terms (comparing both teams as well). Records for teams playing back-to-backs are poor, but they would be even worse if you could remove cases when both teams are playing a back-to-back (someone's gotta win) and deal with those separately.


Scheduling seems to be pretty important, so I will research it in further detail in the future. I changed my data like I said I would (the attributes are now the difference between home and away team); the results with such data are actually even a little worse. I have managed to improve my accuracy to 68% by selecting the best attributes with forward selection, which means adding the best remaining attribute to the selection until the accuracy starts decreasing.
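A sketch of greedy forward selection as described, assuming an evaluate function that returns the accuracy for a given list of attributes (in practice, something like cross-validated accuracy). The toy_evaluate function and its numbers are made up so the example runs on its own.

```python
def forward_selection(attributes, evaluate):
    """Greedily add the best remaining attribute until the accuracy starts decreasing."""
    selected = []
    best_score = float("-inf")
    remaining = list(attributes)
    while remaining:
        # Score every candidate extension of the current selection.
        scored = [(evaluate(selected + [a]), a) for a in remaining]
        score, candidate = max(scored, key=lambda pair: pair[0])
        if score < best_score:
            break  # even the best addition makes the accuracy worse
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = score
    return selected, best_score


def toy_evaluate(subset):
    """Made-up accuracy model: each attribute adds or removes a fixed amount."""
    gains = {"team_form": 0.04, "scored_avg": 0.02, "pace": -0.01}
    return 0.61 + sum(gains[a] for a in subset)


selected, score = forward_selection(["pace", "scored_avg", "team_form"], toy_evaluate)
print(selected, round(score, 2))
```

When selecting attributes this way, the accuracy should be measured on games that were not used for the selection itself, otherwise the reported figure tends to be optimistic.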

I have a question: which are the best ratings for NBA teams? I will measure their accuracies. I mean something like http://espn.go.com/nba/powerrankings
Chicago76
Rookie
Posts: 1,134
And1: 228
Joined: Jan 08, 2006

Re: Predictor of NBA regular season results 

Post#12 » by Chicago76 » Wed Dec 18, 2013 6:28 pm

Borut wrote:I have a question: which are the best ratings for NBA teams? I will measure their accuracies. I mean something like http://espn.go.com/nba/powerrankings


I wouldn't use power ratings. Things I would probably look at:

1) http://www.usatoday.com/sports/nba/sagarin/2013/rating/
2) SRS, which can be found at b-r (Basketball-Reference). I think I saw you mention somewhere in this thread that you weren't aware of SRS. SRS is the "simple rating system". It takes average point differential (MOV) and strength of schedule, which is measured as average opponent MOV, and sums them together. There is bound to be a blog post somewhere that gives home court advantage, expressed as pts added to SRS, to serve as a game predictor.
3) Vegas betting lines. There should be places to find the old closing lines prior to NBA games and quite possibly some place where that information is free.

In terms of predictive ability, I would assume Vegas Lines > Sagarin > SRS.

Sagarin should be superior to SRS because the model is iterative, so it looks not only at your direct opponent MOV, but a team's opponent's opponents, and their opponents, etc until equilibrium is hit.

Vegas should be superior to Sagarin because they effectively use Sagarin and then adjust lines up or down for information the model does not know: injuries, recent slumps, recent locker room discord, etc. It can at least subjectively weigh evidence that Sagarin does not know.

A reasonable goal for you would be to devise something that comes out somewhere between SRS and Sagarin in terms of predictive power.
blabla
Sophomore
Posts: 156
And1: 76
Joined: May 23, 2012

Re: Predictor of NBA regular season results 

Post#13 » by blabla » Thu Dec 19, 2013 12:43 am

SRS also looks at opponents' opponents' (etc.) MOV. I'm not saying SRS and Sagarin are the same, but that's not the difference.
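To illustrate the point, here is a sketch of an SRS-style rating in its commonly described fixed-point form: each team's rating equals its average margin plus the average rating of its opponents. Because opponents' ratings themselves depend on their opponents, repeating the update pulls in opponents' opponents and so on. This is not Basketball-Reference's code, the games are made up, and plain iteration like this is not guaranteed to converge on every schedule.

```python
def simple_ratings(games, iterations=100):
    """games: list of (team_a, team_b, margin_for_a). Returns {team: rating}."""
    margins, opponents = {}, {}
    for a, b, margin in games:
        margins.setdefault(a, []).append(margin)
        margins.setdefault(b, []).append(-margin)
        opponents.setdefault(a, []).append(b)
        opponents.setdefault(b, []).append(a)
    mov = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(mov)  # start from plain average margin
    for _ in range(iterations):
        # rating = own average margin + average rating of the opponents faced
        ratings = {
            t: mov[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in mov
        }
        # Re-center so the average rating stays at zero.
        mean = sum(ratings.values()) / len(ratings)
        ratings = {t: r - mean for t, r in ratings.items()}
    return ratings


# Made-up results: A beat B by 10, B beat C by 10, A beat C by 20.
games = [("A", "B", 10), ("B", "C", 10), ("A", "C", 20)]
ratings = simple_ratings(games)
print({team: round(r, 3) for team, r in ratings.items()})
```

In this toy schedule A's raw average margin is 15, but its rating settles near 10 because its opponents are below average. A game prediction would then compare the two ratings plus a homecourt term.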
