Does anybody recommend an already-made web scraper for NBA stats? Right now I am just downloading data manually from Basketball Reference. I am using Python.
I will try some of the scrapers from GitHub.
Webscrapper for NBA stats?
Moderator: Doctor MJ
Webscrapper for NBA stats?
- Ballboy
- Posts: 10
- And1: 0
- Joined: Jul 09, 2018
Re: Webscrapper for NBA stats?
- Junior
- Posts: 391
- And1: 344
- Joined: Jan 21, 2018
Re: Webscrapper for NBA stats?
Short Answer:
As far as I know, there is no web scraper currently available on GitHub. However, I am building one now (it currently parses play-by-play data from any and every game). Once I have it working and tested, I will put it on GitHub and post the link here or in a new thread.
How to Build Your Own:
Here are the steps to build your own web scraper, in case you don't want to wait for me to finish or don't trust my program.
In layman's terms, web scrapers for NBA stats (nba.com or basketball-reference.com) are tricky because much of the data is generated using JavaScript. That means you can't just right-click a web page, click "View Page Source", and see all the data there.
So when you use the requests library to fetch a page (https://www.pythonforbeginners.com/requests/using-requests-in-python) in your Python program, you will often get only the JavaScript code, not the actual data that you're trying to scrape. So that sucks.
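A quick way to see this for yourself is to fetch the page with requests and count how many table rows come back. This is only a rough sketch: `fetch_raw_html` and `count_table_rows` are names I made up for illustration, and a row count of zero is just a hint that the table is filled in later by JavaScript, not proof.

```python
import re


def fetch_raw_html(url, timeout=10):
    """Return the HTML exactly as the server sends it (no JavaScript is run)."""
    # Imported here so count_table_rows still works if requests is not installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def count_table_rows(html):
    """Count opening <tr> tags as a rough measure of how much table data is present."""
    return len(re.findall(r"<tr[\s>]", html, flags=re.IGNORECASE))


# Example use (needs network access, so it is left commented out):
# html = fetch_raw_html("https://www.basketball-reference.com/")
# print(count_table_rows(html))
```

If the count is zero (or far lower than what you see in the browser), the rows are most likely being added after the page loads.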
So how do you get the actual data that you want? You need a program that can run the JavaScript code and capture the newly generated data. There is a library called Selenium that can do that for you by driving a real browser (http://stanford.edu/~mgorkove/cgi-bin/rpython_tutorials/Scraping_a_Webpage_Rendered_by_Javascript_Using_Python.php).
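Here is roughly what the Selenium step can look like. Treat it as a sketch under a few assumptions: Chrome and a matching driver are installed, headless mode is acceptable, and the helper names `make_chrome_driver` and `rendered_html` are my own. Pages that load data slowly may also need an explicit wait before reading the HTML.

```python
def make_chrome_driver():
    """Start a headless Chrome session (assumes Chrome and its driver are installed)."""
    # Imported here so the rest of the script loads even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    return webdriver.Chrome(options=options)


def rendered_html(url, driver):
    """Load the page in the browser and return the HTML after JavaScript has run."""
    driver.get(url)
    return driver.page_source


# Example use (needs a browser and network access, so it is left commented out):
# driver = make_chrome_driver()
# try:
#     html = rendered_html("https://www.nba.com/stats", driver)
# finally:
#     driver.quit()  # always close the browser, even if loading fails
```

Passing the driver in as an argument means one browser session can be reused for many pages instead of starting a new one each time.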
After you get all the HTML, it's just a matter of writing a Python script to scrape it and store the data in a structure that is convenient for you. This is by far the trickiest and most time-consuming part. A lot of debugging is needed, and the HTML you get will be very messy. But this is more or less the process.
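To give an idea of the parsing step, here is a minimal sketch that turns one simple HTML table into a list of dictionaries using only the standard library. `TableParser` and `table_to_records` are hypothetical names, the sample table is made up, and real pages will be messier (multiple tables, nested tags, repeated header rows), so expect to adapt it.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no cells
                self.rows.append(self._row)
            self._row, self._cell = None, None


def table_to_records(html):
    """Treat the first row as the header and return the rest as dicts."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample data, not real stats.
SAMPLE = (
    "<table><tr><th>Player</th><th>PTS</th></tr>"
    "<tr><td><a href='#'>Player A</a></td><td>30</td></tr>"
    "<tr><td>Player B</td><td>12</td></tr></table>"
)
records = table_to_records(SAMPLE)
```

All values come out as strings, so converting numbers (for example `int(record["PTS"])`) is a separate step.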
Re: Webscrapper for NBA stats?
- makeready
- Senior
- Posts: 691
- And1: 1,222
- Joined: Dec 18, 2014
Re: Webscrapper for NBA stats?
this can probably do what you want
http://killersports.com/nba/query
really weird syntax and the documentation is all over the place so there's a bit of a learning curve