Fandango ratings project with budget data scraped from Wiki - feedback please

Another one bites the dust: having done the original Fandango project rather fast (and without much passion), I thought about a way to make it more interesting: MONEY
Surely a movie’s budget is an important factor in its rating! The distributor may also pull some strings, and distribution companies come in different sizes…
I used BeautifulSoup to scrape the budget data from Wikipedia and then performed further analysis.

  • figuring out the scraping function was a challenge - Wikipedia URLs don’t follow one standard pattern, so I had to scrape the search results page first to get the URL of the movie’s page, then scrape the budget from that page (made a notebook on this here)
  • I’ll merge the scraping functions into one function in the future (update: done)
  • I imagine an API may be a faster solution than scraping; I haven’t tried it, but it’s on my list for another project (this one was about scraping data from a website)
  • I’ll probably have to polish the observations a touch more; I always leave that for last
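
For anyone curious, the two-step scrape described above (search results page first, then the movie page) could look roughly like the sketch below. This is not the code from my notebook, just a minimal illustration under some assumptions: that search hits are links inside `div.mw-search-result-heading`, that the movie page has a `table.infobox` with a row labelled "Budget", and that the budget text looks like "$150 million" or "$1,500,000". All function names here are made up for the example.

```python
import re
from urllib.parse import quote_plus, urljoin

from bs4 import BeautifulSoup

WIKI = "https://en.wikipedia.org"


def search_url(title):
    """Build the URL of Wikipedia's search results page for a movie title."""
    return f"{WIKI}/w/index.php?search={quote_plus(title)}&fulltext=1"


def first_result_url(search_html):
    """Step 1: return the absolute URL of the first search hit, or None."""
    soup = BeautifulSoup(search_html, "html.parser")
    # Assumption: each hit's link sits inside div.mw-search-result-heading.
    link = soup.select_one("div.mw-search-result-heading a[href]")
    return urljoin(WIKI, link["href"]) if link else None


def budget_text(movie_html):
    """Step 2: return the raw text of the infobox 'Budget' row, or None."""
    soup = BeautifulSoup(movie_html, "html.parser")
    # Assumption: the movie page carries a table with the 'infobox' class.
    infobox = soup.find("table", class_="infobox")
    if infobox is None:
        return None
    for row in infobox.find_all("tr"):
        header, cell = row.find("th"), row.find("td")
        if header and cell and header.get_text(strip=True) == "Budget":
            return cell.get_text(" ", strip=True)
    return None


def budget_to_dollars(text):
    """Turn text like '$150 million' into a float; ranges use the lower bound."""
    match = re.search(
        r"\$\s*([\d.,]+)(?:\s*[–-]\s*[\d.,]+)?\s*(million|billion)?", text
    )
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get(match.group(2), 1)
    return value * scale
```

Fetching is left out on purpose so the parsing can be tried on saved HTML; in practice each step would be fed the page text, e.g. from `requests.get(url).text`. Movies with no search hit or no budget row simply come back as `None`, which makes them easy to drop or inspect later in pandas.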

Have a look and please feel free to criticise

Project on GitHub

fandango2.ipynb (1.3 MB)

