Web Scraping IMDB

Hi everyone,

Hope you all are doing great.
It’s been a long while since I’ve logged in here. I’ve kind of missed the great interactions here. Now that I’m interested in a data-related project, I’m back to seek advice and help.

I want to get data on movies in a specific regional language. I believe IMDB is the best place to find comprehensive data. What worries me is the legal aspect of scraping data. I’ve checked IMDB’s robots.txt and, as far as I can tell, it does not allow scraping anything.

I am still no expert in data analytics or web scraping, but I can figure out how to get the data that I want.

My questions are:

  • Will there be an issue if I scrape data from IMDB, given that it is probably not legally allowed?
  • Is there any other place that I should look?
  • Does anyone want to join this small project of mine?

Also, I have downloaded the data dump in TSV format and I have no idea how to navigate through it.
Looking forward to your smart ideas. Thank you so much for your time.

Hi @jithins123 and welcome back to the community!

Although I haven’t done the web scraping lessons yet (I’m really looking forward to gaining that skill!) I did a bit of digging on IMDB and found this link where you can download their data directly (and legally) without having to do any scraping.

Is this the “data dump in tsv format” that you mention? I haven’t looked at any of these files but from what I read, they seem pretty straightforward… What have you seen that makes it difficult to navigate?
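
In case it helps, here is a minimal sketch of how one might peek at a tab-separated file with pandas. The sample rows and column names (movie_id, title, year) are made up for illustration and are not taken from the actual IMDB files. Treating "\N" as the missing-value marker is also an assumption you should check against the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded .tsv file; the columns are made up.
sample_tsv = (
    "movie_id\ttitle\tyear\n"
    "m1\tFirst Film\t2001\n"
    "m2\tSecond Film\t\\N\n"
)

# sep="\t" tells pandas the data is tab-separated.
# na_values="\\N" treats the literal text \N as missing (an assumption).
# nrows=5 reads only the first few rows, which is handy for large files.
preview = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N", nrows=5)

print(preview.head())    # first rows, to see what the data looks like
print(preview.dtypes)    # the type pandas inferred for each column
```

For a real file you would pass its path instead of the io.StringIO object. Reading only a handful of rows first is a cheap way to learn the column names before loading everything.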

I know there are others here who have done personal projects with IMDB data that might better advise you: I’m thinking about @michael.hoang17 and @Rucha in particular.

Sorry I’m not of more help.

Thank you @mathmike314 for your reply and for taking the time to have a look at IMDB. I have downloaded the files from the link you shared. The difficulty is not knowing how to go about getting the dataset that I want from these multiple files. I need to brush up on my knowledge. I’ll get back to it and update you in the coming days.

Thank you, yes, please keep me informed!

From what I saw, depending on your data needs, you may need to merge/join some of these files together using a common column so that rows match up. If you search for “how to merge pandas dataframes,” you should find a ton of information on how to do that.
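
To give a rough idea of what that merge looks like, here is a small sketch. The two tables, their column names (movie_id, title, language, rating), and the language codes are all hypothetical. The real files will have their own columns, so treat this as an illustration of the technique only.

```python
import pandas as pd

# Hypothetical tables; in practice each would be loaded from one of the files.
titles = pd.DataFrame(
    {
        "movie_id": ["m1", "m2", "m3"],
        "title": ["First Film", "Second Film", "Third Film"],
        "language": ["xx", "yy", "xx"],  # made-up language codes
    }
)
ratings = pd.DataFrame(
    {
        "movie_id": ["m1", "m3", "m4"],
        "rating": [7.5, 6.1, 8.0],
    }
)

# Join on the shared column. how="inner" keeps only ids present in both tables.
merged = titles.merge(ratings, on="movie_id", how="inner")

# Keep only the rows for the language of interest.
regional = merged[merged["language"] == "xx"]

print(regional)
```

Using how="left" instead would keep every row from the first table and leave the rating empty where there is no match, which may be preferable if you don’t want to lose movies that have no rating.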

Best of luck and let me know how you make out.