Datasource for TV/Movie Scripts

Does anyone know where I could find a data source of TV show and movie scripts that can be searched for certain keywords? I'm thinking about creating a trigger warning dataset, initially focused on child, infant, and prenatal (miscarriage) loss, and potentially expanding to other topics such as suicide and sexual violence.

Or does anyone know of a dataset that already does this?

Any tips on how to proceed? I can already see some stumbling blocks, like false positives in text search, e.g. "miscarriage of justice".
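To make the false-positive problem concrete, here is a minimal sketch of a keyword search that skips matches falling inside known unrelated phrases. The keyword list, the exclusion list, and the function name `find_keyword_hits` are all hypothetical and only for illustration; a real list would need to be much longer and reviewed by hand.

```python
import re

# Hypothetical lists, for illustration only.
KEYWORDS = ["miscarriage", "stillbirth"]
EXCLUDE_PHRASES = ["miscarriage of justice"]


def find_keyword_hits(text, keywords=KEYWORDS, exclude=EXCLUDE_PHRASES, window=40):
    """Return (keyword, context) pairs, skipping hits inside excluded phrases."""
    lowered = text.lower()

    # Character spans covered by any excluded phrase.
    excluded_spans = [
        m.span()
        for phrase in exclude
        for m in re.finditer(re.escape(phrase), lowered)
    ]

    hits = []
    for kw in keywords:
        # Word boundaries avoid matching inside longer words.
        for m in re.finditer(r"\b" + re.escape(kw) + r"\b", lowered):
            inside_excluded = any(
                start <= m.start() and m.end() <= end
                for start, end in excluded_spans
            )
            if inside_excluded:
                continue
            # Keep some surrounding text so a person can review the hit.
            ctx_start = max(0, m.start() - window)
            ctx_end = min(len(text), m.end() + window)
            hits.append((kw, text[ctx_start:ctx_end].strip()))
    return hits


if __name__ == "__main__":
    sample = "It was a miscarriage of justice. Later she tells him about her miscarriage."
    for keyword, context in find_keyword_hits(sample):
        print(keyword, "|", context)
```

Returning the surrounding context with each hit keeps manual review possible, since an exclusion list will never catch every figurative use.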

Hi @Giles.Day

I think that’s an awesome idea, and I wish you good luck with your project.
I found a dataset on Kaggle with over 2000 movie scripts; they are in the scripts folder.

Also, these two articles list different websites where you can download scripts manually.

As for how to proceed, I think you are going to need an extensive training set of phrases or passages that contain the topics you choose. Then you can use something like VADER to get a sentiment score for each phrase in your training set. For the testing set, with the approach I have in mind, you would probably need to break each script into phrases, run the sentiment analysis on each one, and maybe average the phrase scores across the script.
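The "break the script into phrases and average the scores" step could look roughly like the sketch below. The scoring function is passed in as a parameter so the example runs without extra packages; `toy_score` and its word list are made-up stand-ins, not VADER, and the sentence splitter is deliberately naive.

```python
import re
from statistics import mean


def split_into_sentences(script_text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script_text.strip())
    return [p for p in parts if p]


def average_score(script_text, score_fn):
    """Score every sentence with score_fn and return the mean (0.0 if empty)."""
    sentences = split_into_sentences(script_text)
    if not sentences:
        return 0.0
    return mean(score_fn(s) for s in sentences)


# Hypothetical stand-in scorer so the sketch is self-contained.
NEGATIVE_WORDS = {"lost", "grief", "died"}


def toy_score(sentence):
    """Return -1.0 if the sentence contains a listed word, else 0.0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return -1.0 if any(w in NEGATIVE_WORDS for w in words) else 0.0


if __name__ == "__main__":
    script = "We lost the baby. I am so sorry. Let's go home."
    print(average_score(script, toy_score))
```

To try this with VADER itself, assuming the `vaderSentiment` package is installed, you could create one `SentimentIntensityAnalyzer()` and pass `lambda s: analyzer.polarity_scores(s)["compound"]` as `score_fn`. The compound score ranges from -1 to 1.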

The only problem with my approach is that it’s going to consume a lot of computational power.

Anyway, I hope this helps, and good luck!

Thanks, it sounds a bit more advanced than where I’m at right now, but stretch goals are a good thing.