Hello DQ community,
I’m following along with the Investigating Fandango Movie Ratings project and am at the point where you realize that the datasets at our disposal are not representative because of how they were sampled. At this stage you are supposed to either collect your own sample through web scraping or confirm that the 2016-17 sample and the older sample at least follow the same collection methodology. Specifically, we are supposed to confirm that the 2016-17 data is a sample of “popular” movies, meaning movies with more than 30 user reviews. To do this, DQ tells us to take a random sample of the 2016-17 data and check Fandango’s site to see whether those movies meet the threshold.
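For anyone at the same step, the random-sample check could be sketched in pandas roughly like this. It is only a sketch: the function names are hypothetical, and the review counts still have to be looked up by hand on whatever site you use, so they are passed in as a plain Series.

```python
import pandas as pd


def sample_for_manual_check(ratings: pd.DataFrame, n: int = 10, seed: int = 1) -> pd.DataFrame:
    """Draw a reproducible simple random sample of movies to look up by hand.

    Hypothetical helper: `ratings` is whatever DataFrame holds the 2016-17 movies.
    """
    # random_state makes the sample repeatable, so others can check the same movies
    return ratings.sample(n=n, random_state=seed)


def share_above_threshold(review_counts: pd.Series, threshold: int = 30) -> float:
    """Fraction of the sampled movies with more than `threshold` user reviews.

    The counts are assumed to be collected manually for the sampled movies.
    """
    return float((review_counts > threshold).mean())
```

For example, assuming the file is named `movie_ratings_16_17.csv` (check your own filename), `sample_for_manual_check(pd.read_csv("movie_ratings_16_17.csv"))` returns ten rows to look up. If every sampled movie clears the threshold, that is some evidence the dataset only includes popular movies.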
The problem is that Fandango no longer publishes its own movie ratings and now shows only Rotten Tomatoes scores. That matters because the 2015 dataset was sampled using Fandango review (vote) counts, a metric that is no longer available. I could use Rotten Tomatoes vote counts instead, but wouldn’t that put me in the same predicament I’m in now, since it’s a different metric from the one the 2015 sample was based on?
I know that part of this project is to teach us what to do when we run into predicaments like this, but it seems like access to the data has changed too much for this approach to work. I’d love to know your thoughts.