I seem to be stuck on the following bullet point
I want to understand how to check whether the sample contains enough popular movies. I sneaked a peek at the solution. It wasn’t clear to me how they obtained the data? Any help will be greatly appreciated. Thank you.
Welcome to the DQ community!
What this task is trying to get at is to check if there are enough popular movies to make it worth while. If we look at the value counts for
fandango_previous['Year'].value_counts() we see the there are 129 cases for 2015 but only 17 cases for 2014.
We see a similar patter for
fandango_after['year'].value_counts() as well. Where the 2016 has 191 cases whereas 2017 only has 23 cases.
Because there are so few popular movies in our sample of 2014 and 2017 any analysis we try to obtain from it is probably not going to be representative of the movie ratings. For this reason, it’s probably just best to stick to the years 2015 and 2016.
I think the next course Statistics Intermediate: Averages and Variability will help clarify this issue for you as well. The course goes over some specifics of why it’s important to have a large enough sample and some of the risks of small sample sizes.
I hope this helps!
Thank you! It does help clarify the issue. However, I’m a bit confused about how “popular movies” are defined (more than 30 fan ratings). The solution provides the table below to justify that one of our data set (fandango_after) indeed has “popular movies”
I’m curious about the
Fan ratings variable in the table as it is not present in the data that has been provided. I couldn’t find it with the fandango API (maybe I missed something?)
Hickey does talk about it (briefly) in his article on why he chose 30 as a sample size.
Here is another article that talks about sample size and what is the minimum recommended size.
Does this answer your question?
Could you please explain from where you got the ‘Fan ratings’ for these popular movies? These ratings are not available in the original database.
Thanks in advance.
It’s documented here.
Fan ratings is the number of fans that rated the movie.
I am sorry. Still it is not clear. For example, for the movie ‘Warcraft’ the ‘Fan ratings’ is 7271. From where is this value taken? This was my question. I don’t find the source for this information. Did I miss something?
Thanks in advance.
To solve part of this guided project you need to go to Fandango’s website and check movie ratings from one of the datasets in order to decide whether a movie is popular or not. Problem is that Fandango doesn’t have their own rating system anymore, they use Rotten Tomatoes, that makes it impossible to solve the guided project without using the solutions file.
@probot Thank you for that piece of information.