Exploring Data Visualization-part5-comparing histograms

Screen Link:
https://app.dataquest.io/m/145/histograms-and-box-plots/6/quartiles

From the histograms, we can make the following observations:

  • Around 50% of user ratings from Fandango fall in the 2 to 4 score range
  • Around 50% of user ratings from Rotten Tomatoes fall in the 2 to 4 score range
  • Around 75% of the user ratings from Metacritic fall in the 2 to 4 score range
  • Around 90% of the user ratings from IMDB fall in the 2 to 4 score range

Can someone explain how these percentages are calculated from looking at the histograms?
I am getting around 90% of user ratings from Rotten Tomatoes in the 2 to 4 range, instead of 50%.

Thanks in advance


Hello! I looked through the lecture and I agree: it seems to be just a rough estimate made by looking at the charts. For the first bullet point, I went ahead and added the median to the chart. With the median added, we can confirm the bullet point’s claim: the minimum is between 2 and 3 and the median is close to 4.

Screenshot_54

Saw your revised comments. So, it looks like this was the approach that they took: “We can visually examine the proportional area that the bars in the 2.0 to 4.0 range take up and determine that more than 50% of the movies on Fandango fall in this range.” So, the focus was only on the values that fall in that range (2 to 4). They visually looked at the values falling within that range and determined that it represented 50% of the data values.

Screenshot_56
Screenshot_57
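
The visual estimate quoted above amounts to adding up the heights of the bars between 2.0 and 4.0 and dividing by the total number of ratings. Here is a minimal sketch of that idea. The ratings are made up for illustration (they are not the Fandango data), the helper name `share_from_histogram` is hypothetical, and the 0.5-wide bins over 0 to 5 are an assumption about how the chart was drawn.

```python
import numpy as np

def share_from_histogram(values, low, high, bins=10, value_range=(0, 5)):
    """Fraction of values in histogram bars lying entirely inside [low, high]."""
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    left, right = edges[:-1], edges[1:]
    # Select the bars whose left and right edges both sit inside the range.
    inside = (left >= low) & (right <= high)
    return counts[inside].sum() / counts.sum()

# Hypothetical ratings, for illustration only.
toy_ratings = [1.5, 2.5, 3.0, 3.5, 3.5, 4.0, 4.5, 4.5, 5.0, 5.0]
share = share_from_histogram(toy_ratings, 2.0, 4.0)
print(f"{share:.0%} of the toy ratings sit in bars between 2.0 and 4.0")
```

Note that `np.histogram` treats every bin except the last as half-open, so a rating of exactly 4.0 is counted in the 4.0 to 4.5 bar, not in the 3.5 to 4.0 bar. A percentage read off the bars can therefore differ from one computed on the raw values with both ends included.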

Hello there,
Thank you for your reply. I understood your explanation.
I was trying to find the percentage using the frequency count table.
The link to the page is below for reference.
https://app.dataquest.io/m/145/histograms-and-box-plots/3/binning

I was trying to find the count using that frequency distribution table,
where 3.5 - 4.0 = 58, which brings the total count within the 2 to 4 range to 136.
Excluding that bin, the total count is 78, the same as the above output.
Also, when I did this:

norm_reviews['RT_user_norm'].value_counts().sort_index()
the total count was 83, which brings the percentage to around 50%.

Is this way of finding out the percentage correct?
Please ignore this if it is irrelevant.
Thank you in advance.

Hey, sorry about the lag. It’d be better if you could provide a screenshot of your code so I can best help you with that. I think you may have forgotten to include any values less than two (see screenshot for reference). In any case, the way you used value counts is correct; I’m not sure about sort index. See my screenshot for the approach I took to calculating that. Let me know if that helps.
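
For anyone reading without the screenshot, one way to do this calculation with `value_counts` might look like the sketch below. It is not necessarily the approach shown in the screenshot. The `norm_reviews` DataFrame here holds made-up ratings for illustration only, and treating both 2.0 and 4.0 as inside the range is an assumption.

```python
import pandas as pd

# Made-up ratings for illustration; the real norm_reviews comes from the course dataset.
norm_reviews = pd.DataFrame(
    {"RT_user_norm": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0]}
)
ratings = norm_reviews["RT_user_norm"]

# Frequency table, with the ratings sorted in ascending order.
freq = ratings.value_counts().sort_index()

# Label-based slicing on a sorted index includes both endpoints (assumed range: 2.0 to 4.0).
in_range = freq.loc[2.0:4.0].sum()

# Divide by ALL ratings, including those below 2 and above 4.
pct = in_range / len(ratings) * 100
print(f"Total count of values between 2 and 4: {in_range}")
print(f"Share of all ratings: {pct:.1f}%")

# Cross-check directly on the raw values; between() is inclusive on both ends by default.
pct_check = ratings.between(2.0, 4.0).mean() * 100
```

The denominator is the total number of ratings, not only the ones inside the range, which is why leaving out the values below 2 changes the percentage.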

Hello,
I got the same value too, i.e.,
Total count of values between 2 and 4: 83,
which is around 56 percent of the ratings.

I used sort_index to sort the ratings in ascending order.

Where I went wrong before was that I tried to calculate the frequency count for RT_user_norm using the frequency table shown in Histograms and Box Plots, part 3: Binning.

That is why the calculation went wrong.

I appreciate your help.
Thank you so much.

Glad I could help :slight_smile: