Page 7 of Dictionaries and Frequency Tables


To understand exactly what I am doing rather than mindlessly inputting the instructions, I’m trying to figure out what is happening conceptually. In other words, what does the code that I’m writing actually mean?

Here are the instructions:

Count the number of times each unique content rating occurs in the data set.

  1. Create a dictionary named content_ratings where the keys are the unique content ratings and the values are all 0 (the values of 0 are temporary at this point, and they’ll be updated).
  2. Loop through the apps_data list of lists. Make sure you don’t include the header row. For each iteration of the loop:
  • Assign the content rating value to a variable named c_rating. The content rating is at index number 10 in each row.
  • Check whether c_rating exists as a key in content_ratings. If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in c_rating).
  3. Outside the loop, print content_ratings to check whether the counting worked as expected.

Here is the answer (I unfortunately deleted the incorrect code that I originally wrote before checking the answer so I can’t provide it this time):

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for row in apps_data[1:]:
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

For the part of the code that is in bold (content_ratings[c_rating] += 1), what does that mean? I’ve been poring over that code and its corresponding instructions and it just isn’t clicking. Thanks for the help.

Break it down:

  • What is content_ratings?
  • What is c_rating?
  • Based on the above two, what does content_ratings[c_rating] result in?
  • What is the += operation?

Think about those questions, then go back to the instructions and try to relate the two. Then let me know what is still confusing you.

So if I understand correctly, c_rating is simply a variable that holds the content rating of the current row, while content_ratings is the dictionary created earlier. I would read “content_ratings[c_rating] += 1” as meaning “if a content rating such as ‘4+’ can be found in the dictionary named content_ratings, then increase the dictionary value for that rating by 1.”

What I don’t understand is how increasing the dictionary value by 1 results in the output being equal to the number of apps for each respective content rating. For example, the output shows {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}. So how did we get 4433, for example, if we just increased the value by 1? Thanks.

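Because the update happens inside your for loop. The loop body runs once for every row, so each time a row’s c_rating is a key in the dictionary content_ratings, the value stored under that key goes up by 1. The value for '4+' ends at 4433 because 4433 rows have that rating, so it was incremented 4433 times, starting from 0.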
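To watch the totals build up, here is a minimal sketch on a made-up miniature data set. The name sample_data, its rows, and the extra print inside the loop are assumptions for illustration only; the rating sits at index 1 here, whereas in the real apps_data it is at index 10.

```python
# Hypothetical miniature data set: a header row plus five app rows.
# The content rating is at index 1 here (index 10 in the real apps_data).
sample_data = [
    ['name', 'cont_rating'],
    ['App A', '4+'],
    ['App B', '9+'],
    ['App C', '4+'],
    ['App D', '17+'],
    ['App E', '4+'],
]

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for row in sample_data[1:]:  # skip the header row
    c_rating = row[1]
    if c_rating in content_ratings:
        # Shorthand for:
        # content_ratings[c_rating] = content_ratings[c_rating] + 1
        content_ratings[c_rating] += 1
    # Show the running totals after each row
    print(c_rating, content_ratings)

print(content_ratings)  # {'4+': 3, '9+': 1, '12+': 0, '17+': 1}
```

Each pass through the loop adds only 1, but '4+' appears in three rows, so its value is incremented three times and ends at 3.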

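If you want missing keys to get a default value automatically, use defaultdict from collections. When you access a key that is not in the dictionary, defaultdict creates it with a default value instead of raising a KeyError, so you don’t have to list the keys in advance.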

from collections import defaultdict 

# default value of 0 when no key is found in dictionary
content_ratings = defaultdict(int)

for row in apps_data[1:]:
    content_ratings[row[10]] += 1
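As a quick check of that behaviour, here is a small sketch. The list sample_ratings and the name counts are made up for illustration and stand in for the row[10] values of the real data.

```python
from collections import defaultdict

# Hypothetical ratings standing in for row[10] of each app row
sample_ratings = ['4+', '9+', '4+', '17+', '4+']

counts = defaultdict(int)  # int() returns 0, so a missing key starts at 0
for rating in sample_ratings:
    counts[rating] += 1

print(dict(counts))  # {'4+': 3, '9+': 1, '17+': 1}
```

One difference from the predefined dictionary: a rating that never occurs (here '12+') does not show up as a key at all, instead of appearing with a count of 0.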

If you need a non-zero default value:

# default value of -1 when no key is found in dictionary
content_ratings = defaultdict(lambda: -1)
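A short sketch of how that non-zero default behaves. The name scores and the keys used are made up for illustration.

```python
from collections import defaultdict

# Hypothetical dictionary whose missing keys default to -1
scores = defaultdict(lambda: -1)

print(scores['unknown'])  # -1, and 'unknown' is now stored as a key
scores['4+'] += 1         # starts from -1, so the value becomes 0
print(scores['4+'])       # 0
```

Note that for counting you would still want a default of 0, since starting from -1 leaves every count one too low.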