
Why do I need to count it as 1 if the iteration variable doesn't exist as a key in the dictionary?

Screen Link: Learn data science with Python and R projects

My Code:
from csv import reader

opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
apps_data = list(read_file)

content_ratings = {}
for row in apps_data[1:]:
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1

print(content_ratings)

From what I understand, if "17+" doesn’t exist in the c_rating column, it shouldn’t be counted as 1…

That’s explained in the content -

You might wonder why we initialized (created) each dictionary key with a dictionary value of 1 instead of 0. When we encounter a content rating, we need to count it, no matter if it already exists or not as a dictionary key. When a rating that is not yet in the dictionary comes in, we need to both initialize it and count it. We need to initialize it with a value of 1 to mark the fact that this rating has already occurred once. If we initialized the dictionary key with a value of 0, we’d succeed in doing the initializing part, but fail to do the counting part.

Is any part of that confusing you?

Yes, I don’t understand why it needs to initialize it with a value of 1.

I think if it doesn’t exist in the dictionary, it shouldn’t be counted as 1.

If it does not exist yet, you should create a new key and initialize it with 1, because at this iteration you’ve encountered it.

If it doesn’t exist in the dictionary, then we create a key for it. But what about the value corresponding to that key?

Take a simple example. Let’s say

content_ratings = {"10+":2}
c_rating = "12+"

We do our check -

if c_rating in content_ratings:

That’s False. "12+" is not in content_ratings. So, we create a key for it. Suppose we gave it a value of 0 -

content_ratings = {"10+":2, "12+":0}

BUT, "12+" is the current c_rating. We have just encountered it, which means the count for "12+" is already 1. It can’t be 0 because it appears as a c_rating in your data.

So, the else branch creates the key and sets its value to 1. That’s what they mean by the statement -

We need to initialize it with a value of 1 to mark the fact that this rating has already occurred once. If we initialized the dictionary key with a value of 0, we’d succeed in doing the initializing part, but fail to do the counting part.

You are not JUST initializing here. You are also counting the occurrence of the rating you have just encountered.
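To see the difference concretely, here is a minimal sketch that counts the same small list of ratings twice, once initializing new keys with 1 and once with 0. The list `ratings` and the helper `count_ratings` are made up for illustration; they are not from the AppleStore data or the course content.

```python
def count_ratings(ratings, initial_value):
    """Count ratings, giving each newly created key the supplied starting value."""
    counts = {}
    for c_rating in ratings:
        if c_rating in counts:
            counts[c_rating] += 1
        else:
            # First time we see this rating: create the key.
            counts[c_rating] = initial_value
    return counts

# Hypothetical sample data, not taken from AppleStore.csv.
ratings = ["10+", "12+", "10+", "4+", "10+"]

correct = count_ratings(ratings, 1)
undercounted = count_ratings(ratings, 0)

print(correct)       # {'10+': 3, '12+': 1, '4+': 1}
print(undercounted)  # {'10+': 2, '12+': 0, '4+': 0}
```

With 0 as the starting value, every count comes out one too low, and a rating that appears exactly once is reported as appearing zero times.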

In an even simpler example, let’s say that you have the following word -

word = "hello"

and you are trying to create a dictionary that stores the number of times each unique letter appears in that word.

From looking at it, you can say that

letters = {"h":1, "e":1, "l":2, "o":1 }

You loop through each letter in word and build letters. On the first pass you get "h" and store that in letters. "h" is set to 1 because you have just found it in word. It won’t be 0 because we are counting the number of times "h" appears in word.

Just like how "12+" is set to 1 and not 0 because of how many times it appears in our data.
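If it helps, the letter example can be written with the same if/else pattern as the content ratings loop. This is just a sketch of the idea; the variable names `word` and `letters` follow the example above.

```python
word = "hello"

letters = {}
for letter in word:
    if letter in letters:
        # Seen before: add one more occurrence.
        letters[letter] += 1
    else:
        # First occurrence: create the key and count it at the same time.
        letters[letter] = 1

print(letters)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```

The second "l" goes through the if branch, which is why it ends up as 2, while every other letter only ever goes through the else branch and stays at 1.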