Python-related confusion

Screen Link:

My Code:

opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

content_ratings = {'4+':0, '9+':0, '12+':0, '17+':0}

#if c_rating in content_ratings:
for c_rating in apps_data[1:][10]:
#    c_rating = row[10]
    if c_rating in content_ratings:
#        print(c_rating, 'exists as a key')
        content_ratings[c_rating] += 1

What I expected to happen:

{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

What actually happened:

{'4+': 0, '9+': 1, '12+': 0, '17+': 0}

You may want to see this.


The link @monorienaghogho so graciously provided discusses a situation where the keys are not initialized properly and therefore cannot be incremented by 1 inside the conditional if c_rating in content_ratings. It is a wonderful post with some great insights, but since you have already initialized your dictionary with every possible key set to 0, I do not believe it will solve your particular problem.

Since there was no Screen Link to the mission you’re working on, I tested your code using some dummy data and it worked great! Here is what I used:

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
data = ['4+', '17+', '12+', '4+', '12+', '9+', '9+', '12+', '4+', '4+', '12+', '17+', '17+', '9+', '4+', '9+']

# Count each rating that matches one of the pre-initialized keys
for c_rating in data:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)

Output: {'4+': 5, '9+': 4, '12+': 4, '17+': 3}
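For comparison, the standard library's collections.Counter can produce the same tally in a single call. This is a swapped-in alternative, not what the original code does, and it behaves a little differently: it counts every distinct value it sees, so an unexpected rating would appear as an extra key instead of being skipped.

```python
from collections import Counter

data = ['4+', '17+', '12+', '4+', '12+', '9+', '9+', '12+', '4+', '4+', '12+', '17+', '17+', '9+', '4+', '9+']

# Counter tallies how many times each distinct value occurs in the iterable
rating_counts = Counter(data)
print(dict(rating_counts))
```

The counts match the dictionary-based loop; only the key order may differ, since Counter adds keys in the order it first encounters them.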

Then I looked at your output and it seemed a little odd:

It managed to find one instance of '9+' but none of the others?! OK, that’s just weird! However, it does give us a hint about what’s going wrong. It’s almost as if your for loop is only testing one piece of data…and wouldn’t ya know it…it is only testing one piece of data! Let’s see why by inspecting this line:

for c_rating in apps_data[1:][10]:

What does this line do? It creates a variable c_rating that loops over the iterable apps_data[1:][10]. So what exactly is apps_data[1:][10]? Since apps_data is a list of lists, slicing it with [1:] still gives a list of lists (just without the header row). Tacking on [10] then selects the single list at index 10 of that slice (the same list as apps_data[11]), which represents just one app: Subway Surfers. (I found the link to the mission by following @monorienaghogho’s link and using the tags there to recreate the URL.)
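A tiny made-up list of lists reproduces the same symptom. The rows below are hypothetical (not the real AppleStore.csv data), and to keep them short the rating sits at index 1 instead of index 10, but the [1:][10] indexing behaves exactly as described above:

```python
# Hypothetical miniature dataset: a header row plus 12 app rows of [name, rating]
ratings = ['4+', '9+', '12+', '17+']
apps_data = [['name', 'rating']] + [['app' + str(i), ratings[i % 4]] for i in range(12)]

# apps_data[1:] drops the header; the trailing [10] then selects ONE ROW (one app)
one_row = apps_data[1:][10]
print(one_row)  # ['app10', '12+']

# Looping over that row tests each field of a single app, so at most one rating is found
buggy_counts = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}
for c_rating in one_row:
    if c_rating in buggy_counts:
        buggy_counts[c_rating] += 1
print(buggy_counts)  # {'4+': 0, '9+': 0, '12+': 1, '17+': 0}
```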

So now we know that your c_rating variable is looping over this list:

['512939461', 'Subway Surfers', '156038144', 'USD', '0.0', '706110', '97', '4.5', '4.0', '1.72.1', '9+', 'Games', '38', '5', '1', '1']

which explains why your '9+' key was the only one found, and only once!

Does this make sense? Do you see how you can modify your for loop so that it loops over the correct app data? Let me know if you need more help and we can try something else.


Thank you very much for your super clear explanation @mathmike314. Actually, I was wondering how to loop over the column at index 10 of each row. I know there is an easier way, like this:

for row in apps_data[1:]:
    c_rating = row[10]

Do you think we can get the index-10 column of each row in a single line (instead of defining one extra variable, row, and then assigning its element at index 10 to c_rating)?

Thank you very much for your response.
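Dropping that row-based loop back into the original counting code gives something like the sketch below. To stay self-contained it reads a small in-memory CSV through io.StringIO instead of AppleStore.csv; the filler columns and sample ratings are made up, and only the rating's position at index 10 mirrors the real file. With the real data you would pass the opened file to reader exactly as in the original code.

```python
import io
from csv import reader

# Hypothetical stand-in for AppleStore.csv: 10 filler columns, then the rating at index 10
header = ','.join(['col' + str(i) for i in range(10)] + ['cont_rating'])
rows = [','.join(['x'] * 10 + [rating]) for rating in ['4+', '9+', '4+', '17+', '12+', '4+']]
fake_csv = io.StringIO('\n'.join([header] + rows))

apps_data = list(reader(fake_csv))

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# Skip the header row, then read the rating (index 10) from every app row
for row in apps_data[1:]:
    c_rating = row[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)  # {'4+': 3, '9+': 1, '12+': 1, '17+': 1}
```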


Unfortunately, since the data structure we are using is a list of lists, there is no way to cleanly select a “column” without using a loop. We could use a list comprehension (I’m a fan of these) but that’s still using a loop that iterates over the entire list. That would look something like this:

col_10 = [row[10] for row in apps_data[1:]]
for c_rating in col_10:  # you could combine these two lines if you wanted
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

Or we could use this nifty little “hack” that transposes a list of lists (i.e., swaps rows for columns) and then just select a row to iterate over. It assumes every row has the same number of fields, because zip stops at the shortest row:

cols_to_rows = list(zip(*apps_data[1:]))
col_10 = cols_to_rows[10]
for c_rating in col_10:  # you could combine these three lines, but it's not as readable
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

But behind the scenes, both of these loop over the entire dataset, so neither really saves on computation.
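If the main goal is just to avoid the extra row variable, the standard library's operator.itemgetter combined with map is another option. It is only a different spelling of the same loop (map still visits every row), so it does not save any work either. The rows below are hypothetical, with the rating at index 2; for the real data you would use itemgetter(10).

```python
from operator import itemgetter

# Hypothetical rows of [name, size, rating]; the rating is at index 2 here
apps_data = [
    ['name', 'size', 'rating'],
    ['App A', '100', '4+'],
    ['App B', '200', '17+'],
    ['App C', '300', '4+'],
]

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# itemgetter(2) builds a callable that returns row[2]; map applies it to each row lazily
for c_rating in map(itemgetter(2), apps_data[1:]):
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)  # {'4+': 2, '9+': 0, '12+': 0, '17+': 1}
```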

Now, if the data were stored in a pandas DataFrame, this would be a lot easier and faster! Even a NumPy array would give us more options…but a list of lists? Sorry, we have to loop over it at some point to extract an entire column.
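Without pandas, one small step toward selecting a column by name is the standard library's csv.DictReader, which turns each row into a dictionary keyed by the header. It still loops over every row, but the column is picked by name rather than by index 10. The in-memory CSV below is made up, and the cont_rating header name is an assumption about the real file; check the first row of your CSV before relying on it.

```python
import io
from csv import DictReader

# Hypothetical stand-in for AppleStore.csv; 'cont_rating' is an assumed header name
fake_csv = io.StringIO(
    'track_name,cont_rating\n'
    'App A,4+\n'
    'App B,12+\n'
    'App C,4+\n'
)

content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# DictReader consumes the header line itself, so no [1:] slice is needed
for row in DictReader(fake_csv):
    c_rating = row['cont_rating']
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)  # {'4+': 2, '9+': 0, '12+': 1, '17+': 0}
```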
