Detailed explanation of the solution

Screen Link: https://app.dataquest.io/m/315/functions%3A-fundamentals/7/creating-frequency-tables
Hey, I’ve been progressing through the course fundamentals and I really can’t seem to understand the solution to this question. It would be amazing if someone could briefly explain the solution code step by step. Thanks

Hi @sheikhhabib29
I’m in the same boat as you, unfortunately. :frowning:

Dear Dataquest team,
Could you please tell me where my code is going wrong? I’ve gotten an error message saying that genres_ft is shorter than expected, so I’m sure there is a mistake somewhere in my code, but I can’t find it. Please help!

# CODE FROM THE PREVIOUS SCREEN
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

def extract(userrating):
    emptycolumn = []
    for row in apps_data[1:]:
        rating = row[7]
        emptycolumn.append(rating)
    return emptycolumn

genres = extract(11)

def freq_table(emptycolumn):
    frequency_table = {}
    for rating in emptycolumn:
        if rating in frequency_table:
            frequency_table[rating] += 1
        else:
            frequency_table[rating] = 1
    return frequency_table

genres_ft = freq_table(genres)

Hi @joanna.george. The idea behind creating functions is that we can reuse them, which saves us from repeating the same code over and over. To make that possible, we turn the parts we want to change into input variables and keep the rest of the code the same.

Let’s take a look at your extract() function.

def extract(userrating):  
    emptycolumn = []     
    for row in apps_data[1:]:  
        rating = row[7]                                    
        emptycolumn.append(rating)  
    return emptycolumn

This should be the same as what you wrote to pass the previous screen. The function is supposed to take in an index (which represents a column in the dataset) and build a separate list containing only the value at that index from each row. This essentially isolates one whole column. In your function, you’ve called the input userrating, but you aren’t using it anywhere in the function body. Instead, you have rating = row[7] and append those values to the list. As a result, regardless of the input, the function always returns the column at index 7 (the user_rating column).

So when we call the function with genres = extract(11), the input of 11 (the index of the prime_genre column) isn’t used at all. Instead, we get the user_rating column. Then, when the freq_table() function runs on this column, the resulting table is much shorter because user_rating has fewer unique values than prime_genre.

Once you fix the error in the extract() function, the code should pass.
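To make the difference concrete, here is a minimal sketch that uses a tiny made-up dataset in place of AppleStore.csv. The rows, the two-column layout, and the names extract_hardcoded and extract_fixed are assumptions for illustration only, not the course's code.

```python
# Tiny stand-in for apps_data: a header row plus three made-up rows.
# Column 0 is a rating and column 1 is a genre (not the real file layout).
apps_data = [
    ['user_rating', 'prime_genre'],
    ['4.5', 'Games'],
    ['4.5', 'Music'],
    ['4.5', 'Weather'],
]

def extract_hardcoded(index):
    column = []
    for row in apps_data[1:]:
        value = row[0]        # index is ignored: always reads column 0
        column.append(value)
    return column

def extract_fixed(index):
    column = []
    for row in apps_data[1:]:
        value = row[index]    # reads the column the caller asked for
        column.append(value)
    return column

print(extract_hardcoded(1))  # ['4.5', '4.5', '4.5']
print(extract_fixed(1))      # ['Games', 'Music', 'Weather']
```

With the hard-coded version, every call returns the same column no matter which index is passed in, which is why the frequency table built from it has fewer keys than expected.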

I hope that helps.

Thank you very much for your explanation, April. I really appreciate it!

Hey,
I’m getting the same issue. This is my code; if anyone has any insight, I would be very grateful. Thank you!

opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

def extract(index):
column = []
for row in apps_data[1:]:
value = row[index]
column.append(value)
return column

genres = extract(11)
print(genres)

def freq_table(column):
frequency_table = {}
for value in column:
if value in frequency_table:
frequency_table[value] += 1
else:
frequency_table[value] = 1
return frequency_table
genres_ft = freq_table(genres)

hi @mooneyemily8

Could you please elaborate on the error you are getting? I tried your code and it works fine.

Also, could you please use the Preformatted text option when posting code? It preserves the indentation, which makes it easier to spot any syntax errors in your code. Thanks.

The resulting code would look something like this:

# some code above
read_file = reader(opened_file)
apps_data = list(read_file)

def extract(index):
    column = []
    for row in apps_data[1:]:
        value = row[index]
        # some code here
    return column

# CODE FROM THE PREVIOUS SCREEN
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

def extract(index):
    column = []
    for row in apps_data[1:]:
        value = row[index]
        column.append(value) 
    return column 

genres = extract(11)

def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
        return frequency_table
genres_ft = freq_table(genres)

The error I’m getting is that genres_ft is shorter than expected. Thank you

hey @mooneyemily8

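It’s a classic indentation problem. The preformatted text trick worked!

The return statement in freq_table() is indented one level too far, so it sits inside the for loop and the function returns after counting only the first value. Move it back one level so that it lines up with the for statement.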
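Here is a small sketch of why the position of return matters. The list of genres is made up, and freq_table_early_return is a hypothetical name used only to label the mis-indented variant.

```python
def freq_table_early_return(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
        return frequency_table   # inside the loop: exits after the first value

def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table       # after the loop: runs once every value is counted

genres = ['Games', 'Music', 'Games', 'Weather']  # made-up sample data
print(freq_table_early_return(genres))  # {'Games': 1}
print(freq_table(genres))               # {'Games': 2, 'Music': 1, 'Weather': 1}
```

The first version never raises an error, it just hands back a table with a single entry, which matches the "shorter than expected" message.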

Oh my gosh, Thank you so much!

Hi! Do you know why the last line of code is genres_ft = freq_table(genres)? The question asked us to “generate the frequency table for the prime_genre column”, so shouldn’t it be genres_ft = freq_table(prime_genre)?

Hi @hongchi0502, welcome to the community!

Earlier in the code, we used the extract() function to get all of the elements in the prime_genre column (index 11) and saved them to the variable genres. The freq_table() function expects the column data as input, which in this case comes from genres. That’s why we used it instead of prime_genre. If you use prime_genre, the function won’t know what to do with it: prime_genre is only a header name in the dataset, not a variable holding the column’s data, and the function has no way to look up a column by its header.
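As a rough sketch of this point: freq_table() only ever sees a list of values, so a column name becomes useful only once it has been translated into an index. The header lookup below is an illustrative extra and not part of the course solution, and the tiny dataset and its two-column layout are made up.

```python
apps_data = [
    ['track_name', 'prime_genre'],   # header row (shortened, made-up layout)
    ['App A', 'Games'],
    ['App B', 'Music'],
    ['App C', 'Games'],
]

def extract(index):
    column = []
    for row in apps_data[1:]:
        column.append(row[index])
    return column

def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# freq_table(prime_genre) would raise a NameError here,
# because no variable with that name has been defined.
genre_index = apps_data[0].index('prime_genre')  # header name -> column index
genres = extract(genre_index)
genres_ft = freq_table(genres)
print(genres_ft)  # {'Games': 2, 'Music': 1}
```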

I hope that helps!

@april.g Thank you very much, I had previously been stuck on this section for more than 30 minutes!