Def. Functions Extract & Frequency Table Question

Screen Link: https://app.dataquest.io/m/315/functions%3A-fundamentals/7/creating-frequency-tables

Hi everyone,

I’m not confident enough to move forward yet, so I tried writing out the solution by hand and mapping out each section of the code to make sure I understand it.

For some reason, writing it down on a piece of paper made it slightly simpler for me to understand.

Lines 1-7 are from the extract function.
Lines 8-16 are from the frequency table function.

To the right are just notes I made so I can look at it, read it, and then understand it.
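The handwritten page isn't reproduced in this thread. Going by the names used in the replies (extract, column, freq_table, genres = extract(11)), the two functions are probably close to the sketch below. The apps_data list here is a hypothetical stand-in: in the lesson it presumably holds the App Store rows with the genre at index 11, whereas here the other eleven columns are just empty filler and there are only three made-up apps.

```python
# Hypothetical stand-in for the lesson's apps_data (assumption: the real one is
# read from a CSV). Row 0 is the header; genre sits at index 11, so the first
# 11 columns are just filler here.
header = ['col_' + str(i) for i in range(11)] + ['prime_genre']
apps_data = [header] + [[''] * 11 + [g] for g in ['Games', 'Music', 'Games']]

def extract(index):
    column = []                  # empty list that will collect one cell per row
    for row in apps_data[1:]:    # [1:] skips the header row
        value = row[index]       # pull the cell at position `index` out of this row
        column.append(value)
    return column                # hand the finished list back to the caller

genres = extract(11)

def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1   # seen this genre before: bump its count
        else:
            frequency_table[value] = 1    # first time: start its count at 1
    return frequency_table

genres_ft = freq_table(genres)
print(genres_ft)   # {'Games': 2, 'Music': 1}
```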

Can somebody explain the following points to me so that I can gain a better understanding?

  1. Correct me if I’m wrong, but every highlighted name can be anything. E.g., index could be ABC, column could be XYZ, etc., but generally you want a name that describes the situation at hand.

  2. For lines 3-4, I understand you’re doing a FOR LOOP to extract the data from apps_data, and the [1:] makes it start from the first row of actual data, skipping the header. But can somebody explain what is occurring in line 4? For value = row[index], whenever I see that I get intimidated, despite kind of understanding that you’re setting a VARIABLE called value and taking the information from row[index], in this case the genre (index 11). Did I get it right here?

  3. This question is about freq_table. I understand most of it; my question is whether the column variable here is the same as the column variable in extract. I’m assuming it’s different because they’re different functions, right?

  4. This is kind of a refresher, but in lines 11-14 we’re doing a FOR loop again, this time for the frequency table, over the genres (index 11) extracted earlier. But why are we using frequency_table[value] here? Whenever I see that I just blank out. Having [value] after frequency_table: what the heck is going on here?

  5. Last question: when I run the code and print(genres_ft), I get all the data as (GENRE, COUNT) pairs, which is what we want. Kind of a dumb question, but how does the result know to show both the GENRE and the COUNT and not just the COUNT? Is that automatic? I went through the previous lesson on frequency tables and I don’t think it was ever explained, but I just want to understand how it works.

Thanks!


Hey @scchoi21, this is great. I liked writing out code by hand as well for notes, because then I could notate it more comfortably. I would print out longer code snippets and glue them into my notebook to write notes, too. :smiley: I’ll try to give some feedback to the best of my knowledge and hopefully you’ll feel like you’re able to move on.

  1. Looks like you got this one. :slight_smile:

  2. Yes, you’re right in how you’re understanding this part. You could have skipped this line and put column.append(row[index]). Whether you assign it to a variable depends on what you need to do with the value you’re extracting from the row. I like using a variable for readability, because I can look at a function later and know what row[index] was supposed to be used for. It’s also better when you have to use the value for a calculation, or when retrieving multiple values from the same row so you can differentiate between them.

  3. Just like in #1, column is just a name that only exists inside each function, so the column in extract and the column in freq_table are separate and don't affect each other. Even when you use return column, it’s not returning the value with the name column attached. So in genres = extract(11), the output will be assigned to genres. (I hope that makes sense.)

  4. value is going to be a key in the frequency_table dictionary you build in the function. frequency_table[value] is the value assigned to that key. (I think using value as the name of the iteration variable makes this a little confusing.) With frequency_table[value] += 1, we’re telling Python to look up the key currently stored in value in the frequency_table dictionary and increment the count associated with it by 1.

  5. Since the output of freq_table() is a dictionary, genres_ft will be a dictionary. When you print it, it just prints everything that’s in there as 'key': value. If you want just the keys (genres), you can use genres_ft.keys().
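A small trace may make points 4 and 5 more concrete. This sketch runs the same loop outside a function on a made-up three-item genres list (hypothetical data), printing the dictionary after each step so you can watch frequency_table[value] create a key and then increment it.

```python
# Made-up genres list, just for illustration.
genres = ['Games', 'Music', 'Games']

frequency_table = {}
for value in genres:
    if value in frequency_table:
        # look up the key held in `value` (e.g. 'Games') and add 1 to its count;
        # same as frequency_table[value] = frequency_table[value] + 1
        frequency_table[value] += 1
    else:
        # the key doesn't exist yet, so create it with a count of 1
        frequency_table[value] = 1
    print(value, '->', frequency_table)
# Games -> {'Games': 1}
# Music -> {'Games': 1, 'Music': 1}
# Games -> {'Games': 2, 'Music': 1}

genres_ft = frequency_table   # what freq_table(genres) would hand back

# A dictionary stores each genre as a key AND its count as the value,
# which is why printing it shows both.
print(genres_ft.keys())          # dict_keys(['Games', 'Music'])
print(list(genres_ft.values()))  # [2, 1]
for genre, count in genres_ft.items():
    print(genre, count)          # Games 2, then Music 1
```

Note that keys() returns a view object, so wrap it in list() if you want a plain list.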

I hope any of that is helpful.


Awesome, thanks again for the answer, april.g. I kept thinking #5 was more like SQL, where you use the COUNT function and put a label on the output (hence why I was a little confused).

Everything else is what I thought, and your answer confirmed it. Thanks!

Hello all, I am a little bit lost at the moment. My main struggle is not with understanding the logic of the operations, since I understand loops and the main functions of the commands; it's that when referencing the values needed, I sometimes lose track of what I should be doing.

    def freq_table(column):
        frequency_table = {}
        for value in column:
            if value in frequency_table:
                frequency_table[value] += 1
            else:
                frequency_table[value] = 1
        return frequency_table

    genres_ft = freq_table(genres)

In the second part of the program, they use “column” as the parameter name when defining the function freq_table. I would have thought we need to use “genres” instead, since it already contains all the genres listed. Is this possible?

I don’t fully understand why we “return column” at the end of the first function and then use column in the next function as the input variable instead of genres.
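One way to see it: column in freq_table is a parameter, a local name for whatever list you pass in when you call the function. When you write freq_table(genres), the list stored in genres is what column refers to for the duration of that call. The column built inside extract is a separate local name; return hands its list back, and genres = extract(11) stores it under genres. In this sketch the genres and content_ratings lists are hypothetical stand-ins for two different extracted columns; the point is that the same function works on both precisely because its parameter has a generic name.

```python
def freq_table(column):
    # `column` is a parameter: it only exists inside this function and refers
    # to whatever list the caller passed in.
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# Hypothetical lists standing in for two different extracted columns.
genres = ['Games', 'Music', 'Games']
content_ratings = ['4+', '4+', '12+']

genres_ft = freq_table(genres)            # during this call, column is genres
ratings_ft = freq_table(content_ratings)  # same function, different input

print(genres_ft)   # {'Games': 2, 'Music': 1}
print(ratings_ft)  # {'4+': 2, '12+': 1}
```

Naming the parameter genres would also run, but the function would then read as if it only worked for genres.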