Explanation required

Hi,

I’ve been really enjoying doing the practice problems, but I’ve come across one that I haven’t fully been able to understand.

This is the question:

  1. Create a function, avg_group, with the following features:
  2. The first argument is a dictionary that follows the schema {column_name: {index: value}}.
  3. The second argument is one of the string keys in the first argument.
  4. Returns a new dictionary where:
  • The keys are the values in the entry of the first argument at the second argument.
  • The values are lists containing the averages for total_bill, tip and size, in this order.

And this is the answer, along with comments about what’s confusing me:

def get_avgs(d, value, indices):# what is the purpose of the indices parameter
    sums = [0 for _ in range(3)]
    for idx in indices: 
        sums[0] += d["total_bill"][idx]
        sums[1] += d["tip"][idx]
        sums[2] += d["size"][idx]
    return list(map(lambda x: x/len(indices), sums))

def avg_group(d, col):                    
    data = d[col]                        # | What is the purpose of these 3 lines of code
    groups = list(set(d[col].values()))  # |
    groups.sort()                        # |
    
    group_by = {}
    
    for key in groups:
        indices = [k for k,v in d[col].items() if v == key] #what is the purpose of this
        group_by[key] = get_avgs(d, key, indices)
            
    return group_by

Screen Link:
https://app.dataquest.io/m/1020/dictionaries/7/average-by-group

For a dictionary, {column_name: {index: value}}, how would you access value?

Given the following -

Think about what’s happening there in light of the question I just asked.

They stored d[col] in a variable, data, but then never used data anywhere and kept using d[col] directly. You can ignore this line.

groups = list(set(d[col].values()))

This is a convenient, concise way to extract the unique values stored in the nested dictionary. For example,

d = {
     'sex': {69: 'Male', 103: 'Female', 84: 'Male', 207: 'Male', 0: 'Female'}
    }

d[col], where col is 'sex', would return -

{69: 'Male', 103: 'Female', 84: 'Male', 207: 'Male', 0: 'Female'}
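
To make the two-level lookup concrete, here is a minimal sketch using the sample dictionary above. The index 103 is simply one of its keys, picked for illustration:

```python
# Sample data in the {column_name: {index: value}} layout.
d = {
    "sex": {69: "Male", 103: "Female", 84: "Male", 207: "Male", 0: "Female"}
}

col = "sex"
column = d[col]       # first lookup: the inner {index: value} dictionary
value = d[col][103]   # second lookup: the value stored at index 103

print(column)
print(value)  # Female
```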

Then, since we want -

returns a new dictionary where the keys are the values in the given column name

we need to extract the values from d[col] so that they can be used as keys. That’s what the following does -

d[col].values()

The above returns -

dict_values(['Male', 'Female', 'Male', 'Male', 'Female'])

Now, how do you get the unique values from the above (since you need them as keys)? That’s where set comes in. Sets don’t store duplicate values. So, if you add a value that a set already contains, nothing changes, and if you convert any container to a set, the duplicates are dropped.
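
As a small sketch of that behaviour (the list below just mirrors the values from the example dictionary):

```python
values = ["Male", "Female", "Male", "Male", "Female"]

unique = set(values)               # duplicates collapse to one entry each
print(len(values), len(unique))    # 5 2

unique.add("Male")                 # adding an element that is already present changes nothing
print(len(unique))                 # 2
```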

So, set(d[col].values()) returns -

{'Female', 'Male'}

And then you convert the above set into a list, list(set(d[col].values())), which gives the following (the order may differ, since sets are unordered) -

['Female', 'Male']

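.sort() sorts the list in place in ascending order, which for strings means lexicographic (roughly alphabetical) order. Based on the instructions, I don’t see a particular need for sorting them, although it does make the order of the groups predictable, since a set has no guaranteed order.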
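
The same reasoning carries over to the indices list in the loop and the indices parameter of get_avgs. Here is a rough sketch of what they hold. The tip numbers are made up for illustration, and only one column is averaged to keep it short:

```python
d = {
    "sex": {69: "Male", 103: "Female", 84: "Male", 207: "Male", 0: "Female"},
    # Made-up tip values, only for this illustration.
    "tip": {69: 2.0, 103: 3.0, 84: 4.0, 207: 3.0, 0: 1.0},
}
col = "sex"

# sorted() returns a new sorted list, so the groups come out in the same order on every run.
groups = sorted(set(d[col].values()))
print(groups)  # ['Female', 'Male']

indices_by_group = {}
avg_tip = {}
for key in groups:
    # Row indices whose value in the chosen column equals this group.
    indices = [k for k, v in d[col].items() if v == key]
    indices_by_group[key] = indices
    # Those indices pick out the rows to average in another column.
    avg_tip[key] = sum(d["tip"][idx] for idx in indices) / len(indices)

print(indices_by_group)  # {'Female': [103, 0], 'Male': [69, 84, 207]}
print(avg_tip)           # {'Female': 2.0, 'Male': 3.0}
```

So indices is the list of rows that belong to one group, and get_avgs uses it to look up those same rows in total_bill, tip and size.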

Thanks for your help!