Dictionaries practice, problem #7 (Average by groups): alternative solution without sets

Screen Link:

I was surprised to see that the provided solution uses sets, which have not been taught at this point in the curriculum. Here is my initial solution, which does not use sets:

# Auxiliary function: computes the average total_bill, tip, or size over a list of rows
def get_avgs(dic, col, indices):

    inner_dic = dic[col]
    total = 0

    for index in indices:
        total += inner_dic[index]

    return total / len(indices)

# Main function
def avg_group(dic, col):    # col is the group-by column, e.g. 'sex', 'smoker', or 'size'

    inner_dic = dic[col]    # contains the values of the group-by column
    result_dic = {}         # the result returned by the function

    # Use the distinct values in the column as the keys of the result dictionary
    for value in inner_dic.values():
        if value not in result_dic:
            result_dic[value] = []

    # Calculate the 3 averages for each key and append them to its list
    for key, averages in result_dic.items():
        relevant_indices = []      # the rows needed to calculate the averages
        for index, value in inner_dic.items():
            if value == key:
                relevant_indices.append(index)
        averages.append(get_avgs(dic, 'total_bill', relevant_indices))
        averages.append(get_avgs(dic, 'tip', relevant_indices))
        averages.append(get_avgs(dic, 'size', relevant_indices))

    return result_dic
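The helper get_avgs can also be written in one line with the built-in sum() and a generator expression. This is only a sketch that assumes the same nested-dictionary layout (each column name maps to a dictionary of row index to value); the sample dictionary is made up for illustration.

```python
# Sketch: same behavior as get_avgs, written with the built-in sum()
def get_avgs(dic, col, indices):
    # Average of the values in dic[col] at the given row indices
    return sum(dic[col][index] for index in indices) / len(indices)

# Hypothetical sample data, not from the exercise
sample = {"tip": {0: 1.0, 1: 2.0, 2: 4.0}}
print(get_avgs(sample, "tip", [0, 2]))  # 2.5
```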

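With sets, it is easy to collect the distinct keys and then sort them, so the set-based solution returns the keys in sorted order. In my solution, the keys appear in the order in which they are first encountered in the column, so they are not sorted. It still passes the submission test, because two dictionaries compare equal regardless of key order (since Python 3.7, dictionaries preserve insertion order, but equality checks ignore it).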
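A quick check of both points, using the two results of avg_group(d, "sex") reproduced from this post: the dictionaries compare equal even though their keys are in different orders, and the keys can be sorted afterwards without any set by passing the dictionary's items to sorted(). This is a sketch, not the course's reference solution.

```python
# Two dictionaries with the same key-value pairs compare equal,
# regardless of the order in which the keys were inserted
mine = {'Male': [23.24, 2.373333333333333, 2.6666666666666665],
        'Female': [19.705, 2.245, 2.0]}
theirs = {'Female': [19.705, 2.245, 2.0],
          'Male': [23.24, 2.373333333333333, 2.6666666666666665]}
print(mine == theirs)  # True

# Sorting the (key, value) pairs orders them by key (keys are unique),
# and dict() rebuilds a dictionary in that order
sorted_result = dict(sorted(mine.items()))
print(list(sorted_result))  # ['Female', 'Male']
```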

My output:

print(avg_group(d, "sex"))
{'Male': [23.24, 2.373333333333333, 2.6666666666666665], 'Female': [19.705, 2.245, 2.0]}

Output of the solution with sets:

print(avg_group(d, "sex"))
{'Female': [19.705, 2.245, 2.0], 'Male': [23.24, 2.373333333333333, 2.6666666666666665]}

But sets are not taught in the Python Fundamentals course; they are only introduced in the series of practice problems that follows this one. So why are they already used in the solution to this problem? I wish there had been at least a hint. It could have saved me an hour-long headache.

If that is indeed the case, please use the ? button in the top-right corner of the mission step page to send them this feedback. They can then look into it.

A set has O(1) average time complexity for checking whether an element belongs to it. However, we can drop the step of checking whether each row belongs to the current group-by value altogether by preprocessing the data. In other words, a set is not needed for avg_group.

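We can preprocess the data by collecting, for each distinct value of the group-by column, the indices of the rows that have that value. The averages are then computed from this preprocessed data, so there is no need to check group membership every time we access a row. (The first solution, by contrast, scans every row of the column once per group.)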
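For anyone who prefers to stick to plain dictionaries, the preprocessing step can also be done without defaultdict by using dict.setdefault. In this sketch, the helper name group_indices is hypothetical; the sample column is the 'sex' column from the test data in this thread.

```python
# Hypothetical helper: group row indices by value using only a plain dict
def group_indices(column_values):
    groups = {}
    for index, value in column_values.items():
        # setdefault inserts an empty list the first time a value is seen,
        # then returns the list stored under that value
        groups.setdefault(value, []).append(index)
    return groups

sex = {69: 'Male', 103: 'Female', 84: 'Male', 207: 'Male', 0: 'Female'}
print(group_indices(sex))  # {'Male': [69, 84, 207], 'Female': [103, 0]}
```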

from collections import defaultdict

def avg_group(d, column):

    # Preprocess: collect the row indices that belong to each group-by value
    belong_to_group = defaultdict(list)
    for index, group_by_value in d[column].items():
        belong_to_group[group_by_value].append(index)

    # Map each column to average onto its position in the result list
    columns = {"total_bill": 0, "tip": 1, "size": 2}

    # Compute the average of each of the above columns for every group-by value
    group_by = dict()
    for group_by_value, records in belong_to_group.items():
        n = [0, 0, 0]
        total = [0, 0, 0]
        for row in records:
            for column_to_average, array_index in columns.items():
                total[array_index] += d[column_to_average][row]
                n[array_index] += 1
        group_by[group_by_value] = [x / y for x, y in zip(total, n)]
    return group_by

d = {
    'total_bill': {69: 15.01, 103: 22.42, 84: 15.98, 207: 38.73, 0: 16.99},
    'tip': {69: 2.09, 103: 3.48, 84: 2.03, 207: 3.0, 0: 1.01},
    'sex': {69: 'Male', 103: 'Female', 84: 'Male', 207: 'Male', 0: 'Female'},
    'smoker': {69: 'Yes', 103: 'Yes', 84: 'No', 207: 'Yes', 0: 'No'},
    'day': {69: 'Sat', 103: 'Sat', 84: 'Thur', 207: 'Sat', 0: 'Sun'},
    'time': {69: 'Dinner', 103: 'Dinner',
             84: 'Lunch', 207: 'Dinner', 0: 'Dinner'},
    'size': {69: 2, 103: 2, 84: 2, 207: 4, 0: 2}
}
print(avg_group(d, "sex"))
Output:
{'Male': [23.24, 2.373333333333333, 2.6666666666666665], 'Female': [19.705, 2.245, 2.0]}
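In the defaultdict version, every entry of n ends up equal to len(records), because each row contributes exactly one value to each averaged column. A more compact sketch can therefore divide by len(rows) directly. The averaged parameter and its default tuple of column names are assumptions added for illustration; they are not part of the exercise's function signature. The additions happen in the same order as in the loop version, so it should return the same dictionary.

```python
from collections import defaultdict

def avg_group(d, column, averaged=("total_bill", "tip", "size")):
    # Group row indices by each distinct value of the group-by column
    groups = defaultdict(list)
    for index, value in d[column].items():
        groups[value].append(index)

    # Each averaged column has exactly one value per row,
    # so len(rows) is the count for every average
    return {
        value: [sum(d[col][row] for row in rows) / len(rows) for col in averaged]
        for value, rows in groups.items()
    }

d = {
    'total_bill': {69: 15.01, 103: 22.42, 84: 15.98, 207: 38.73, 0: 16.99},
    'tip': {69: 2.09, 103: 3.48, 84: 2.03, 207: 3.0, 0: 1.01},
    'sex': {69: 'Male', 103: 'Female', 84: 'Male', 207: 'Male', 0: 'Female'},
    'smoker': {69: 'Yes', 103: 'Yes', 84: 'No', 207: 'Yes', 0: 'No'},
    'size': {69: 2, 103: 2, 84: 2, 207: 4, 0: 2}
}
print(avg_group(d, "sex"))
```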