Help me! I am stuck, and the provided answer is beyond my understanding.

Screen Link:

https://app.dataquest.io/m/1020/dictionaries/7/average-by-group
My Code:

def avg_group(d,col):
    d1={}
    d1=d[col]
    d2={}
    avg_tb=0
    avg_tip=0
    avg_size=0
    flipped={}
    
    for k,v in d1.items():
        if v not in flipped: 
            flipped[v] = [k]
        else: 
            flipped[v].append(k)
            
        avg_list=[]
        ave_list=[]
            
    for i in flipped:
        ave_list = flipped.get(i)
        for a in ave_list:
            avg_tb+=d["total_bill"][a]
            avg_tip+=d["tip"][a]
            avg_size+= d["size"][a]
        avg_list.append(avg_tb/(len(ave_list)))
        avg_list.append(avg_tip/(len(ave_list)))
        avg_list.append(avg_size/(len(ave_list)))
        d2[i]=avg_list
    return d2

What I expected to happen:
I expected the application to confirm that the method works correctly.

What actually happened:

Function avg_group did not return the expected value.

I am stuck on this practice problem. I wonder why the provided answers are at a much higher level than what I have learned here. I do not even understand what the answer does. I would appreciate it if you provided a description along with the answers, so I can learn more and understand the code.


@Sahil, hello Sahil. I see your name, but I do not see any changes or comments. I am new, so I might not know how this thing works :slight_smile:

Hi @massoudrmz ,
since you already saw the answer, I will stick to it.

You mention that the answer is at a “higher level”, but it is really just cleaner code. The level has to do with features of the language itself, not with the code.
I suppose you are familiar with C or a similar language, since you initialize almost all variables before using them. In Python you can skip this most of the time. For example, instead of:

d1={}
d1=d[col]

Just do:

d1=d[col]

Another point from your code is in lines:

avg_list=[]
ave_list=[]

Did you notice that those lines are indented inside the first for loop? You are re-initializing both lists on every iteration. Indentation matters in Python. And later, you rebind ave_list to something else:

ave_list = flipped.get(i)

So you don’t need the earlier initialization: ave_list is never used as an empty list, unlike avg_list.
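For what it’s worth, here is a minimal sketch of the original function with one further issue addressed: the running totals and avg_list are never reset between groups, so every group ends up sharing the same growing list. The sample dictionary is hypothetical, and the output order [total_bill, tip, size] is an assumption taken from the code posted in this thread, not from the exercise text.

```python
def avg_group(d, col):
    # Map each group value (e.g. "Sun") to the row keys that belong to it.
    flipped = {}
    for k, v in d[col].items():
        if v not in flipped:
            flipped[v] = [k]
        else:
            flipped[v].append(k)

    d2 = {}
    for group, rows in flipped.items():
        # Reset the running totals for every group; otherwise they keep
        # accumulating from one group to the next.
        total_tb = 0
        total_tip = 0
        total_size = 0
        for a in rows:
            total_tb += d["total_bill"][a]
            total_tip += d["tip"][a]
            total_size += d["size"][a]
        # Build a fresh list per group so groups do not share one list object.
        n = len(rows)
        d2[group] = [total_tb / n, total_tip / n, total_size / n]
    return d2


# Hypothetical sample data in the column -> {row: value} shape used above.
sample = {
    "total_bill": {0: 10.0, 1: 20.0, 2: 30.0},
    "tip": {0: 1.0, 1: 2.0, 2: 3.0},
    "size": {0: 2, 1: 2, 2: 4},
    "day": {0: "Sun", 1: "Sun", 2: "Sat"},
}
result = avg_group(sample, "day")
print(result)
```

The structure is the same as in the question; only the placement of the initializations changed.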

Regarding the answer, I will give you some insights in the code hoping it helps you understand it better:

The line:

groups = list(set(d[col].values()))

uses set, which drops any duplicated values, and then builds the list groups to iterate over. These become the keys of the result.
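A tiny illustration of that deduplication, using a made-up column (the values dictionary is hypothetical and stands in for d[col]):

```python
# Hypothetical stand-in for d[col]: row key -> group value.
values = {0: "Sun", 1: "Sun", 2: "Sat"}

# set() keeps one copy of each distinct value; list() makes it a list again.
groups = list(set(values.values()))

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(groups))
```

Note that a set does not promise any particular order, so groups may come out in a different order than the values first appeared.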

While iterating over keys, the line:

indices = [k for k,v in d[col].items() if v == key]

uses a list comprehension. I recommend practicing it a lot; it is a great code-saving technique.

The line above is the same as:

indices = []
for k,v in d[col].items():
    if v==key:
        indices.append(k)
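To see how the two pieces fit together, here is a rough sketch that combines the set and the list comprehension with plain loops for the averaging. It is not the official answer; the name avg_group_sketch and the sample data are made up for illustration, and the [total_bill, tip, size] order is an assumption based on the code in this thread.

```python
def avg_group_sketch(d, col):
    result = {}
    # One pass per distinct group value.
    groups = list(set(d[col].values()))
    for key in groups:
        # Row keys whose value in the grouping column equals this group.
        indices = [k for k, v in d[col].items() if v == key]
        averages = []
        for name in ["total_bill", "tip", "size"]:
            total = 0
            for i in indices:
                total += d[name][i]
            averages.append(total / len(indices))
        result[key] = averages
    return result


# Hypothetical sample data in the column -> {row: value} shape.
sample = {
    "total_bill": {0: 10.0, 1: 20.0, 2: 30.0},
    "tip": {0: 1.0, 1: 2.0, 2: 3.0},
    "size": {0: 2, 1: 2, 2: 4},
    "day": {0: "Sun", 1: "Sun", 2: "Sat"},
}
sketch_result = avg_group_sketch(sample, "day")
print(sketch_result)
```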

I hope this clarifies the answer for you but let me know if it doesn’t.


@massoudrmz

When you reach the lands of pandas, it’s much more bearable.

import pandas as pd

def avg_group(d, col):
    df = pd.DataFrame(d)
    ans = df.groupby([col])[['total_bill','tip','size']].mean()
    return ans.T.to_dict(orient='list')

For now, these exercises build your low-level debugging skills. You will need them later, when higher-level libraries hide what they do.

This is my solution. I agree with you that the provided solution is too high level. I wrote mine without using a nested function, lambda, or map, so it should be easier to follow.

def avg_group(d,col):
    new_d={}
    new_d_keylist=[]
    for c,nd in d.items():

        for k,v in nd.items():
            if v not in new_d_keylist and c==col:
                new_d_keylist.append(v) 
        
    for each in new_d_keylist:
        
        totalbill_list=[]
        tip_list=[]
        size_list=[]
        
        for c,nd in d.items():
            for k,v in nd.items():    
                if each==v and c==col:
                    
                    totalbill_list.append(d["total_bill"][k])
                    tip_list.append(d["tip"][k])
                    size_list.append(d["size"][k])
                    
                    total_bill_avg=sum(totalbill_list)/len(totalbill_list)
                    tip_avg=sum(tip_list)/len(tip_list)
                    size_avg=sum(size_list)/len(size_list)
                    
                    new_d[each]=[total_bill_avg,tip_avg,size_avg]
    
    return new_d

Thanks. I was really struggling trying to figure this out!


Hi, in the return line, why is there a '.T' before to_dict(orient='list')? I couldn’t find the documentation for it, and I don’t understand why the code doesn’t work without it.

Congrats on hitting the land of no docs in pandas. There are lots of opportunities to contribute to pandas.

It means transpose.
Help is not always around. To handle this situation, note that .T is an attribute you access on a DataFrame (a property, not a method call), so you can simply try it.
You can create your own DataFrame (https://kanoki.org/2019/11/18/how-to-create-dataframe-for-testing/), attach .T to it, and see what changes.
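A quick experiment along those lines, as a sketch with made-up numbers: to_dict(orient='list') uses the column labels as keys, so without .T the keys are the measured columns, and with .T the keys are the row labels (the groups).

```python
import pandas as pd

# Hypothetical frame shaped like a groupby().mean() result:
# one row per group, one column per averaged field.
df = pd.DataFrame(
    {"total_bill": [10.0, 20.0], "tip": [1.0, 3.0]},
    index=["Lunch", "Dinner"],
)

# Keys are the columns; each list runs down the rows.
without_t = df.to_dict(orient="list")

# After transposing, the groups are the columns, so they become the keys.
with_t = df.T.to_dict(orient="list")

print(without_t)
print(with_t)
```

That is why the return line needs the transpose: the exercise wants one entry per group, not one entry per column.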

Another strategy: pandas is built on numpy, so if pandas doesn’t document something, you can expect numpy to have it: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.T.html
This is similar to how https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.interpolate.html asks you to read scipy docs on how interpolation is done.

Tweaking and observing the output like this usually works, but if it fails you can open the source code:

  1. Google and find the source code of dataframe: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py

  2. Press Ctrl+F and search for def T; it was on line 2836 when I looked

  3. Use GitHub’s jump-to-definition feature: click on a method name to jump to where it is defined

  4. Jump to def transpose and notice that it has proper docs in its docstring. (The official docs are generated by Sphinx from these docstrings.)

  5. Google df.transpose to reach https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html and read the same docs in a nicer UI

If something raises an error, Google the error message; failing that, open the source code of DataFrame.to_dict(orient='list') and trace through it.


I’m not even sure where to start with my confusion on this problem. I’ve spent two hours trying to understand it. I’ve checked the answer (which just made things worse), and the hint was useless. This is pretty far from anything we’ve learned up to this point. I don’t even know how to start asking questions to understand this. Is there some super basic, naive way to do this first?

Okay, so we never learned set(). How were we supposed to know that was an option?

Also we never go over comprehensions in the course. I was able to piece them together from the previous answers and Google, but why aren’t comps explained in detail throughout the course?

I mean, what is this? sums = [0 for _ in range(3)] We haven’t done anything like this. I had no idea _ was a valid variable name, or even appropriate in Python.

return list(map(lambda x: x/len(indices), sums)) Are you freakin’ kidding me?! I think we brushed on lambda but it’s such a weird concept it deserves a whole section if you’re planning to include it in a “practice” problem.

I can’t even begin to tell how incredibly discouraging this problem has been. It made me feel like I knew absolutely nothing. And I’ve been at this for a year! I feel like I have too many questions to even ask. I’d need a tutor or something. It would’ve been really nice to build up to these concepts a little more. Man. I just had to say something. Help? I think :slightly_frowning_face:
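For anyone landing here with the same questions, the two one-liners quoted above can be unrolled into plain loops. This is only a sketch: the values in indices and sums are made up, and the loop versions are written for illustration rather than taken from the official answer.

```python
# [0 for _ in range(3)] builds a list of three zeros.
# The name _ is an ordinary variable; by convention it signals
# "I don't use this value".
zeros = [0 for _ in range(3)]

zeros_loop = []
for _ in range(3):
    zeros_loop.append(0)

# Hypothetical data: two rows in the group, and their column totals.
indices = [0, 1]
sums = [30.0, 3.0, 4]

# map applies the lambda (a small unnamed function) to every item in sums.
averages = list(map(lambda x: x / len(indices), sums))

# The same thing with a plain loop.
averages_loop = []
for x in sums:
    averages_loop.append(x / len(indices))

print(zeros, zeros_loop)
print(averages, averages_loop)
```

Both pairs print the same lists, so the loop versions can be used wherever the one-liners feel too dense.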