Sum spent amount based on customer id

The below code is from the solution. I dont understand the code. we learned df.groupby('columns')['columns'].agg(np.sum)below code, there is no specific clarification saying to sum which column, then how did python to sum value?

Thank you!!

Screen Link: https://app.dataquest.io/m/466/fuzzy-language-in-data-science/5/aggregate-data-by-customer

My Code:

best_churn["nr_of_transactions"] = group_by_customer.size()
best_churn["amount_spent"] = group_by_customer.sum()
2 Likes

@candiceliu93

When you use df.groupby('columns'), you get a table like this, for example:

customer_id  trans_date trans_amount
A                     11/11/ 11      10
A                     20/11/11        35
A                     30/11/11        100

You can use the groups to see the groups and the get_group(name) to get a particular group. These are the ways to see the table above because you can’t view groupby without using aggregate function.

print(group_by_customer.groups)
group_by_customer.get_group('FM8448')

With group_by_customer.size() you are calculating the number of rows that a particular group occupies. In my table above, A has 3 rows.

With group_by_customer.sum() you are summing trans_amount for each group. The sum() adds up all eligible numerical columns.

For example:

df = [{'id': 'A', 'value': 3, 'amount': 5},
      {'id': 'B', 'value': 2, 'amount': 4},
      {'id': 'A', 'value': 2, 'amount': 3},
      {'id': 'A', 'value': 0, 'amount': 7},
      {'id': 'B', 'value': 1, 'amount': 5}]

import pandas as pd
pd.DataFrame(df)

image

image

5 Likes

got it! so python will work on numeric columns if we use sum() even though we did not clarify which columns we need to add it up.

For the size(), it will count the rows based on the columns we group by…correct?

3 Likes

Yes.

The size() counts the number of rows for a particular group.

2 Likes

Hi, what if there are more than 1 numeric columns? Do I need to specify the column I want?

1 Like

No. It calculates it for you.

1 Like

so if the table has two numeric columns, like this:

df = [{‘id’: ‘A’, ‘value’: 3, ‘amount’: 5, ‘amount_2’: 10},
{‘id’: ‘B’, ‘value’: 2, ‘amount’: 4, ‘amount_2’: 15},
{‘id’: ‘A’, ‘value’: 2, ‘amount’: 3, ‘amount_2’: 12},
{‘id’: ‘A’, ‘value’: 0, ‘amount’: 7, ‘amount_2’: 13},
{‘id’: ‘B’, ‘value’: 1, ‘amount’: 5, ‘amount_2’: 10}]

then I group it by id and want to assign one of the aggregated value to a column, how will it perform? or I should ask: what should I do if I want to assign one of the aggregated value to a column?

best_churn[“amount_spent”] = df.groupby(‘id’).sum()

Thank you

image

image

image

image

image

The trick is setting the index and then resetting the index,

1 Like