# Sum spent amount based on customer id

The below code is from the solution. I dont understand the code. we learned `df.groupby('columns')['columns'].agg(np.sum)`below code, there is no specific clarification saying to sum which column, then how did python to sum value?

Thank you!!

My Code:

``````best_churn["nr_of_transactions"] = group_by_customer.size()
best_churn["amount_spent"] = group_by_customer.sum()
``````
2 Likes

@candiceliu93

When you use `df.groupby('columns')`, you get a table like this, for example:

``````customer_id  trans_date trans_amount
A                     11/11/ 11      10
A                     20/11/11        35
A                     30/11/11        100
``````

You can use the `groups` to see the groups and the `get_group(name)` to get a particular group. These are the ways to see the table above because you can’t view `groupby` without using aggregate function.

``````print(group_by_customer.groups)
group_by_customer.get_group('FM8448')
``````

With `group_by_customer.size()` you are calculating the `number of rows` that a particular group occupies. In my table above, A has 3 rows.

With `group_by_customer.sum()` you are summing `trans_amount` for each group. The `sum()` adds up all eligible numerical columns.

For example:

``````df = [{'id': 'A', 'value': 3, 'amount': 5},
{'id': 'B', 'value': 2, 'amount': 4},
{'id': 'A', 'value': 2, 'amount': 3},
{'id': 'A', 'value': 0, 'amount': 7},
{'id': 'B', 'value': 1, 'amount': 5}]

import pandas as pd
pd.DataFrame(df)
``````  5 Likes

got it! so python will work on numeric columns if we use `sum()` even though we did not clarify which columns we need to add it up.

For the `size()`, it will count the rows based on the columns we group by…correct?

3 Likes

Yes.

The `size()` counts the number of rows for a particular group.

2 Likes

Hi, what if there are more than 1 numeric columns? Do I need to specify the column I want?

1 Like

No. It calculates it for you.

1 Like

so if the table has two numeric columns, like this:

df = [{‘id’: ‘A’, ‘value’: 3, ‘amount’: 5, ‘amount_2’: 10},
{‘id’: ‘B’, ‘value’: 2, ‘amount’: 4, ‘amount_2’: 15},
{‘id’: ‘A’, ‘value’: 2, ‘amount’: 3, ‘amount_2’: 12},
{‘id’: ‘A’, ‘value’: 0, ‘amount’: 7, ‘amount_2’: 13},
{‘id’: ‘B’, ‘value’: 1, ‘amount’: 5, ‘amount_2’: 10}]

then I group it by id and want to assign one of the aggregated value to a column, how will it perform? or I should ask: what should I do if I want to assign one of the aggregated value to a column?

best_churn[“amount_spent”] = df.groupby(‘id’).sum()

Thank you     The trick is setting the index and then resetting the index,

1 Like