Fuzzy language in Data science - 5

Hello everyone,

I have a problem with the answer for getting the number of transactions for each customer. As I read the groupby documentation, size, when applied to a dataframe, gives the number of rows multiplied by the number of columns. I think that when we group a dataframe by a column, each group is itself a dataframe, so applying size to each group should give us the total number of elements in that group's dataframe. If we want the number of transactions, we can use the first element of the shape attribute instead. Am I wrong?

Screen Link: https://app.dataquest.io/m/466/fuzzy-language-in-data-science/5/aggregate-data-by-customer

My Code:

best_churn['nr_of_transactions'] = group_by_customer.apply(lambda x: x.shape[0])

I would appreciate it if you could guide me with this problem.

It’s not clear to me what problem you’re having.


I mean that we write the code below:

group_by_customer = data.groupby("customer_id")
best_churn["nr_of_transactions"] = group_by_customer.size()

As I understand it, this gives us the total number of elements (rows * columns) in each grouped dataframe.

But we are looking for just the number of rows.

I think I don’t understand what the “size” attribute does when we apply it to each grouped dataframe.

Please review this screen.
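
For what it's worth, here is a minimal sketch for checking this on a small example. The toy `data` frame and its `amount` and `date` columns are made up for illustration and are not the course dataset. It suggests that `size()` called on a groupby object is a method that returns the number of rows in each group, while the `size` attribute of a plain dataframe is rows * columns:

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset
data = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [10.0, 20.0, 5.0, 7.5, 2.5, 40.0],
    "date": ["2020-01-01", "2020-01-05", "2020-01-02",
             "2020-01-03", "2020-01-09", "2020-01-04"],
})

group_by_customer = data.groupby("customer_id")

# GroupBy.size() is a method: one row count per customer
rows_per_group = group_by_customer.size()

# Row count taken from the shape of each group's sub-dataframe
rows_via_shape = {cid: grp.shape[0] for cid, grp in group_by_customer}

# DataFrame.size is an attribute: rows * columns of each sub-dataframe
cells_per_group = {cid: grp.size for cid, grp in group_by_customer}

print(rows_per_group.to_dict())
print(rows_via_shape)
print(cells_per_group)
```

On this toy data the first two results agree, so `group_by_customer.size()` and the `shape[0]` approach should both give the number of transactions per customer. Only the per-group `size` attribute multiplies by the number of columns.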
