# Calculating average values from a df column based on values from other column

Screen Link: Learn data science with Python and R projects

I have been trying to figure out how the average for the `casual` and `registered` were calculated in relation to the `workingday` column. The following screenshot is from the mission.

My Code:

I tried to write it as a boolean array to which I would apply the Series/DataFrame.mean() method. Then I tried doing Series.sum() method but it did not work.

This is for my pure curiosity

``````bike_sharing["workingday"].value_counts()
``````

Here I know that there are 231 non-working days in the data set.
Then I tried

``````bike_sharing["casual"].sum() / 231
``````

But the answer is not even close.

What am I missing here in order to calculate the average values like they are in the table?

Hi @vallentin.ciocirlan I was just about to shutdown the computer for the day and saw your question which piqued my curiousity as well!

I havenâ€™t quite figured out how to code it yet but I believe the necessary strategy would be to group the data by workingday and non-working day and then take an average on the `casual` and `registered` columns.

Iâ€™m thinking: add another column with boolean values for `is_a_work_day` and use that to filter your data and get the averages you see in the table.

Maybe??

FIRST EDIT: I see now that the dataset already has this column! (`workingday`)

So in the end, my strategy appears to have been sound!

Iâ€™m not sure if youâ€™ve reached the section on data aggregation yet but here is how DQ generated that table with one line of code:

``````print(bike_sharing.groupby('workingday').agg(np.mean)[['casual', 'registered']])
``````

LAST EDIT:
Here is how you would find these averages without using data aggregation (ie by filtering the data and then finding the meanâ€¦which is code you have definitely seen so far)

``````print(bike_sharing[bike_sharing['workingday']==0][['casual', 'registered']].mean())
print(bike_sharing[bike_sharing['workingday']==1][['casual', 'registered']].mean())
``````

1 Like

Yess oh gosh, thank you @mathmike314!!

I was intuitively looking last night to use a boolean array to get this, I knew somehow it had to be a boolean array but where I stumbled myself was in which order should I pass the boolean.

My initial thoughts were:

``````bike_sharing[bike_sharing["workingday"] ==  0, "casual"].mean()
``````

But I was getting a boolean dtype list with indexes from 1 to 730.

How should one think about the chaining method? What comes first? When can I use it?

EDIT: Not sure that the code you provided (see below) can be considered boolean indexing, so I apologize for the wrong terminology

I believe this is still considered boolean indexing because `bike_sharing["workingday"] == 0` does create a boolean array which we then use to index `bike_sharing`.

Iâ€™m not sure what the â€ścorrect answerâ€ť is here but I always like to filter the rows first (`bike_sharing[bike_sharing['workingday']==0]`) and then select the columns (`[['casual', 'registered']]`).

Thatâ€™s a very good initial thought and it would have worked if you were to use `.loc` indexing like so:

``````bike_sharing.loc[bike_sharing["workingday"] ==  0, "casual"].mean()
``````
1 Like

Wow, that is a big changer!
Thanks a lot!