Trouble with the any() function

Screen Link:
https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/7/identify-dissatisfied-employees

My any() call is returning True for every single row. The first row of tafe_resignations_up, for example, has no True values, but I still got True in my dissatisfaction column.

What did I do wrong?

Hi! It’s a bug in pandas version 1.0.1. All you need to do is update the version of pandas installed on your PC.
You can find some guides on how to do it in the solution to the following topic: True instead of NaN in jupyter notebook
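
Before updating, it may help to confirm which version is actually installed. A minimal sketch (the variable name `installed_version` is just for illustration):

```python
import pandas as pd

# Read the version string of the pandas build that this notebook imports.
installed_version = pd.__version__
print("pandas version:", installed_version)
```

If this prints 1.0.1, the notebook is running the version mentioned above.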

Hi!
I’ve been wondering for a while whether your trouble with .any() was solved, so I decided to check the thread.
And I’ve found out that my initial answer was somewhat hasty. I didn’t check your code and relied on my own experience (and that of many other DQ peers) with the .any() function while working on this project. That doesn’t guarantee, by the way, that you won’t run into the bug I mentioned later on :wink:
So, I’ve checked your code, and it seems that I’ve found an error in cell 606:
dete_resignations['dissatisfaction'] = dete_resignations.any(axis=1, skipna=False)
tafe_resignations['dissatisfaction'] = tafe_resignations.any(axis=1, skipna=False)
You apply .any() to the whole dataframe and not only to the columns we are interested in (like ‘Contributing Factors. Dissatisfaction’ and ‘Contributing Factors. Job Dissatisfaction’ for the TAFE dataframe). It results in True for every row because it takes all the columns into account. And remember that when you perform a boolean operation on a non-boolean value, any value other than 0 or a null value is treated as True (and with skipna=False, as in your call, pandas treats missing values as True too).
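
Here is a minimal sketch of the difference on a made-up toy dataframe. The column names and values are hypothetical, not taken from the project data, and the factor columns are assumed to be already cleaned to True/False:

```python
import pandas as pd

# Hypothetical toy data: one text column plus two boolean factor columns.
toy = pd.DataFrame({
    "separationtype": ["Resignation", "Resignation", "Resignation"],
    "factor_a": [False, True, False],
    "factor_b": [False, False, False],
})

# Whole dataframe: the non-empty strings count as truthy, so every row is flagged.
whole = toy.any(axis=1)

# Only the columns of interest: a row is flagged only if one of them is True.
subset = toy[["factor_a", "factor_b"]].any(axis=1)

print(whole.tolist())
print(subset.tolist())
```

In the first call the text column alone is enough to flag a row, which matches what you saw.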
Once again, I’m sorry for having misled you, and I hope it’s not too late for my revised answer.

Hey! I realized my mistake after posting this. Thanks for checking in though!

Hi @ksenia.kustanovich

I completely agree with this point.

But my question is about selecting the columns.

How is it possible to select multiple column ranges in one go?
For example, if I have to select df.iloc[:, 10:15] plus df.iloc[:, 23:27] and df.iloc[:, 29:35] in a single operation and apply the any() function, how can I do it?

I searched for a solution, but so far nothing has worked. I am still trying.

Just figured out that I can use this:

df.iloc[:,np.r_[10:15,23:27,29:35]]

df.iloc[:,np.r_[10:15,23:27,29:35]].any(axis=1, skipna=False)
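
For anyone reading later, a small sketch of why this works: np.r_ concatenates the slices into a single array of integer positions, which iloc accepts as a column selector. The frame below is a hypothetical 8-column example with made-up names (c0 to c7) and smaller ranges than in the project:

```python
import numpy as np
import pandas as pd

# Hypothetical boolean frame with 3 rows and 8 columns, all False to start.
data = np.zeros((3, 8), dtype=bool)
data[0, 1] = True  # falls inside the first selected range
data[1, 3] = True  # falls outside both selected ranges
data[2, 6] = True  # falls inside the second selected range
df = pd.DataFrame(data, columns=[f"c{i}" for i in range(8)])

# np.r_ joins the two slices into one array of column positions: 0, 1, 5, 6.
positions = np.r_[0:2, 5:7]

# Select those columns in one go, then check each row for any True value.
selected = df.iloc[:, positions]
flags = selected.any(axis=1)

print(positions.tolist())
print(flags.tolist())
```

The True in column c3 is ignored because that column is not part of the selection.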

That’s exactly what I was going to suggest, but you’ve already found your own way :slight_smile:
