Hi everyone!
I have two concerns regarding the Guided Project: Clean and Analyze Employee Exit Surveys.
1st Concern
The first has to do with the code in this mission screen for Identifying Dissatisfied Employees. I went through @golden5mk’s post on the df.any() method. Contrary to what he mentioned about df.applymap() affecting the behavior of df.any(), the case isn’t the same for me. At least not entirely.
When I input the following code:
def update_vals(val):
    if val == '-':
        return False
    elif pd.isnull(val):
        return np.nan
    else:
        return True

tafe_cols = ['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']
tafe_resignations['dissatisfied'] = tafe_resignations[tafe_cols].applymap(update_vals).any(axis=1, skipna=False)
tafe_resignations['dissatisfied'].value_counts(dropna=False)
The following output gets displayed:
False    241
True      91
True       8
Name: dissatisfied, dtype: int64
I get a double count for True, where the second one should be NaN. So df.applymap() does seem to have an effect, but not in the way @golden5mk described. The corresponding values are the same as those in the solution notebook. I read somewhere that I should check my version of Python, which happens to be the latest at 3.8.5.
I just want to know why the double count is happening and how I can address it.
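One way to narrow this down might be to look at what the result column actually holds, since two rows labelled True in value_counts() suggest two values that print alike but are stored differently. The sketch below is only a diagnostic under assumptions: the toy frame and its column names a and b are hypothetical stand-ins for tafe_resignations[tafe_cols], and the output will differ between pandas versions, so none is shown.

```python
import numpy as np
import pandas as pd


def update_vals(val):
    if val == '-':
        return False
    elif pd.isnull(val):
        return np.nan
    else:
        return True


# Hypothetical stand-in for tafe_resignations[tafe_cols].
toy = pd.DataFrame({
    "a": ["-", "Job Dissatisfaction", np.nan],
    "b": ["-", "-", np.nan],
})

# Newer pandas versions provide DataFrame.map for element-wise calls;
# older ones only have DataFrame.applymap.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
mapped = elementwise(update_vals)

result = mapped.any(axis=1, skipna=False)

print(pd.__version__)  # the pandas version, separate from the Python version
print(result.dtype)    # object vs bool shows how the values are stored
# Count values by their repr and by their Python type to separate
# entries that look identical in a plain value_counts() display.
print(result.map(repr).value_counts())
print(result.map(lambda v: type(v).__name__).value_counts())
```

Running the same last three lines on tafe_resignations['dissatisfied'] should show whether the second True is a different type from the first. Because this is pandas behavior, the pandas version is probably more relevant here than the Python version.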
2nd Concern
The second has to do with creating a pivot table for the initial analysis. There are a few posts raising similar concerns. In my case, I dropped all rows which had NaN under the dissatisfied column so that when I input the following code:
combined_updated['dissatisfied'].value_counts(dropna=False)
The output is:
False    372
True     226
Name: dissatisfied, dtype: int64
However, when I try creating a pivot table:
dissatisfied_service = combined_updated.pivot_table(values='dissatisfied', index='service_cat', margins=True)
I get a "No numeric types to aggregate" error. It isn’t until I use the df.fillna() method that the problem gets solved:
combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False)
It’s very strange since, as can be seen above, there are no NaN values to be replaced. There are only 372 False values and 226 True values. Also, when I run value_counts() again after using df.fillna(), just like I did above, the output is exactly the same for the False and True values. It’s like nothing happened, but I’m somehow able to create a pivot table afterwards.
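A possible explanation worth checking is the column's dtype rather than its values: value_counts() prints the same counts whether the True/False values are stored as object or as bool, but the default mean aggregation in pivot_table() needs a numeric or boolean column. The sketch below assumes that is what is happening here; the toy frame and its service_cat labels are hypothetical and not taken from combined_updated.

```python
import pandas as pd

# Hypothetical data: True/False values stored with object dtype,
# as can happen after combining columns that once held NaN.
df = pd.DataFrame({
    "service_cat": ["New", "New", "Veteran", "Veteran"],
    "dissatisfied": pd.Series([True, False, True, True], dtype=object),
})

print(df["dissatisfied"].dtype)           # object
print(df["dissatisfied"].value_counts())  # counts look normal regardless

# Cast explicitly so the column is a real boolean column.
df["dissatisfied"] = df["dissatisfied"].astype(bool)
print(df["dissatisfied"].dtype)           # bool

# With a bool column, the default mean gives the share of True per group.
table = df.pivot_table(values="dissatisfied", index="service_cat")
print(table)
```

If combined_updated['dissatisfied'].dtype prints object before the fillna() call and bool after it, then fillna(False) may have changed the dtype as a side effect, which would account for the pivot table working even though no values were replaced.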
I apologize for the lengthy post but this has been bothering me for quite some time already.
Here’s a copy of my project so you guys can check out the issues I just mentioned. Employee Exit Survey.ipynb (173.2 KB)
Hoping to get a response and thanks in advance for the help!