I have two concerns regarding the Guided Project: Clean and Analyze Employee Exit Surveys.
The first has to do with the code in this mission screen for Identifying Dissatisfied Employees. I went through @golden5mk’s post on the df.any() method. Contrary to what he mentioned about df.applymap() affecting the behavior of df.any(), the case isn’t the same for me. At least not entirely.
When I input the following code:
import numpy as np
import pandas as pd

# Map '-' to False, missing values to NaN, and anything else to True
def update_vals(val):
    if val == '-':
        return False
    elif pd.isnull(val):
        return np.nan
    else:
        return True

tafe_cols = ['Contributing Factors. Dissatisfaction',
             'Contributing Factors. Job Dissatisfaction']
tafe_resignations['dissatisfied'] = tafe_resignations[tafe_cols].applymap(update_vals).any(axis=1, skipna=False)
tafe_resignations['dissatisfied'].value_counts(dropna=False)
The following output gets displayed:
False    241
True      91
True       8
Name: dissatisfied, dtype: int64
I get a double count for True, where the second one should be NaN. So df.applymap() does seem to have an effect, but not in the way @golden5mk described. The corresponding values are the same as those in the solution notebook. I read somewhere that I should check my version of Python, which happens to be the latest at 3.8.5.
I just want to know why the double count is happening and how I can address it.
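For comparison, here is a minimal sketch of one way to build the column without depending on how df.any(axis=1, skipna=False) treats NaN. It runs on a small invented frame (`toy`), not the real survey data, and the names `answered` and `flagged` are made up for illustration. It also assumes a specific rule that may not match the solution notebook: a row is True if any column holds a real answer, NaN only if every column is missing, and False otherwise.

```python
import numpy as np
import pandas as pd

# Invented stand-in for tafe_resignations; the values are illustrative only.
toy = pd.DataFrame({
    "Contributing Factors. Dissatisfaction": ["-", "Dissatisfaction", "-", np.nan],
    "Contributing Factors. Job Dissatisfaction": ["-", "-", "Job Dissatisfaction", np.nan],
})
cols = list(toy.columns)

# True where the respondent gave any value at all (including '-').
answered = toy[cols].notna()

# True only where a real answer other than '-' is present.
flagged = answered & toy[cols].ne("-")

# Assumed rule: True if any column is flagged, NaN if every column is missing.
dissatisfied = (
    flagged.any(axis=1)
    .astype(object)                        # object dtype so NaN can sit next to booleans
    .where(answered.any(axis=1), np.nan)   # blank out rows with no answers at all
)

print(dissatisfied.value_counts(dropna=False))
```

Because the missing rows are marked explicitly at the end, the result does not rely on the version-specific behavior of any() with skipna=False.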
The second has to do with creating a pivot table for the initial analysis. There are a few posts raising similar concerns. In my case, I dropped all rows which had NaN under the dissatisfied column, so that when I run df.value_counts() on that column, the output is:
False    372
True     226
Name: dissatisfied, dtype: int64
However, when I try creating a pivot table,
dissatisfied_service = combined_updated.pivot_table(values='dissatisfied',index='service_cat',margins=True)
I get a No numeric types to aggregate error. It isn’t until I use the df.fillna() method that my problem gets solved.
combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False)
It’s very strange since, as can be seen above, there are no NaN values to be replaced. There are only 372 False values and 226 True values. Also, when I run df.value_counts() again after using df.fillna(), just like I did above, the output is exactly the same for both the False and True values. It’s like nothing happened, but I’m somehow able to create a pivot table afterwards.
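One plausible explanation, not confirmed against the notebook, is that the column still has dtype object even though it only contains True and False. The dtype line printed by value_counts() describes the counts, not the column itself. In some pandas versions fillna() can downcast such a column to bool, which would make it numeric enough to aggregate. A minimal sketch on an invented frame (`toy`, with made-up categories), assuming that is the cause:

```python
import pandas as pd

# Invented stand-in for combined_updated; the values are illustrative only.
toy = pd.DataFrame({
    "service_cat": ["New", "New", "Veteran", "Veteran"],
    # Booleans stored with object dtype, as can happen after NaN rows are dropped.
    "dissatisfied": pd.Series([True, False, True, True], dtype=object),
})

# The column's own dtype, which value_counts() does not report.
print(toy["dissatisfied"].dtype)

# Convert explicitly instead of relying on fillna() to change the dtype.
toy["dissatisfied"] = toy["dissatisfied"].astype(bool)

# The default aggregation is the mean, i.e. the share of True values per group.
table = toy.pivot_table(values="dissatisfied", index="service_cat")
print(table)
```

Checking combined_updated['dissatisfied'].dtype before and after the fillna() call would show whether this is what happened in the notebook.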
I apologize for the lengthy post but this has been bothering me for quite some time already.
Here’s a copy of my project so you guys can check out the issues I just mentioned. Employee Exit Survey.ipynb (173.2 KB)
Hoping to get a response and thanks in advance for the help!