---combined['dissatisfied'].value_counts(dropna=False)---
---False 285
True 226
False 85
True 55
Name: dissatisfied, dtype: int64---
I get two sets of Boolean values.
If I run fillna(False, inplace=True), the output is False = 651.
When I run the pivot table:
---combined.pivot_table(index='service_cat',values='dissatisfied',aggfunc=[np.mean])---
I receive the following error:
---# reset the locs in the blocks to correspond to our
DataError: No numeric types to aggregate---
[Guided Project_ Clean And Analyze Employee Exit Surveys (4).tar|attachment](upload://epnmggIMl8mh0O7uRvrjDsE9Bhm.tar) (1.4 MB)
I can't find a similar post, though I am still searching for it. It was raised by another student who was also seeing double counts of True values. @Sahil suggested checking the Python version, which might be causing such odd behavior.
Can you please cross-check the Python version that you have and upgrade it?
The notebook that you attached isn't accessible from the post. Can you please re-upload it?
I found the post with a similar problem. The solution there was to call fillna with inplace=True to get rid of the double counts, but when I did this it converted all of the values to a single count of False.
Was just going through your code. The double Boolean values are resulting because of two different approaches you have taken.
Code cell numbers are based on the notebook after running the "Restart and Run All" command in Jupyter Notebook.
In code cell 49, the function update_vals returns string values instead of Booleans, and it is applied in code cell 50:
def update_vals(val):
    if pd.isnull(val):
        return np.nan
    if val == '-':
        return 'False'
    else:
        return 'True'
In code cell 61 you have different code for the DETE dataset, which produces real Boolean values rather than strings: dete_resignations['dissatisfied'] = factors_job.any(axis=1,skipna=False)
This might also be the reason why .pivot_table() is not working correctly: it receives "True" and "False" as strings, which it can't convert to numeric values.
I tried using the institute_service column in place of "dissatisfied" and it gave me a pivot table: combined.pivot_table(index='service_cat',values='institute_service', aggfunc=np.mean)
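To illustrate the string-versus-Boolean mix described above, here is a minimal sketch on a made-up six-row frame (the data and the helper name to_bool are hypothetical, not taken from your notebook). It assumes that missing values should count as not dissatisfied, which is what fillna(False) would do:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the combined data: one source stored the
# strings 'True'/'False', the other stored real Booleans.
combined = pd.DataFrame({
    "service_cat": ["New", "New", "Veteran", "Veteran", "New", "Veteran"],
    "dissatisfied": ["True", "False", True, False, np.nan, True],
})

# Strings and Booleans are different keys, so each value shows up twice.
counts = combined["dissatisfied"].value_counts(dropna=False)
print(counts)

def to_bool(val):
    """Map 'True'/True to True, everything else (including NaN) to False."""
    if pd.isna(val):
        return False  # assumption: missing means not dissatisfied
    return str(val) == "True"

# After normalising, the column has a real bool dtype.
combined["dissatisfied"] = combined["dissatisfied"].map(to_bool).astype(bool)

# The mean of a bool column is the share of True values per group.
pct = combined.pivot_table(index="service_cat", values="dissatisfied", aggfunc="mean")
print(pct)
```

Once the column holds only real Booleans, value_counts() reports a single True and a single False entry, and the pivot table has a numeric type to aggregate.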
I also don't understand why you are using the .any() method across these columns for DETE. It returns True in the dissatisfied column even when only maternity, relocation, or study/travel were given as reasons for quitting, which might give results different from what the project requires.
I compared the result using the last 5 records from DETE dataset:
dete_resignations.loc[:,'job_dissatisfaction':'workload'].tail()
compared with dete_resignations['dissatisfied'].tail()
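The effect of the column range on .any() can be checked with a small sketch like the one below. The three-column frame is a hypothetical stand-in for the DETE data, and which columns count as job-related is an assumption you should adjust to the project instructions:

```python
import pandas as pd

# Hypothetical stand-in for dete_resignations with one non-job factor
# ('relocation') sitting between two job-related ones.
dete = pd.DataFrame({
    "job_dissatisfaction": [False, True, False],
    "relocation": [True, False, False],
    "workload": [False, False, False],
})

# Slicing from the first to the last column also picks up 'relocation'.
all_cols = dete.loc[:, "job_dissatisfaction":"workload"].any(axis=1)

# Listing only the job-related columns excludes it.
job_cols = ["job_dissatisfaction", "workload"]  # assumed selection
job_only = dete[job_cols].any(axis=1)

print(pd.DataFrame({"slice": all_cols, "job_only": job_only}))
```

In this toy data the first row is flagged as dissatisfied by the slice but not by the explicit list, because relocation is its only reason.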