Guided Project: Clean And Analyze Employee Exit Surveys 10/11

My Code:

Guidance is appreciated:

---combined['dissatisfied'].value_counts(dropna=False)---
---False    285
True     226
False     85
True      55
Name: dissatisfied, dtype: int64---

I get two sets of Boolean values.
If I run fillna(False, inplace=True), the output is False = 651.

When I run the pivot table:

---combined.pivot_table(index='service_cat',values='dissatisfied',aggfunc=[np.mean])---

I receive the following error:
---# reset the locs in the blocks to correspond to our

DataError: No numeric types to aggregate---

[Guided Project_ Clean And Analyze Employee Exit Surveys (4).tar|attachment](upload://epnmggIMl8mh0O7uRvrjDsE9Bhm.tar) (1.4 MB) 

hey @neilgordonwalker

I can’t find the similar post yet, but I am still searching for it. It was raised by another student who was also seeing double counts of True values. @Sahil suggested checking the Python version, which might be causing this odd behavior.

Can you please check which Python version you have and upgrade it?

The notebook that you attached isn’t accessible from the post. Can you please re-upload it?

Hello Rucha,

Please find the notebook attached: Guided Project_ Clean And Analyze Employee Exit Surveys (4).tar (1.4 MB)

I found the post with a similar problem. The solution there was to call fillna with inplace=True to get rid of the double counts, but when I did this it converted all of the values to a single count of False.

hey @neilgordonwalker

Was just going through your code. The double Boolean values are resulting because of two different approaches you have taken.

The code cell numbers below are the ones shown after running the “Restart & Run All” command in Jupyter Notebook.

  1. In code cell 49, the function update_vals returns the strings 'True' and 'False' instead of Boolean values, and it is applied in code cell 50:

     def update_vals(val):
         if pd.isnull(val):
             return np.nan
         if val == '-':
             return 'False'
         else:
             return 'True'
    
  2. In code cell 61 you have different code for the DETE dataset, which produces real Boolean values rather than strings:
    dete_resignations['dissatisfied'] = factors_job.any(axis=1,skipna=False)
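
To make the mismatch concrete, here is a minimal sketch with made-up data (not taken from the notebook). It shows why strings and Booleans are counted separately by value_counts, and one possible way to rewrite update_vals so that it returns real Booleans. The series names mixed and raw are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values: strings as produced by update_vals, real Booleans as
# produced by .any(), plus one missing value.
mixed = pd.Series(['True', 'False', True, False, np.nan], name='dissatisfied')

# 'True' (str) and True (bool) are different values, so each gets its own row.
counts = mixed.value_counts(dropna=False)
print(counts)

def update_vals(val):
    """Same logic as the function above, but returning real Booleans."""
    if pd.isnull(val):
        return np.nan
    if val == '-':
        return False
    return True

# Hypothetical raw column: '-' means the factor was not selected.
raw = pd.Series(['-', 'Job Dissatisfaction', np.nan])
cleaned = raw.map(update_vals)
print(cleaned)
```

With both datasets producing the same type, value_counts should report a single True row and a single False row.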

This might also be the reason why pivot_table() is failing: it receives ‘True’ and ‘False’ as strings, which it cannot aggregate as numeric values.

I tried using institute_service column in place of “dissatisfied” and it gave me a pivot table:
combined.pivot_table(index='service_cat',values='institute_service', aggfunc=np.mean)
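
As a rough sketch of the fix, assuming the only values in the column are the two string forms and the two Boolean forms, the column can be converted to a real bool dtype before pivoting. The small combined frame below is invented for illustration, and the lambda would turn any missing value into False, so missing values need handling first in real data.

```python
import pandas as pd

# Hypothetical miniature of the combined data, mixing strings and Booleans.
combined = pd.DataFrame({
    'service_cat': ['New', 'New', 'Veteran', 'Veteran'],
    'dissatisfied': ['True', 'False', True, True],
})

# Map both representations onto real Booleans, then enforce a bool dtype.
combined['dissatisfied'] = (
    combined['dissatisfied']
    .map(lambda val: val in (True, 'True'))
    .astype(bool)
)

# With a bool column, the mean is the share of True values in each group.
result = combined.pivot_table(index='service_cat', values='dissatisfied', aggfunc='mean')
print(result)
```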

Hi @neilgordonwalker

I also don’t understand why you are using the .any() method this way for DETE. It returns True in the dissatisfied column even when maternity, relocation, or study/travel was given as the reason for quitting, which might give results different from what the project requires.

I compared the result using the last 5 records from DETE dataset:

dete_resignations.loc[:,'job_dissatisfaction':'workload'].tail()
compared with
dete_resignations['dissatisfied'].tail()
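
A small sketch of the difference, using invented rows. The column names follow the slice above and the reasons mentioned in this post, but the exact names in the DETE data and the choice of which columns count as dissatisfaction are assumptions here.

```python
import pandas as pd

# Made-up rows: row 0 left over job dissatisfaction, row 1 relocated,
# row 2 gave none of these reasons.
dete = pd.DataFrame({
    'job_dissatisfaction': [True, False, False],
    'relocation': [False, True, False],
    'workload': [False, False, False],
})

# any() over every column in the slice flags row 1 because of relocation.
all_reasons = dete.loc[:, 'job_dissatisfaction':'workload'].any(axis=1)

# any() over dissatisfaction-related columns only (a hypothetical selection).
dissatisfaction_cols = ['job_dissatisfaction', 'workload']
only_dissatisfaction = dete[dissatisfaction_cols].any(axis=1)

print(pd.DataFrame({'all_reasons': all_reasons,
                    'only_dissatisfaction': only_dissatisfaction}))
```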

So in plain English, this means I have included columns from DETE that are not required?

Yup. You have included other reasons in addition to the dissatisfaction ones.

Thanks for the guidance, Rucha.
