I got different results from other students

I got different numbers from other students. How did that happen? The code is the same and the data set is the same. I got 0.774 for Established; others got 0.516. I checked my dataset, and it is the same as the others' up to this step. Please help, thank you so much!!

Screen Link: https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/10/perform-initial-analysis

My Code:

resignation_per=combined_updated.pivot_table(index='service_cat',values='dissatisfied').reset_index()
print(resignation_per)

What I expected to happen:

What actually happened:

   service_cat  dissatisfied
0  Established      0.774194
1  Experienced      0.581395
2          New      0.476684
3      Veteran      0.808824

              dissatisfied
service_cat
established       0.516129
experienced       0.343023
new               0.295337
unknown           0.295455
veteran           0.485294

hi @candiceliu93

The same data set can be worked on in multiple ways. Other students may also have totally different results.

It’s quite difficult to know how you treated the data, or how the other students' work you are comparing against treated it, without first analyzing the differences.

Can you elaborate on how you compared your own project with the others?

The following points are just examples:

  • the cleanup strategy used by you and by the reference project, such as fillna(), replacements, or row drops
  • any step where you deviated from the given instructions based on your own ideas (that is, where you added or modified something)
  • whether you have tried to run the referenced project on your own machine, or are just looking at the shared project in nbviewer

Please attach your own project to help us understand what could be so different.

Thanks.

Hi @Rucha

I can’t find the project I referred to, so let me change the way I asked the question.

In the code below, the dissatisfied column has True and False values, and the pivot table returns the mean by default. But how did Python know to count only the True values here? The output I got is:

0 Established 0.774194
1 Experienced 0.581395
2 New 0.476684
3 Veteran 0.808824

Example:
The dissatisfied column has 651 rows in total. 193 rows belong to the New group, and only 92 of those are True. I expected the result to be 92/651 = 0.1413, because we did not tell Python to look only at the True values and calculate a percentage. It seems that Python did 92/193 = 0.476684.

This is the code I ran:

resignation_per=combined_updated.pivot_table(index='service_cat',values='dissatisfied').reset_index()
print(resignation_per)

hi @candiceliu93

Please attach your solution notebook to help us understand what’s going on here.

Even if your code is taking the True values and calculating the percentage of dissatisfied employees within each segment, the numbers differ from the others', yes, but the calculation still makes sense and gives a valid result: x employees resigned due to dissatisfaction out of the y employees in that segment who resigned (y counts both True and False; x counts only True).

That is, provided this is what is happening. In my own project the True and False counts for New are different from yours, so I can’t test with my own.
P.S. The more questions I answer about this project, the more bonkers I am going! :confused:
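
The per-segment calculation described above can be checked with a small sketch. It assumes a boolean dissatisfied column; the New counts (92 True out of 193) are the ones quoted in this thread, while the Veteran rows are made up purely for illustration. The mean of a boolean column treats True as 1 and False as 0, and pivot_table takes that mean within each group, so the denominator is the group size rather than the total number of rows.

```python
import pandas as pd

# New: 92 True out of 193 rows (counts quoted in the thread).
# Veteran: 3 True out of 4 rows (made-up values for illustration only).
df = pd.DataFrame({
    'service_cat': ['New'] * 193 + ['Veteran'] * 4,
    'dissatisfied': [True] * 92 + [False] * 101 + [True] * 3 + [False] * 1,
})

# Default aggfunc is the mean, computed separately for each service_cat group
per_group = df.pivot_table(index='service_cat', values='dissatisfied')
print(per_group)

# The same number by hand: True count in the group / rows in the group
new_rows = df[df['service_cat'] == 'New']
manual_new = new_rows['dissatisfied'].sum() / len(new_rows)
print(round(manual_new, 6))
```

So 92/193 is the share of dissatisfied employees within the New group, which is what the pivot table is expected to report.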


Hahaha! Understand your situation.
Please see my jupyter notebook below. Thank you!!

Screen link: https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/11/next-steps

The Analysis Of Employees Resignations-Basics (1).ipynb (397.3 KB)


hi @candiceliu93

Let’s try this one workaround.

In code cell 25, you have this code:

dete_resignations['dissatisfied']=dete_resignations[dete_col].applymap(update_vals).any(axis=1,skipna=False)

Please comment out this line and use the “restart & run all cells” option on your project. Don’t change anything else for now.

If you look at the output of this very cell, DETE shows all employees, i.e. 311, as dissatisfied, whereas my project shows False 162 and True 149.

Let’s see if this is the root cause of your trouble!
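
One way to make the check described above is value_counts(dropna=False), which also reports missing values. This is only a sketch on a made-up Series standing in for a dissatisfied column, not data from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a 'dissatisfied' column that contains a missing value
dissatisfied = pd.Series([True, False, False, np.nan, True, False], dtype=object)

# dropna=False keeps NaN as its own category in the counts
counts = dissatisfied.value_counts(dropna=False)
print(counts)
```

If a column that should contain a mix of True and False shows only True here, the step that built the column is the place to look.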

I performed “restart & run all cells”. dete still has 311, and tafe still has True 91, False 241, NaN 8. Nothing changed.

hey @candiceliu93

You have to comment out that DETE code. We don’t need to apply the True/False method to this dataset, as that is already taken care of. Please read my post again.

I commented out the DETE code. It raised the error “KeyError: ‘dissatisfied’”. Then I also tried commenting out all the DETE code in the cell, and then only tafe shows True 91, False 241, NaN 8. Nothing else changed.

The error “KeyError: ‘dissatisfied’” arose because the last line of this cell tries to print the value counts of dete['dissatisfied']. The problem is that, with the line indicated by @Rucha commented out, no 'dissatisfied' column was created for the dete dataframe. What you need to do is remove .applymap(update_vals) from this line and apply only the .any() method. update_vals is not applied to DETE because the values in the columns you mention are already only True or False, while in the TAFE columns they are True, False or -.

So, the line in question should read as follows:
dete_resignations['dissatisfied']=dete_resignations[dete_col].any(axis=1,skipna=False)
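
To see why the extra mapping step turns every DETE row into True, here is a minimal sketch. The thread never shows update_vals, so the version below is an assumption about what it looks like ('-' becomes False, missing stays missing, anything else becomes True), and the two column names and their values are hypothetical. The sketch uses apply with Series.map instead of applymap, since applymap is deprecated in recent pandas versions; the effect is the same element-wise mapping.

```python
import numpy as np
import pandas as pd

def update_vals(x):
    # Assumed helper: the thread does not show its definition
    if x == '-':
        return False
    elif pd.isnull(x):
        return np.nan
    else:
        return True

# Hypothetical DETE-like columns that are already boolean
dete_like = pd.DataFrame({
    'job_dissatisfaction': [True, False, False, False],
    'workload': [False, False, True, False],
})

# Element-wise mapping: False is neither '-' nor missing, so it falls
# through to the else branch and becomes True
mapped = dete_like.apply(lambda col: col.map(update_vals))
with_mapping = mapped.any(axis=1, skipna=False)

# Without the mapping, .any() works on the original booleans
without_mapping = dete_like.any(axis=1, skipna=False)

print(with_mapping.tolist())     # every row is True
print(without_mapping.tolist())  # True only where a column was True
```

Under that assumption, the mapping is what inflates the DETE True count, which would in turn raise the per-category means in the pivot table.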


Thank you for your reply!

I tried commenting out all the DETE code in that cell, and I tried your way too. The number of True and False values still did not change.

Step1: “restart & run all cells” so that cell numbers match.

Step2: Change in code cell 25:

Step3: “restart & run all cells” again

Step4: check the result for code cell 36:

Just a heads up: you may get an error for the age column later on.

Now it works!! Thank you.

But why don’t we need to use applymap(update_vals) on the dete dataset?

Check my message above to answer your last question.