Guided Project: Clean And Analyze Employee Exit Surveys: [Step 10] Combined Pivot Table Mean Results

Screen Link:

Hi All:

I’m not understanding the values returned from the combined_updated.pivot_table() call.
Hoping that someone can shed some light on the numbers for me.

My Code:

My combined_updated dataframe is as expected and seems to match values in the other project solutions that I have reviewed.

new            193
experienced    172
veteran        136
NaN             88
established     62
Name: service_cat, dtype: int64
pivot_table_combined_mean = \

What I expected to happen:

The pivot_table returns mean values but I am not understanding how these values were calculated. Again these results seem to match the results returned in other solutions to this project.

I’ve added an extra column to the result showing what I think the returned mean values should be.

There are 650 rows in combined_updated so the mean should just be the sum of the number of elements in each category divided by 650 and the grand sum of all the means should be 1 if we include the 88 NaNs in the dataframe.

The mean values for experienced, established and veteran all appear to be wrong to me.

Any enlightenment is appreciated! :slight_smile:

service_cat Mean My Expected Results
new 0.295337 193 / 650 = 0.29692
experienced 0.343023 172 / 650 = 0.26461
established 0.516129 62 / 650 = 0.09538
veteran 0.485294 136 / 650 = 0.20923
------ ------ ------
sum 1.639783 ??? 135 / 563 / 650 = 0.86615

By default the aggfunc argument is mean, hence pivot_table will return a DataFrame with the mean values of the values argument.

Think of pivot_table as a way to group and aggregate data in your DataFrame.

Read more in the documentation.

1 Like

Thanks for your feedback!

I figured out how the pivot_table() mean works now. :slight_smile:

I was using the wrong numbers for my sanity check.

The mean is calculated on the total number of elements for each specific category and not the total number of elements in the dataframe.

# total counts of all service categories with dissatisfied == True 
dissatisfied_true = combined_updated[ combined_updated['dissatisfied'] == True]

veteran        66
new            57
experienced    59
established    32
NaN            26
Name: service_cat, dtype: int64

# total counts of all service categories (True or False) 

veteran        136
new            193
experienced    172
established     62
NaN             88
Name: service_cat, dtype: int64

Using the above numbers for my sanity check gives the correct mean values

service_cat Mean Correct Mean Calculation
new 0.295337 57 / 193
experienced 0.343023 59 / 172
established 0.516129 32 / 62
veteran 0.485294 66 / 136

Hi @mgj000
your problem is solved?

I got a different number when i used the pd.pivot_table, I refer to the solution provided by DQ. but i dont understand the answer they provided.
I think your logic is clear…we want to know how many employees resigned due to dissatisfaction, which means we only need to calculate "True’ value in the dissatisfied columns. I think it is correct. what do you think about the answer provided by DQ?

The DQ solution is correct.

My original problem was one of not understanding how the mean values are calculated.

If you are not getting the correct mean values then I would sanity check your work prior to this point to see why you are not getting the same numbers.

One thing to check is that the totals count for all service categories where dissatisied == true are matched what you see above.

Good luck!