GP Employee Exit Surveys for review

Hello everyone,

It took me an eternity to finish the project, mostly because I chose to dig a bit deeper into the visualization part of the analysis. I hope it was worth it.

It would be a great help if you could point out what you like in the project and what could be done to make it better.

Here's a link to the guided project: https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/11/next-steps

And here's my project:
Clean and Analyze Employee Exit Surveys.ipynb (632.6 KB)

It was amazing to read your project. The graphs are a huge improvement and make the results clearer. I saw some pandas methods and functions that I didn't know before. Nice one, keep up the good work!

Good luck!

Thank you!
Frankly speaking, I used quite a few functions and methods that I hadn't known before. I looked for similar problems on Stack Overflow, adjusted the solutions to my case, and checked the documentation to better understand the proposed solutions. All of that takes time, which is why it sometimes seemed that I was never going to finish the project 😂

Hi @ksenia.kustanovich! I like the graph you made. It looks so professional!!
I'm having difficulty understanding the code you wrote for the graph. I hope you can explain it … thank you!!

1. `cat_abs = combined.pivot_table(index='service_cat',values='dissatisfied', aggfunc=lambda x: len(x)).reset_index()` What does `lambda x: len(x)` mean here?

2. I like the way you show the annotations on the bar chart.

```
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.annotate('{:.1%}'.format(height), (x + 0.5 * width, y + height + 0.01), size=13, ha='center')
```

I don't understand this code, especially `ax.annotate('{:.1%}'.format(height), (x + 0.5 * width, y + height + 0.01), size=13, ha='center')`. Could you help explain it? Thank you so much!!


Hi, @candiceliu93! Thank you for your questions. They made me revise my code one more time, and I found that some parts are unnecessarily complicated and could be written in a simpler and nicer way.

1. `lambda` is a keyword that lets you create small anonymous functions inline. In this case the function receives the `dissatisfied` column for each category as `x` and returns the length of that series. I had seen it used as `aggfunc` in one of the projects shared here and used it as well. But I've just tried it in my notebook and found that the complicated `lambda x: len(x)` can be replaced with a plain `len` (the well-known `len()` function) and the results are the same.
2. Let's go through it line by line:
`for p in ax.patches:` - iterates over all the bars of the plot
`width, height = p.get_width(), p.get_height()` - gets the width and the height of each bar. The height is equal to the value that was plotted.
`x, y = p.get_xy()` - gets the coordinates of the bottom left corner of each bar
`ax.annotate('{:.1%}'.format(height), (x + 0.5 * width, y + height + 0.01), size=13, ha='center')` - annotates the value on each bar, where `'{:.1%}'.format(height)` formats the number as a percentage with one digit after the decimal point (e.g. 42.3%), `(x + 0.5 * width, y + height + 0.01)` are the coordinates where the annotation is placed, `size=13` is the font size and `ha='center'` is the horizontal alignment of the text. Before the project I didn't know how to annotate the values on a plot, but I googled it, found various solutions on Stack Overflow and chose this one as the simplest and most readable.
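
To check the first point on something small, here is a minimal sketch. The tiny `combined` DataFrame below is made up for illustration (it is not the project's data); only the two `pivot_table` calls mirror the code discussed above.

```python
import pandas as pd

# Hypothetical stand-in for the project's `combined` DataFrame.
combined = pd.DataFrame({
    "service_cat": ["New", "New", "Veteran", "Veteran", "Veteran", "Established"],
    "dissatisfied": [True, False, True, True, False, False],
})

# Original version: an inline function that returns the length of each group.
with_lambda = combined.pivot_table(
    index="service_cat", values="dissatisfied", aggfunc=lambda x: len(x)
).reset_index()

# Simplified version: pass the built-in len directly.
with_len = combined.pivot_table(
    index="service_cat", values="dissatisfied", aggfunc=len
).reset_index()

# Both tables hold the number of rows per service category.
print(with_lambda)
print(with_len)
```

Both calls count the rows in each category, so `aggfunc=len` is enough here.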

Thank you for your detailed explanation!! It answered some of my questions.

1. You explained `lambda` well! Noted that `lambda x: len(x)` works as well.
2. `(x + 0.5 * width, y + height + 0.01)` are the coordinates where the annotations are placed. What do 0.5 and 0.01 mean here? And why do we need to multiply the width? Can you give me an example for one bar?

Thank you so much!!

1. Yeah, `lambda` works well, but now I think it's unnecessary because `aggfunc=len` works as well and is simpler to understand.
2. `x` and `y` are retrieved with the `.get_xy()` method in the previous line. They are the coordinates of the bottom left corner of each bar. I want the annotations to be centered above each bar. That's why I add half of the width to `x`. If I hadn't, the text would be placed at the left edge of each bar. To place the text above the bar we have to add the height to `y`, the coordinate of the bar's bottom corner. I add 0.01 as well because I want the text to be slightly separated from the bar.
You might think that the `ha` parameter is what centers the annotation on the bar, but that's not quite true. Imagine the coordinates we pass as a dot on the plot. The annotation text occupies more space than a dot. When we pass `ha` as `center`, the text is centered on this dot. If we passed it as `left`, the text would be moved to the right of the dot, so that the dot marks its bottom left corner.
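
For one bar, the arithmetic can be sketched like this. The categories and shares are hypothetical, and the `label_positions` list is only there to inspect the computed coordinates; the loop itself is the one discussed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical shares of dissatisfied employees per category.
categories = ["New", "Experienced", "Established", "Veteran"]
shares = [0.30, 0.34, 0.52, 0.49]

fig, ax = plt.subplots()
ax.bar(categories, shares, width=0.8)

label_positions = []  # helper for inspection only, not part of the original code
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()              # bottom left corner of the bar
    label_x = x + 0.5 * width      # move from the left edge to the middle of the bar
    label_y = y + height + 0.01    # move to the top of the bar, plus a small gap
    label_positions.append((label_x, label_y))
    ax.annotate("{:.1%}".format(height), (label_x, label_y), size=13, ha="center")

# First bar: its corner is at x = -0.4, y = 0, with width 0.8 and height 0.30,
# so the label is anchored at roughly (0.0, 0.31).
print(label_positions[0])
```

The first bar is drawn around position 0 on the x axis with a width of 0.8, so its left edge is at -0.4. Adding half the width brings the anchor back to 0.0, the middle of the bar, and the extra 0.01 lifts the text just above the bar's top at 0.30.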

Just copy this part to one of your graphs (I saw some similar graphs in your project) and try to play a little bit with the coordinates and the `ha` parameter for better understanding.

I've changed the `ha` parameter to `left` on one of the plots in order to demonstrate the difference to you:

The imaginary dot is still located in the middle of the bar, but the text goes right after it.
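
If you want to reproduce the difference yourself, here is a rough sketch with a single made-up bar. It places the same label twice, once with `ha='center'` and once with `ha='left'`, and measures where the text ends up relative to the dot. The `horizontal_offsets` helper is my own illustration, not something from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0], [0.5], width=0.8)  # one hypothetical bar centered at x = 0
ax.set_ylim(0, 1)

# Same x coordinate for both labels; only the alignment differs.
centered = ax.annotate("50.0%", (0.0, 0.51), size=13, ha="center")
left = ax.annotate("50.0%", (0.0, 0.61), size=13, ha="left")

fig.canvas.draw()  # the size of the text is only known after rendering


def horizontal_offsets(text):
    """Return (left edge - dot, right edge - dot) of the text in pixels."""
    dot_px = ax.transData.transform(text.xy)[0]
    box = text.get_window_extent()
    return box.x0 - dot_px, box.x1 - dot_px


print("center:", horizontal_offsets(centered))  # text spreads to both sides of the dot
print("left:  ", horizontal_offsets(left))      # text starts at the dot and goes right
```

With `center` the two offsets have opposite signs and about the same size, while with `left` the first offset is close to zero.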

After looking at your samples, I tried some new code on my own sample. You inspired me! Haha…

Thank you for your explanation, it is so detailed!!

Absolutely amazing analysis, and the visualisation makes a huge difference for ease of report consumption.

Thank you! With every guided project completed I realize that I've got a soft spot for data visualisation. It usually takes significant effort, and especially time, to customize the aesthetics, etc., but I really enjoy the results later, when you get all the necessary information from the chart at first sight.
