Clean and Analyze Employee Exit Surveys with great care

Hello everyone!

I want to share a project I am very proud of, not only because of the work it took but also because of the details I learned along the way. I consider it a solid foundation for starting to work on projects of my own.

I’ve tried to go a little further by mixing in things from past lessons and exploring new ways of showing graphics, looking for a balance between what I had to explain and the visual flair the plotting libraries offer.

Another point I wanted to work on, apart from programming and analysis, was structuring the document as well as possible, so that people who are new to the subject can easily follow the steps I took. I hope I succeeded.

I am leaving a link to GitHub, since nbviewer apparently has a problem refreshing its cache and changes do not show up immediately:

If anyone wants to comment on any of this, please do; your point of view matters to me. :yum:

Step_03/Clean and Analyze Employee Exit Surveys/Guided Project Clean and Analyze Employee Exit Surveys DQ-anotaciones-End.ipynb

> here you can download it <

Thanks again.

A&E.


Hi @Edelberth! Thanks for sharing your nice project with the Community! It’s great that you’ve decided to use knowledge from the previous lessons, I’m certain you’ve gained a lot of knowledge and experience along the way:) It also demonstrated your proactive behavior which is much needed today:)

Here is my feedback:

  • Your introduction was copied from DQ. Could you come up with your own intro? You may also add some other questions in addition to those provided by DQ
  • You have some typos here and there
  • “You can find the DETE exit survey data here here.” The link is missing and “here” is duplicated
  • It’s better to import all libraries in the first code cell. It certainly improves readability and gives an idea to the reader of which packages you used in the project. Also, no need to import them twice (or more): I saw that you’d imported matplotlib.pyplot twice
  • Document your code. For example, you could briefly describe what the first function, missing_on_columns(df) does: what’s the input and the output! It’ll help future you and other readers of the code
  • In the overview of DETE, you identify the frequency of each data point in each column. pandas has a special function to do the same job. It’s also better to identify the columns of interest specific to the questions you want to answer. Maybe it’s better to write them down instead of assessing all the columns?
  • Do you think that the plots of the columns to drop are necessary? They are pretty clogged with the text
  • Much of your text is copied from DQ. Try paraphrasing it!
  • Some of your plots (like after [39]) don’t have axis labels, which can confuse readers. Could you also comment on them?
  • In Create a New Column it’s not clear why you’re creating a new column: state that it’s needed to find out for how many years employees have worked
  • We’ll use to categorize employees as “dissatisfied” from each dataframe. We’ll use what?
  • What are [42] and [43] for?
  • This function allows to maintain the NaN values and convert into True all the values that are not -. You should explain why you do this. Why do you need this function? It’s also better to rephrase the sentence by saying that you convert each - character into False
  • In [56] remove the comments and place each .str.replace on a new line to improve readability (see PEP-8). Anyway, I think you can achieve the same result without using regex :slight_smile: Find a way!
  • Where are the answers to the questions? Should the reader figure them out from the code (and what if they are not technical specialists?)
  • You can merge age_groups(val) and age_cleaner(val) into one function
  • You then use pairplots on different age groups but only comment on the density plots. Do you really need the rest of them (scatter plots)? They are pretty confusing in my opinion. You also did not leave any comment on the age group 51-56 (I can see them further but I was confused at the beginning because you did not follow the previous pattern of commenting just under the plot). What are the DETE and TAFE stats for?
  • You then again change the pattern of commenting
  • You have a lot going on in Did more employees in the DETE survey or TAFE survey end their employment because they were dissatisfied in some way? but only left the comments after a lot of code. Is it necessary? Should you leave at least a brief comment after each finding?
  • Don’t forget about the conclusions! Sum up your project. In the real world, it’s probably the only part that will be read:)
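As an illustration of the point about [56], here is a minimal sketch of chained `.str.replace` calls, one per line, with `regex=False` so that each pattern is treated as a literal string. The sample values and the name `ages` are made up for this example; they are not taken from the notebook, whose column may look different.

```python
import pandas as pd

# Hypothetical sample values; the real column in the notebook may differ.
ages = pd.Series(["41  45", "56 or older", "26-30", None])

cleaned = (
    ages
    .str.replace("  ", "-", regex=False)         # literal double space -> dash
    .str.replace(" or older", "+", regex=False)  # literal suffix -> plus sign
)
print(cleaned.tolist())
```

Wrapping the chain in parentheses lets each call sit on its own line without backslashes, and missing values simply stay missing.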

I hope I was helpful. Happy coding :grinning_face_with_smiling_eyes:


Hi @artur.sannikov96, glad to see you again.

Thank you for your time and patience.

Let me tell you something:

I could not implement the improvements you recommended in this exercise (yet!) because I finished it on Saturday. Still, I am glad to see that little by little I have been improving on my own, in line with what you told me.

Responses to your feedback:

  • Introduction copied from DQ:
    -Yes, it was. At the time I had doubts about how to present my work: as if I had been the creator (!), or humbly, showing that this was a Guided project. Keeping the original text also helps me study, which is basically the final objective of these projects, at least for me. So I chose to be clear and to trust that the improvements I want to make on github, as you suggest, are what will really show where I started and where I am now, which I think also gives the work another kind of value.

-You have some typos here and there:

  • You are more right than a saint. True.

-You can find the DETE exit survey data here here. - you lack the link and here is duplicated:

  • no problem, that’s easy to fix

-**It’s better to import all libraries in the first code cell. It certainly improves readability and gives an idea to the reader of which packages you used in the project. Also, no need to import them twice (or more): I saw that you’d imported matplotlib.pyplot twice:**
    -True. However, I was watching a video by the owner of DQ, and it caught my attention that he loaded the libraries he needed at each step as he went along. I liked the idea and found it more authentic than putting everything you already know you will use at the top, before knowing what you are going to find.
    -The reason there was more than one import is that I started out the way you describe, but after a long time with the notebook, as it grew, I was not careful. The lesson I take from this is that I must choose a style; at least that’s how I see it. What do you think?

-Document your code. For example, you could briefly describe what the first function, missing_on_columns(df) does: what’s the input and the output! It’ll help future you and other readers of the code:

  • Again, right. In fact, I discussed the exercise with a girl from the forum and saw how, inside the function, she put a description between triple quotes: '''what the function does'''. The truth is that this is a very important point.
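A minimal sketch of what such a docstring could look like for `missing_on_columns(df)`. The body shown here is only a guess made for illustration (it assumes the function counts missing values per column); the notebook’s real function may do something different.

```python
import pandas as pd


def missing_on_columns(df):
    """Return the number of missing values in each column.

    NOTE: this body is an assumption for illustration only; the
    notebook's real function may behave differently.

    Parameters
    ----------
    df : pandas.DataFrame
        The dataframe to inspect.

    Returns
    -------
    pandas.Series
        Count of missing values per column, largest first.
    """
    return df.isnull().sum().sort_values(ascending=False)


# Hypothetical data, just to show the call.
example = pd.DataFrame({"age": [25, None, 40], "gender": ["F", None, None]})
result = missing_on_columns(example)
print(result)
```

With the docstring in place, `help(missing_on_columns)` shows the input and the output without anyone having to read the code.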

-In the overview of DETE, you identify the frequency of each data point in each column. pandas has a special function to do the same job. It’s also better to identify the columns of interest specific to the questions you want to answer. Maybe it’s better to write them down instead of assessing all the columns?

  • I didn’t really know. Can you tell me what that function is?

-Do you think that the plots of the columns to drop are necessary? They are pretty clogged with the text:

  • Well, I think the answer to your question is that it depends.
    It depends on the audience for the notebook. My idea was to make it as accessible as possible to people who know nothing about the subject, and so to push myself, for example, to create graphics like these.

  • We know that for us they are not necessary, but it was an excuse to revisit topics from earlier lessons and to exercise something I like more and more about this discipline: it involves a certain degree of creativity.

  • I am beginning to feel that the same notebook can be done in many ways, and that gives me confidence in what I am learning.

Much of your text is copied from DQ. Try paraphrasing it!:

  • Yes, I do that so I know exactly what was being asked of me when I study the document in the future.

Some of your plots (like after [39]) don’t have axis labels, which can confuse readers. Could you also comment on them?

  • Again, true.

In Create a New Column it’s not clear why you’re creating a new column: state that it’s needed to find out for how many years employees have worked

We’ll use to categorize employees as “dissatisfied” from each dataframe. We’ll use what?

  • I don’t understand this question, I don’t know what you mean, sorry.

What are [42] and [43] for?
-They are two test lines that snuck in…

This function allows to maintain the NaN values and convert into True all the values that are not -. You should explain why you do this. Why do you need this function? It’s also better to rephrase the sentence by saying that you convert each - character into False
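For reference, a minimal sketch of the kind of function being discussed, with a docstring that states the mapping explicitly. The name `update_vals` and the sample values are assumptions for illustration; the notebook’s own function may be named and written differently.

```python
import numpy as np
import pandas as pd


def update_vals(val):
    """Map a raw survey answer to a dissatisfaction flag.

    NaN stays NaN (no answer given), '-' becomes False (factor not
    selected), and any other value becomes True (factor selected).
    """
    if pd.isnull(val):
        return np.nan
    if val == "-":
        return False
    return True


# Hypothetical sample of raw answers.
raw = pd.Series(["-", "Job Dissatisfaction", np.nan])
flags = raw.map(update_vals)
print(flags.tolist())
```

Keeping NaN separate from False means a missing answer is not silently counted as "not dissatisfied" later in the analysis.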

  • In [56] remove the comments and place each .str.replace on a new line to improve readability (see PEP-8). Anyway, I think you can achieve the same result without using regex :slight_smile: Find a way!

    • Yes, it is a good excuse to start working the way one should.
  • Where are the answers to the questions? Should the reader figure it out from the code (and if they are not technical specialists?)

    • The idea was to answer as I went along, but it seems that is not the most appropriate approach. Next time I will try to write properly worked-out conclusions. I will take a look at your work, if you do not mind.
  • You can merge age_groups(val) and age_cleaner(val) into one function

    • It’s true. I thought it looked very ugly, but it didn’t occur to me!
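A rough sketch of how `age_groups(val)` and `age_cleaner(val)` might be merged into a single function. The name `clean_age`, the raw formats and the group labels are all assumptions made for illustration; they would need adapting to the values actually present in the data.

```python
import math


def clean_age(val):
    """Normalise a raw age answer and assign it to a group in one step.

    The raw formats and group labels here are illustrative assumptions.
    """
    # Missing answers (None or float NaN) pass through unchanged.
    if val is None or (isinstance(val, float) and math.isnan(val)):
        return val
    # Cleaning step: unify the separator and keep the lower bound.
    text = str(val).replace("  ", "-")
    lower = int(float(text.split("-")[0].split()[0]))
    # Grouping step.
    if lower <= 30:
        return "30 or younger"
    if lower <= 50:
        return "31-50"
    return "51 or older"


for sample in ["41  45", "56 or older", "26-30", "20 or younger", None]:
    print(repr(sample), "->", clean_age(sample))
```

A single function like this can then be applied once to the age column (for example with `Series.map`) instead of chaining two passes.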

You then use pairplots on different age groups but only comment on the density plots. Do you really need the rest of them (scatter plots)? They are pretty confusing in my opinion. You also did not leave any comment on the age group 51-56 (I can see them further but I was confused at the beginning because you did not follow the previous pattern of commenting just under the plot). What are the DETE and TAFE stats for?
You then again change the pattern of commenting

  • Honestly, I thought about it too, but what I considered my main objective was to try a library of functions that I had not seen before. In fact, I really want to finish the statistics module and learn when it is time to do one thing or the other.

  • As for the comments, after I had written a few I realized that it was not the best solution.

You have a lot going on in Did more employees in the DETE survey or TAFE survey end their employment because they were dissatisfied in some way? but only left the comments after a lot of code. Is it necessary? Should you leave at least a brief comment after each finding?

  • I hadn’t thought about it. I’ll take a look and see how it goes.

Don’t forget about the conclusions! Sum up your project. In the real world, it’s probably the only part that will be read:)

  • Yes. After finding so many things along the way, it is very difficult for me to write a summary that covers everything, but yes, the conclusions are essential.

I thank you again for your sincerity :100:, you have made me see things that I had not seen.

I hope I can help you as much as you have helped me.

A&E.


When you are live-coding this can be true, but I still believe that putting all the imports in the first cell gives us a big picture of what’s used in the project (of course, while you work you import libraries whenever the need arises).

Yep, choose your style and stick to it throughout the project.

It’s value_counts().
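A short sketch of how `value_counts()` can replace a hand-written frequency loop. The dataframe and column names below are hypothetical and only stand in for the survey data.

```python
import pandas as pd

# Hypothetical sample; column names are illustrative only.
survey = pd.DataFrame({
    "separation_type": ["Resignation", "Retirement", "Resignation", None],
    "gender": ["F", "M", "F", "F"],
})

# dropna=False keeps missing values as their own row in the counts.
counts = survey["separation_type"].value_counts(dropna=False)
print(counts)

# Restricting the overview to the columns of interest.
for col in ["separation_type", "gender"]:
    print(survey[col].value_counts(dropna=False), end="\n\n")
```

Listing the columns of interest explicitly, as in the loop, also covers the suggestion to look only at the columns that matter for the questions.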

Can you improve the style of the plots, then?

In this sentence, “use” requires an object (a noun): use what?

Happy coding!
