Sharing Solution to Guided Project - Clean and Analyze Employee Exit Survey

I would love some constructive feedback. Thank you!
Solution :
Project mission :


hey @Raj

Thank you for sharing your guided project and Welcome to DataQuest! :smile:

A couple of things definitely caught my attention in your project. Without diving deep into the entire project, let’s discuss them:

  • Employee Type - I hadn’t thought of employee type as a criterion for analysis while completing my own project, so this was new to me. Thank you! :ok_hand:

  • IntervalIndex and Interval (cells 395 and 396) - could you briefly describe these objects and their relevance to the project: what their purpose is and why you used them? Maybe a markdown cell explaining the general idea and usage. They seem quite interesting to learn! :nerd_face:

However, when I tried them with dummy values for service years, they did not seem to include the lower limit of each interval when classifying the service tenure. For example, for the values [1, 15, 16, 3, 7, 9, 10], this was the result obtained:

and for the values [1, 15, 16, 4, 2, 9, 10], this was the result obtained:
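To make the lower-limit behaviour concrete, here is a minimal sketch rather than the notebook's actual code. The bin edges are hypothetical (I don't know the exact tuples used in cells 395 and 396). It relies on the fact that `pd.IntervalIndex.from_tuples` builds right-closed intervals by default, so a value equal to the lowest edge falls outside every bin and becomes null. Passing `closed="left"` includes the lower limit instead.

```python
import pandas as pd

# Dummy service-year values from the first example in this post.
values = pd.Series([1, 15, 16, 3, 7, 9, 10])

# Hypothetical bin edges, chosen only for illustration.
edges = [(1, 3), (3, 7), (7, 11), (11, 50)]

# Default is closed="right": each interval is (lower, upper],
# so the lower limit itself is NOT part of the interval.
right_bins = pd.IntervalIndex.from_tuples(edges)
right_closed = pd.cut(values, bins=right_bins)

# closed="left" gives [lower, upper), which includes the lower limit.
left_bins = pd.IntervalIndex.from_tuples(edges, closed="left")
left_closed = pd.cut(values, bins=left_bins)

print(pd.DataFrame({"value": values,
                    "right_closed": right_closed,
                    "left_closed": left_closed}))

# The value 1 matches no right-closed bin, so it is the only null there.
print("nulls (right-closed):", right_closed.isna().sum())
print("nulls (left-closed): ", left_closed.isna().sum())
```

With left-closed bins the upper limit of the last interval is excluded instead, so whichever side is chosen, the edges need to cover the full range of the data.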

Please consider a few additions to your project:

  • descriptive markdown cells - a brief, summarized explanation of the purpose or function of the preceding or following code.
  • formatting the title of each graph so that the title alone tells the reader what the plot is about.

Let us know how DataQuest can help you further. :slight_smile:


Hey, @Rucha,
Thank you so much for the feedback, I will make the necessary changes and keep in mind to be more descriptive with my approach from here on.

Regarding IntervalIndex and Interval, I faced a similar problem and hence used tuples as custom intervals to resolve the discrepancies. There are still some values that are mislabeled by this process, which I was not able to fix 100%, but I believed it was okay as the goal was to use a proxy of some sort to understand the general trend.

Thank you :smile:

hey @Raj

I agree that it’s not necessary to get exact answers while we are learning.

In this particular case, it’s okay to have some values end up as nulls, as that might give us a different idea or help us understand a particular approach (an analytical one, not a technical one!) to analyzing the data and forming conclusions.

However, we also need to think and learn about the skew that might arise in the data if we are too extreme with our approach.

Just as an example: before we classify service years, we have about 90 null values in the institute_service column. After we apply the interval methods, we get almost 300 or so null values, an increase amounting to about 28% of a dataset with 650 or so total rows.

Although your results were proportional to the provided solution, I just wanted to highlight that this is a loss of a lot of data.
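A quick way to catch this kind of loss is to compare null counts before and after binning. The sketch below uses a small synthetic series, not the real survey data, and the `null_report` helper and the bin edges are hypothetical names and values of my own, so treat it as an illustration of the check rather than the project's code.

```python
import numpy as np
import pandas as pd

def null_report(before: pd.Series, after: pd.Series) -> dict:
    """Compare null counts before and after a transformation (hypothetical helper)."""
    n_rows = len(before)
    nulls_before = int(before.isna().sum())
    nulls_after = int(after.isna().sum())
    extra = nulls_after - nulls_before
    return {
        "rows": n_rows,
        "nulls_before": nulls_before,
        "nulls_after": nulls_after,
        "extra_nulls": extra,
        # Extra nulls expressed as a share of all rows.
        "extra_nulls_pct_of_rows": round(100 * extra / n_rows, 1),
    }

# Synthetic stand-in for the institute_service column (2 genuine nulls).
service = pd.Series([1, 2, np.nan, 5, 7, 10, 12, np.nan, 20, 3],
                    name="institute_service")

# Hypothetical right-closed bins: the value 1 sits on the excluded lower
# limit of the first bin, so binning turns it into an additional null.
bins = pd.IntervalIndex.from_tuples([(1, 3), (3, 7), (7, 11), (11, 50)])
binned = pd.cut(service, bins=bins)

report = null_report(service, binned)
print(report)
```

Running the same check on the real column right after the classification step would show how many rows the intervals silently dropped.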

Hope to see more innovative approaches/ ideas from you and learn together. :slight_smile:

Hey @Rucha,

This was something I hadn’t given a thought to or checked at that point in time. This is a wonderful observation. I shall try to find a way around this and remember to keep this aspect in mind in the future.

Thank you so much for the help! :smile:

Really nice touch with the Waffle Plot at the end!
Very organised and easy to understand, even for a newbie. Keep it up!
