Guided Project Clean and Analyze Employee Exit Survey

Hi Community

Here’s another project from the updated courses on Dataquest: the guided project on cleaning and analyzing employee exit survey data.

Please take a look at my work and let me know what I could improve. Any constructive feedback will be appreciated.

Many thanks in advance

https://app.dataquest.io/c/60/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/11/next-steps

GM Guided Project_ Clean and Analyze Employee Exit Surveys.ipynb (140.2 KB)

GM Guided Project_ Clean and Analyze Employee Exit Surveys.py (13.4 KB)

Hello @gmpay1! Thanks for sharing your project with the Community:) I liked that you thoroughly described each step.

However, here are my suggestions:

  • Give the project a more appealing title. It is also meant for people outside Dataquest, and they do not know what “Guided Project” means!
  • Avoid the space between print and the opening parenthesis. PEP 8, the Python style guide, advises against whitespace before the parenthesis that starts a function call’s argument list
  • Leave out the technical subheadings and group their content under a “Data Cleaning” (or similar) heading. You may keep the most important steps as level-3 headings (like “Identify Missing Values and Drop Unnecessary Columns”)
  • Don’t print out the long outputs of the value_counts() and info() methods. I believe they only create a distraction
  • In the section “New Column Names”, it’s not clear when you did the renaming. It took me some time to understand that it had been done earlier. You did the same in “Combine the dataframes”. Try to be more linear
  • In code cell [18], you’ve made your life so complicated! Try writing a function that does the age categorization. You’ll see that it’s much easier than doing it manually
  • You drop all columns where the number of NaN values is greater than 500 but give no explanation of why you did it
  • Use plt.show() to avoid the printing of the additional information (like matplotlib.axes....)
  • Title the plots; it’s not clear what they show
  • Also make sure that you label all the axes to make it clear what the numbers on them represent
  • Order the categories on the “service_cat” plot in a logical way (from “New” to “Veteran”)
  • It’s not necessary to use %matplotlib inline twice
  • Provide a short summary at the end of the project. Have you been able to answer the questions? What are the results?
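
To make a few of the points above concrete (the categorization function, the ordered categories, the plot title and axis labels, and plt.show()), here is a minimal sketch. It is not taken from your notebook: the age bins, the labels, the AGE_ORDER list and the sample data are all made up for illustration, so adjust them to whatever groups you actually use.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels, listed in the order they should appear on the plot.
AGE_ORDER = ["Under 30", "30-49", "50 or older"]


def categorize_age(age):
    """Map a numeric age to a coarse age group (the bins are assumptions)."""
    if pd.isnull(age):
        return None  # keep missing values missing
    if age < 30:
        return "Under 30"
    if age < 50:
        return "30-49"
    return "50 or older"


# Made-up sample data standing in for the real age column.
ages = pd.Series([24, 35, 61, None, 47, 29, 52, 38])
age_cat = ages.apply(categorize_age)

# value_counts() sorts by frequency, so reindex to get a logical order.
counts = age_cat.value_counts().reindex(AGE_ORDER)

ax = counts.plot(kind="bar", rot=0)
ax.set_title("Employees by age group (made-up data)")
ax.set_xlabel("Age group")
ax.set_ylabel("Number of employees")
plt.show()  # as the last call in a cell, this hides the Axes repr text
```

The same reindex trick should work for the “service_cat” plot: list the categories from “New” to “Veteran” and reindex the counts with that list before plotting.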

That’s it for me. Happy coding :grinning_face_with_smiling_eyes:

Thanks! It is very insightful, and I appreciate very much the time you took to analyze the file.
I’ll try again soon.