Guided Project 7: Clean and Analyze Employee Exit Surveys

Hello everyone! I am glad to be back here :star_struck: after a long time off the platform. I hope everyone is having a great week.

Here is my attempt at cleaning and analyzing the exit surveys from the DETE and TAFE institutes. I conducted some analysis on the combined data, then went further to investigate possible differences between the insights gained from each institute.

I will be happy to hear your thoughts and feedback as always.

Link to last mission screen:

Notebook file:
notebook.ipynb (1.3 MB)

Here is a link to the file on GitHub

Click here to view the Jupyter notebook file in a new tab


Hi @israelogunmola

Thanks for sharing another great project with the DQ community. The whole project looks awesome: the introduction is well managed, and the background information on the two datasets, the stated aim, the use of comments, and the structure in general are all very informative.

I also loved how you managed the cleaning process; I think the `tabulate` library played a big role in this. The way those outputs are displayed, together with the explanations given after them, is so attractive that going forward `tabulate` will be my first option any time I have to display a pandas Series.

You have also gone beyond Dataquest's guidelines, which is very encouraging. While working on the same project, I never thought of exploring the position column; it exposed many insights. Also, dividing the analysis into two is a good move: you pointed out and visualized the details for each institution (DETE and TAFE) and then generalized the analysis. Keep up the good work, mate.

I have one suggestion:
When reporting the percentage of missing values in each column (or even the number of such columns), don't you think the best approach is to work them out in code rather than just stating them? This is especially advisable when dealing with a dataset with many columns, as in our case, because it gives you exact figures rather than approximations. For example, after creating a helper function I realized that the Classification column has less than 50% missing values (about 47%), but in your workings you listed it as one of the columns with more than 50% missing data.
You can confirm this using the following code:

def missing_value_per(df):
    """Print the dataset's columns grouped by their percentage of missing values."""
    dic_column = {}
    missing_over_50_per = []
    missing_over_20_per = []
    missing_less_20_per = []

    # percentage of missing values per column, rounded to the nearest integer
    for column in df.columns:
        dic_column[column] = round(df[column].isnull().sum() * 100 / len(df))

    # bucket each column by its missing-value percentage
    # (columns with no missing values are left out on purpose)
    for k, v in dic_column.items():
        if v >= 50:
            missing_over_50_per.append(k)
        elif 20 <= v < 50:
            missing_over_20_per.append(k)
        elif 0 < v < 20:
            missing_less_20_per.append(k)

    # ANSI escape codes make the headings bold and the thresholds red
    print(f'\033[1mColumns with more than \033[31m50%\033[0m \033[1mmissing values\n\n\033[0m{missing_over_50_per}\n')
    print(f'\033[1mColumns with more than \033[31m20% (and <50%)\033[0m missing values\n\n\033[0m{missing_over_20_per}\n')
    print(f'\033[1mColumns with less than \033[31m20%\033[0m missing values\n\n\033[0m{missing_less_20_per}\n')

missing_value_per(dete_df)
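As a side note (not from the original post), the same bucketing can be written more compactly with pandas' vectorized operations. This is just a sketch that assumes the same `dete_df` DataFrame is in scope; it returns the buckets instead of printing them:

```python
import pandas as pd

def missing_value_buckets(df):
    """Group column names by their rounded percentage of missing values."""
    pct = df.isnull().mean().mul(100).round()  # % missing per column
    return {
        "over_50": pct[pct >= 50].index.tolist(),
        "20_to_50": pct[(pct >= 20) & (pct < 50)].index.tolist(),
        "under_20": pct[(pct > 0) & (pct < 20)].index.tolist(),
    }
```

Calling `missing_value_buckets(dete_df)` would give you a dictionary you can reuse later, for example to drop all columns in the `over_50` bucket in one step.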

If you call the function on your dataset, you will see each column placed in its corresponding bucket.

You may want to consider this approach in your future projects.

Otherwise, from my side everything is well worked out and managed. Congratulations, buddy, on the good work. I have learned a lot going through your project, and I hope the same will apply to any learner who goes through it. Thanks for that :blush:

Thank you for the detailed feedback @brayanopiyo18. It is evident that you went through every aspect of my work. Feedback like this is very valuable, and I really appreciate the time you've put into sharing it with me.

I can totally see what you mean by working out the number of missing values in code. I missed this entirely and will be sure to be more careful when dealing with similar situations in my analyses.

Once again, I am grateful for the feedback. I look forward to learning and contributing more in this community.

Cheers!


This is great @israelogunmola

Happy learning!


Great work. You have really carried out your analysis in depth. Here is a quick suggestion: instead of inserting a markdown table at each point, you could use `Series.value_counts()`, which gives you the same information, although not in tabular form.
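For instance, a generic sketch of that suggestion (the column name and values here are invented, not taken from the notebook):

```python
import pandas as pd

# hypothetical separation reasons, standing in for a survey column
separationtype = pd.Series([
    "Resignation", "Resignation", "Retirement",
    "Contract Expired", "Resignation",
])

# counts of each category, most frequent first
print(separationtype.value_counts())

# relative frequencies instead of raw counts
print(separationtype.value_counts(normalize=True))
```

The `normalize=True` variant is handy when you would otherwise compute percentages by hand for a markdown table.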


Really well done, and I love the ease of reading and the careful attention to presentation. I really like how you mapped the position column; I had never thought of using the mapping criteria (teaching roles) both within the function and within the mapping itself.
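For readers who have not seen that pattern, here is a hypothetical illustration of it: a helper checks membership in a set of teaching roles, and `Series.map` applies it to the column. The role names and the set are invented for the example, not copied from the notebook:

```python
import pandas as pd

# hypothetical set of roles counted as teaching positions
TEACHING_ROLES = {"Teacher", "Teacher Aide", "Tutor"}

def is_teaching(position):
    """Map a raw position string to a coarse teaching/non-teaching category."""
    if pd.isnull(position):
        return None
    return "Teaching" if position in TEACHING_ROLES else "Non-teaching"

positions = pd.Series(["Teacher", "Cleaner", "Tutor", None])
print(positions.map(is_teaching))
```

Keeping the role set outside the function makes the mapping criteria easy to tweak without touching the mapping logic itself.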
