Screen Link:
https://app.dataquest.io/jupyter/notebooks/notebook/Employee%20Exit%20Surveys.ipynb
My Code:
import pandas as pd
import numpy as np
dete_survey = pd.read_csv("dete_survey.csv")
tafe_survey = pd.read_csv("tafe_survey.csv")
dete_survey.info()
tafe_survey.info()
dete_survey.head()
tafe_survey.head()
dete_survey.isnull()
tafe_survey.isnull()
dete_survey.isnull().sum()
tafe_survey.isnull().sum()
dete_survey = pd.read_csv("dete_survey.csv", na_values='Not Stated')
dete_survey.columns
tafe_survey.columns
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis=1)
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis=1)
dete_survey_updated.columns = dete_survey_updated.columns.str.lower().str.strip().str.replace(' ','_')
dete_survey_updated.columns
tafe_survey_updated.columns
tafe_column_names = {
    "Record ID": "id",
    "CESSATION YEAR": "cease_date",
    "Reason for ceasing employment": "separationtype",
    "Gender. What is your Gender?": "gender",
    "CurrentAge. Current Age": "age",
    "Employment Type. Employment Type": "employment_status",
    "Classification. Classification": "position",
    "LengthofServiceOverall. Overall Length of Service at Institute (in years)": "institute_service",
    "LengthofServiceCurrent. Length of Service at current workplace (in years)": "role_service",
}
tafe_survey_updated = tafe_survey_updated.rename(tafe_column_names, axis=1)
tafe_survey_updated.columns
dete_survey_updated['separationtype'].value_counts()
dete_survey_updated['separationtype'].unique()
# Collapse every 'Resignation-...' separation type into a single 'Resignation' category by splitting on '-' and keeping the first element
dete_survey_updated['separationtype'] = dete_survey_updated['separationtype'].str.split('-').str[0]
dete_survey_updated['separationtype'].value_counts()
dete_survey_updated['separationtype'].unique()
tafe_survey_updated['separationtype'].value_counts()
dete_resignations = dete_survey_updated[dete_survey_updated['separationtype'] == 'Resignation'].copy()
tafe_resignations = tafe_survey_updated[tafe_survey_updated['separationtype'] == 'Resignation'].copy()
dete_resignations['cease_date'].value_counts()
dete_resignations['cease_date'] = dete_resignations['cease_date'].str.split('/').str[-1]
dete_resignations['cease_date'] = dete_resignations['cease_date'].astype("float")
dete_resignations['cease_date'].value_counts()
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.boxplot(dete_resignations['cease_date'])
#dete_resignations.boxplot(column=['cease_date'])
plt.show()
What I expected to happen:
A boxplot to be created for dete_resignations['cease_date'].
What actually happened:
KeyError: 0
How would I create separate boxplots for the two datasets?
I’m getting KeyError: 0. Why is it looking for a key 0?
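
One possible explanation, offered as a guess rather than a confirmed diagnosis: after the 'Resignation' filter, the Series keeps its original row labels, so it may no longer have a label 0. Some matplotlib versions look up element 0 on the object passed to boxplot, which would then raise KeyError: 0. Missing years (NaN) can also leave the box empty. The sketch below uses small made-up Series (dete_years and tafe_years are hypothetical stand-ins for the two cease_date columns), drops NaN, and passes plain NumPy arrays so no label lookup is involved, with one subplot per dataset.

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside a notebook; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for dete_resignations['cease_date'] and
# tafe_resignations['cease_date']: the rows were filtered, so the index
# has no label 0, and some years are missing.
dete_years = pd.Series([2012.0, 2013.0, np.nan, 2014.0, 2006.0],
                       index=[3, 5, 8, 9, 11])
tafe_years = pd.Series([2010.0, 2011.0, 2012.0, np.nan, 2013.0],
                       index=[4, 6, 7, 10, 12])

# Drop NaN and convert to NumPy arrays so matplotlib indexes by position.
dete_values = dete_years.dropna().values
tafe_values = tafe_years.dropna().values

# One subplot per dataset, sharing the y-axis so the years are comparable.
fig, (ax_dete, ax_tafe) = plt.subplots(1, 2, sharey=True)
ax_dete.boxplot(dete_values)
ax_dete.set_title("DETE cease_date")
ax_tafe.boxplot(tafe_values)
ax_tafe.set_title("TAFE cease_date")
```

In the notebook, the same pattern would presumably apply to the real columns, for example dete_resignations['cease_date'].dropna().values, followed by plt.show().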