Guided Project: Clean And Analyze Employee Exit Surveys [Handling nan values]

Screen Link: https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/6/create-a-new-column

My Code:

dete_resignations['cease_date'].value_counts(dropna=False).sort_index()

Output:

2006.0      1
2010.0      2
2012.0    129
2013.0    146
2014.0     22
NaN        11
Name: cease_date, dtype: int64

dete_resignations['dete_start_date'].value_counts(dropna=False).sort_index()

Output:

1963.0     1
1971.0     1
1972.0     1
1973.0     1
1974.0     2
1975.0     1
1976.0     2
1977.0     1
1980.0     5
1982.0     1
1983.0     2
1984.0     1
1985.0     3
1986.0     3
1987.0     1
1988.0     4
1989.0     4
1990.0     5
1991.0     4
1992.0     6
1993.0     5
1994.0     6
1995.0     4
1996.0     6
1997.0     5
1998.0     6
1999.0     8
2000.0     9
2001.0     3
2002.0     6
2003.0     6
2004.0    14
2005.0    15
2006.0    13
2007.0    21
2008.0    22
2009.0    13
2010.0    17
2011.0    24
2012.0    21
2013.0    10
NaN       28
Name: dete_start_date, dtype: int64

tafe_resignations['cease_date'].value_counts(dropna=False).sort_index()

Output:

2009.0      2
2010.0     68
2011.0    116
2012.0     94
2013.0     55
NaN         5

What should be done with the NaN values here?
Should we simply drop those rows, or is there a better way to handle them?

Thanks!

Hi @veeral27596

You could try replacing the NaN values with the median of the column (Series.median() will compute it for you), so that the filled-in values don't skew the data. However, when the missing value is a year, as in these date columns, a median year is an artificial value, so you may prefer to remove those particular rows instead.
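For concreteness, here is a minimal sketch of both options, assuming dete_resignations already exists as in the code above (the same pattern applies to tafe_resignations):

# Option 1: impute the median (keeps every row, but fills in an
# artificial year, which may not be meaningful for a date column)
median_cease = dete_resignations['cease_date'].median()
dete_filled = dete_resignations.copy()
dete_filled['cease_date'] = dete_filled['cease_date'].fillna(median_cease)

# Option 2: drop the rows where cease_date is missing
dete_dropped = dete_resignations.dropna(subset=['cease_date'])

# Neither result has NaN left in cease_date
print(dete_filled['cease_date'].isnull().sum())   # 0
print(dete_dropped['cease_date'].isnull().sum())  # 0

Which option is better depends on how many rows you would lose and whether the column is central to your analysis.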

Hope this helps!
