Initially, I had imported my data like this:
dete_survey = pd.read_csv('dete_survey.csv')
Then this is what my data looked like:
Then, to replace the ‘Not Stated’ values with NaN (null), I wrote:
# Reimport DETE data, consider 'Not Stated' a missing value
dete_survey = pd.read_csv('dete_survey.csv', na_values = 'Not Stated')
# Check a sample
dete_survey[dete_relevant_columns].head(10)
So compared to the original code, the only part I added is na_values = 'Not Stated'.
What I expected to happen:
‘Not Stated’ replaced by NaN.
What actually happened:
That did happen, but other values changed as well. For example, 1984 became 1984.0. See screenshot:
It also seems that when I run dete_survey_updated.info(), the column is now float64, where earlier it was object.
Can anyone explain this, and suggest what I should do? Apart from the fact that 1984.0 looks very strange, I am not sure whether this will affect any analysis I want to do on this field later on.
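For reference, here is a minimal sketch that seems to reproduce the behaviour without the real file. The data is made up and the column name start_year is hypothetical, not taken from dete_survey.csv. My understanding is that NaN is itself a float, so a column holding only numbers and NaN gets stored as float64. The last step uses pandas' nullable integer dtype ("Int64") as one possible way to keep whole numbers alongside missing values; I am not sure it is the recommended approach here.

```python
import io

import pandas as pd

# Made-up stand-in for dete_survey.csv; 'start_year' is a hypothetical column name.
csv_text = "start_year\n1984\nNot Stated\n2011\n"

# Without na_values, the text 'Not Stated' keeps the whole column non-numeric.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values, 'Not Stated' becomes NaN and the remaining values parse as
# numbers. NaN is a float, so the column is stored as float64 (1984 -> 1984.0).
cleaned = pd.read_csv(io.StringIO(csv_text), na_values="Not Stated")

# Possible workaround: the nullable integer dtype stores whole numbers plus <NA>.
as_int = cleaned["start_year"].astype("Int64")

print(raw["start_year"].dtype)
print(cleaned["start_year"].dtype)  # float64
print(as_int.dtype)                 # Int64
```

If this is right, the values themselves are unchanged (1984.0 == 1984), and only the storage type and display differ.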