Hello!
I wanted to add some extra analysis to my Clean and Analyze Employee Exit Surveys project. I was going to use a chi-squared test on the answers to the survey questions to see whether resignations are associated with any individual factor. Most of the survey columns have entries of “SD, D, N, A, SA”, which I took to be a Likert scale of (Strongly) Disagree, Neutral, and (Strongly) Agree. But about 10% of the rows have ‘M’ as an entry, and I cannot for the life of me figure out why. I thought it might mean ‘Missing’, but there are plenty of NaN values in there as well. I looked through the data documentation and couldn’t find any relevant information. Does anyone have any thoughts here?
Excellent question!
Unfortunately, if the source of the data doesn’t explain this, it would be difficult to figure out what that `M` could be.
In my opinion, `M` most likely means Missing, that is, the individual did not select an option. The `NaN` values, however, correspond to `N/A`, that is, Not Applicable/Available. I am basing this on the source of the data: the raw file doesn’t contain `NaN`s in those columns. It contains `N/A`, and the two can be different.
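To see how `N/A` in a raw file can turn into `NaN` once loaded, here is a minimal sketch using a tiny made-up CSV (the `id` and `opinion` columns are hypothetical, not the real survey columns). It relies on the fact that pandas' `read_csv` treats the string `N/A` as a missing value by default, while `M` is kept as ordinary text:

```python
import io
import pandas as pd

# Hypothetical miniature of one survey column; not the real dataset.
raw = "id,opinion\n1,SA\n2,N/A\n3,M\n4,D\n5,\n"

# Default parsing: "N/A" and the empty cell both become NaN, "M" stays a string.
default = pd.read_csv(io.StringIO(raw))

# Literal parsing: only truly empty cells become NaN, so "N/A" survives as text.
literal = pd.read_csv(io.StringIO(raw), keep_default_na=False, na_values=[""])

print(default["opinion"].value_counts(dropna=False))
print(literal["opinion"].value_counts(dropna=False))
```

Comparing the two printouts on the real file would show whether the `NaN`s you see were written as `N/A` in the source or were genuinely empty cells.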
This isn’t to say that `N/A` necessarily makes sense for those columns. There are ALWAYS interpretation issues with data collection and subsequent analysis. Since we don’t have more information on the data collection, I don’t think we can do much.
I think the project treats `N/A` values as `NaN`, so you have two options. You can go along with that and treat `M` as `NaN` as well, or you can download the data separately and work through it based on what makes more sense to you.
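If you go with the first option, a rough sketch could look like the following. The column names `resigned` and `workload` and all of the values are made up for illustration, and treating `M` as missing is an assumption rather than something the data documentation confirms. It uses `scipy.stats.chi2_contingency` on a crosstab, which is one common way to run a chi-squared test of independence:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; the real project has different columns and many more rows.
survey = pd.DataFrame({
    "resigned": [True, True, False, False, True, False, True, False],
    "workload": ["SA", "A", "D", "M", "SA", "SD", np.nan, "N"],
})

# Assumption: "M" means missing, so fold it into NaN.
cleaned = survey.copy()
cleaned["workload"] = cleaned["workload"].replace("M", np.nan)

# crosstab leaves out rows where either value is NaN.
table = pd.crosstab(cleaned["resigned"], cleaned["workload"])

# Chi-squared test of independence between resignation and the answer.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With only a handful of rows the p-value here means nothing; the point is just the shape of the workflow. On the real data it is also worth checking that the expected counts in `expected` are not too small before trusting the result.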