Strange Entries in Likert Scales for DETE survey data


I wanted to add some extra analysis to my Clean and Analyze Employee Exit Surveys project. I was going to use a chi-squared test on the answers to the survey questions to see if there was an association between resignations and any individual factors. Most of the survey columns have entries of “SD, D, N, A, SA”, which I reasoned was a Likert scale of (Strongly) Agree/Disagree and Neutral. But about 10% of the rows have ‘M’ as an entry and I cannot for the life of me figure out why. I thought maybe it could be ‘Missing’, but there are plenty of NaN values in there as well. I looked through the data documentation and I couldn’t find any relevant information. Does anyone have any thoughts here?

Excellent question!

Unfortunately, if the source of the data doesn’t explain this, it is difficult to figure out what that M could be.

In my view, M most likely means Missing. That is, the individual did not select an option. The NaN values, however, correspond to N/A, that is, Not Applicable or Not Available. I am basing this on the source of the data. The raw data doesn’t contain NaNs in those columns. It contains N/A, and the two can mean different things.
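If you want to check this yourself, here is a minimal sketch. By default pandas converts the literal text N/A to NaN when it reads a CSV, so loading with keep_default_na=False keeps the raw tokens visible next to M. The inline sample and its column names are made up for illustration; they are not taken from the real DETE file.

```python
import io
import pandas as pd

# Hypothetical sample: made-up column names and values, not the real DETE data.
raw = io.StringIO(
    "ID,Workload,Peer support\n"
    "1,SA,A\n"
    "2,M,N/A\n"
    "3,N/A,D\n"
    "4,D,M\n"
)

# Default behaviour: pandas turns the literal string "N/A" into NaN.
default_df = pd.read_csv(raw)

# keep_default_na=False leaves "N/A" as ordinary text.
raw.seek(0)
literal_df = pd.read_csv(raw, keep_default_na=False)

# Compare how the same column looks under each loading strategy.
default_counts = default_df["Workload"].value_counts(dropna=False)
literal_counts = literal_df["Workload"].value_counts(dropna=False)
print(default_counts)
print(literal_counts)
```

Running the same value_counts on the real survey columns should show whether M and N/A appear as separate tokens in the raw file.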

This isn’t to say that N/A makes sense for those columns necessarily. There are ALWAYS interpretation issues with data collection and subsequent analysis. Since we don’t have more information on the data collection, I don’t think we can do much.

The project treats N/As as NaNs, I think, so you can either go along with that and fold M into those NaNs, or download the data separately and work through it based on what makes more sense to you.
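For the first option, a rough sketch of how it could look: recode M as NaN, drop the unanswered rows, and build the contingency table that a chi-squared test needs. The DataFrame, the column names (resigned, Workload) and the choice to collapse the scale into agree / not agree are all assumptions for the sake of a small example, not something the project prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column names and values are invented for illustration.
df = pd.DataFrame({
    "resigned": [True, True, True, False, False, False, True, False],
    "Workload": ["SD", "D", "M", "A", "SA", "A", "D", np.nan],
})

# Treat "M" as missing, the same way as the existing NaNs.
df["Workload"] = df["Workload"].replace("M", np.nan)

# Collapse the scale to two groups so the toy table stays small.
# Neutral ("N") would land in "not agree" here, which is a judgment call.
agree = df["Workload"].isin(["A", "SA"])
answered = df["Workload"].notna()

# Contingency table over the rows that actually have an answer.
observed = pd.crosstab(df.loc[answered, "resigned"], agree[answered])

# Pearson chi-squared statistic computed by hand:
# expected = row_total * column_total / grand_total
obs = observed.to_numpy(dtype=float)
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
chi2_stat = ((obs - expected) ** 2 / expected).sum()

print(observed)
print(chi2_stat)
```

The counts here are far too small for the chi-squared approximation to be reliable, so this only shows the mechanics. On the real data, scipy.stats.chi2_contingency would take the same table and also return a p-value.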