Strange Entries in Likert Scales for DETE survey data


I wanted to add some extra analysis to my Clean and Analyze Employee Exit Surveys project. I was going to use a chi-squared test to analyze the answers to the survey questions and see if there was an association between resignations and any individual factors. Most of the survey columns have entries that are “SD, D, N, A, SA”, which I reasoned was a Likert scale of (Strongly) Agree/Disagree and Neutral. But about 10% of the rows have ‘M’ as an entry and I cannot for the life of me figure out why. I thought maybe it could be ‘Missing’, but there are plenty of NaN values in there as well. I looked through the data documentation and I couldn’t find any relevant information. Does anyone have any thoughts here?

Excellent question!

Unfortunately, if the source of the data doesn’t explain this, it is difficult to figure out what that M could be.

In my view, M most likely means Missing. That is, the individual did not select an option. The NaN values, however, correspond to N/A, that is, Not Applicable/Available. I am basing this on the source of the data. The raw data doesn’t contain NaNs in those columns. It contains the literal text N/A, and the two can mean different things.
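One way to check this yourself is to read the raw file so that pandas leaves the N/A text alone. By default, `read_csv` converts strings such as N/A into NaN, and passing `keep_default_na=False` turns that off. Here is a rough sketch on a tiny made-up sample; the column name `job_satisfaction` and the rows are just placeholders, not the real DETE data:

```python
import io
import pandas as pd

# Made-up sample standing in for the raw survey file (hypothetical column name).
raw = io.StringIO("id,job_satisfaction\n1,SA\n2,N/A\n3,M\n")

# Default behaviour: the text "N/A" is parsed as NaN.
default = pd.read_csv(raw)

# Rewind and read again, this time keeping "N/A" as a literal string.
raw.seek(0)
literal = pd.read_csv(raw, keep_default_na=False)

n_nan_default = int(default["job_satisfaction"].isna().sum())
n_nan_literal = int(literal["job_satisfaction"].isna().sum())

print(n_nan_default, n_nan_literal)
print(literal["job_satisfaction"].value_counts())
```

With the second read, N/A and M both show up as ordinary categories, so you can count them separately before deciding how to treat them.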

This isn’t to say that N/A makes sense for those columns necessarily. There are ALWAYS interpretation issues with data collection and subsequent analysis. Since we don’t have more information on the data collection, I don’t think we can do much.

The project treats N/As as NaNs, I think, so you can either go along with that and include M as part of those NaNs, or you can download the data separately and work through it based on what makes more sense to you.
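If you go with the first option, folding M into the NaNs takes only a line or two. This is just a sketch on made-up values, assuming M really does mean Missing, which the documentation doesn't confirm; the DataFrame and column name are placeholders:

```python
import pandas as pd

# Made-up responses; None stands in for values already read as NaN.
survey = pd.DataFrame(
    {"job_satisfaction": ["SA", "M", "A", None, "D", "M"]}
)

col = survey["job_satisfaction"]

# Assumption: "M" means Missing, so blank it out wherever it appears.
survey["job_satisfaction"] = col.mask(col == "M")

n_missing = int(survey["job_satisfaction"].isna().sum())
remaining = sorted(survey["job_satisfaction"].dropna().unique())

print(n_missing)
print(remaining)
```

Since M makes up about 10% of the rows, it may be worth running the analysis both ways (M as missing, and M kept as its own category) to see whether the conclusion changes.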