As per the instructions, we are dropping columns -
28:49 for dete_survey, and
17:66 for tafe_survey
But it’s not entirely clear why we are dropping these specifically.
Is it because there is no dictionary provided with the dataset which indicates what the categories of the responses for some (or all) of those columns correspond to?
I can see that if there are too many null values, those columns can be dropped. But there do seem to be some columns that could potentially be useful for the analysis.
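To make the null-value point concrete, here is a minimal sketch of how one might check which columns are mostly empty before deciding to drop them. The tiny `survey` dataframe, its column names, and the 0.5 cutoff are all made up for illustration; they are not taken from the actual dete_survey or tafe_survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a survey dataframe.
survey = pd.DataFrame({
    "age": [25, 40, np.nan, 31],
    "topic_q1": [np.nan, np.nan, np.nan, "Agree"],
    "reason": ["Resignation", "Retirement", "Resignation", np.nan],
})

# Share of missing values in each column (between 0.0 and 1.0).
null_share = survey.isnull().mean()

# Assumed cutoff: flag columns where more than half the values are missing.
threshold = 0.5
mostly_empty = null_share[null_share > threshold].index.tolist()

print(null_share)
print(mostly_empty)
```

Running the same check on the real dataframes would show whether the dropped ranges are sparse or simply out of scope for the project questions.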
Just wanted to know if there’s any “official” response on this or not.
Also, @Sahil - The tag for this particular part of the project is “348-2”. I couldn’t create and add it. If you could do that, would appreciate it so that questions can be added easily by other/future students for this project.
The instructions tell you to drop those columns because they are not related to the questions we want to answer in this project. These questions are:
- Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
- Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
So, by dropping the columns that refer to questions unrelated to the goal of the project, you get a cleaner dataframe that is easier to work with. In my project I dropped even more columns that I thought would not be useful for the analysis.
I do not know if it is an “official” response, but Dataquest provides the solution for guided projects. You can find the solution for this one here.
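For reference, dropping a positional range of columns such as 28:49 can be sketched roughly like this. The placeholder dataframe and its `col_N` names are invented so the snippet runs on its own; with the real data you would apply the same call to dete_survey (and use 17:66 for tafe_survey).

```python
import pandas as pd

# Hypothetical stand-in for dete_survey: one row, 60 placeholder columns.
dete_survey = pd.DataFrame([[0] * 60], columns=[f"col_{i}" for i in range(60)])

# The slice 28:49 selects positions 28 through 48 (the end is exclusive),
# so 21 columns are removed. drop() returns a new dataframe by default.
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis=1)

print(dete_survey_updated.shape)
```

Because the slice is positional, it only drops the intended columns if the column order matches the one the instructions assume, so it is worth printing `dete_survey.columns[28:49]` first to see exactly what will go.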
Thanks for the response.
There were some columns which could be considered as relevant to the questions we want to answer. Hence my post here. The content only provided a simple statement that they are not relevant to the analysis, which might not be true.
I just wanted an “official” response because it would help me understand the reasoning behind removing those columns.
For example, some of the Workplace Topic related columns could possibly have been of value.
Of course, I am free to use those columns and conduct my own analysis as well. But it would still be better to learn from someone else’s thought process behind the decision to discard some columns.
But thanks, nevertheless.
What counts as relevant varies from person to person. As I said, I used different columns than the ones Dataquest suggested, and I think it is important to do so sometimes. Your question shows that you are not only following instructions and doing what you are told. You are thinking about the project and looking for new solutions and approaches.