Guided Project: Clean And Analyze Employee Exit Surveys, part 3, column names not changing

Basics (3).ipynb (53.3 KB)


Click here to view the jupyter notebook file in a new tab

that is my current jupyter notebook

the problematic portion right at the end, some of the changes go through, but some don’t.

dete_survey_updated.columns=dete_survey_updated.columns.str.strip()
dete_survey_updated.replace(' ','_',regex=True)
dete_survey_updated.columns = map(str.lower, dete_survey_updated.columns)
tafe_survey_updated.columns=tafe_survey_updated.columns.str.replace('Contributing Factors.','')
tafe_map = {'Record ID':'id', 
            'CESSATION YEAR':'cease_date', 
            'Reason for ceasing employment':'separationtype',
            'Gender. What is your Gender?':'gender',
            'CurrentAge. Current Age': 'age',
            'Employment Type. Employment Type':'employment_status',
            'Classification. Classification': 'position',
            'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
            'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service',
            'Career Move - Public Sector':'career move to public sector', 
            'Career Move - Private Sector':'career move to private sector',
            'Career Move - Self-employment':'career move to self-employed'}
tafe_survey_updated=tafe_survey_updated.rename(columns = tafe_map)
tafe_survey_updated.columns=tafe_survey_updated.columns.str.strip().str.lower()
print(dete_survey_updated.columns.values,'\n')
print(tafe_survey_updated.columns.values)
# trying to remove spaces from DETE, changing to career move <sector>

outputs

['id' 'separationtype' 'cease date' 'dete start date' 'role start date'
 'position' 'classification' 'region' 'business unit' 'employment status'
 'career move to public sector' 'career move to private sector'
 'interpersonal conflicts' 'job dissatisfaction'
 'dissatisfaction with the department' 'physical work environment'
 'lack of recognition' 'lack of job security' 'work location'
 'employment conditions' 'maternity/family' 'relocation' 'study/travel'
 'ill health' 'traumatic incident' 'work life balance' 'workload'
 'none of the above' 'gender' 'age' 'aboriginal' 'torres strait'
 'south sea' 'disability' 'nesb'] 

['id' 'institute' 'workarea' 'cease_date' 'separationtype'
 'career move - public sector' 'career move - private sector'
 'career move - self-employment' 'ill health' 'maternity/family'
 'dissatisfaction' 'job dissatisfaction' 'interpersonal conflict' 'study'
 'travel' 'other' 'none' 'gender' 'age' 'employment_status' 'position'
 'institute_service' 'role_service']

‘contributing factors’ is correctly removed, the extra spaces at the starts and ends of things are stripped correctly, and everything is made lower case. but spaces aren’t being replaced with underscores.

Click here to view the jupyter notebook file in a new tab

Changing this line seems to work

dete_survey_updated.replace(' ','_',regex=True)

to

dete_survey_updated.columns = dete_survey_updated.columns.str.replace(' ', '_')

Output:

['id' 'separationtype' 'cease_date' 'dete_start_date' 'role_start_date'
 'position' 'classification' 'region' 'business_unit' 'employment_status'
 'career_move_to_public_sector' 'career_move_to_private_sector'
 'interpersonal_conflicts' 'job_dissatisfaction'
 'dissatisfaction_with_the_department' 'physical_work_environment'
 'lack_of_recognition' 'lack_of_job_security' 'work_location'
 'employment_conditions' 'maternity/family' 'relocation' 'study/travel'
 'ill_health' 'traumatic_incident' 'work_life_balance' 'workload'
 'none_of_the_above' 'gender' 'age' 'aboriginal' 'torres_strait'
 'south_sea' 'disability' 'nesb']
1 Like

so that fixes everything except for

'career_move_-_public_sector' 'career_move_-_private_sector'
 'career_move_-_self-employment' 

for some reason the rename there

            'Career Move - Public Sector':'career move to public sector', 
            'Career Move - Private Sector':'career move to private sector',
            'Career Move - Self-employment':'career move to self-employed'

just is not working right

Ah I was focused on the dete set that I ignore the tafe set.

When you run this line:

tafe_survey_updated.columns=tafe_survey_updated.columns.str.replace('Contributing Factors.','')

It’s going to change those three columns to this:

' Career Move - Public Sector '
' Career Move - Private Sector '
' Career Move - Self-employment'

There’s still the extra spaces because the str.strip() hasn’t been run yet. In the map, you don’t have the extra spaces, so it’s not an exact match and leaves these alone. Then it processes the str.strip().str.lower() part in the next line, leaving those three as you’re seeing in the result.

1 Like