Link to the mission
I’m working with the original dataset and trying to push the analysis a bit further by looking at dissatisfaction across age groups.
I’m starting with a preliminary cleaning of the `age` column.
combined_updated['age'] = (combined_updated['age']
    .str.replace('or', '-')
    .str.split(' ')
    .str.join('')
    .str.strip()
)
print(color.BOLD + "Values in the `age` column : " + color.END)
combined_updated['age'].value_counts(dropna = False)
Output :
Values in the `age` column :
51-55 71
NaN 55
41-45 48
41–45 45
46-50 42
36-40 41
46–50 39
26-30 35
21–25 33
36–40 32
26–30 32
31–35 32
56-older 29
31-35 29
21-25 29
56-60 26
61-older 23
20-younger 10
Name: age, dtype: int64
I don’t understand why some age groups appear twice in the output. For example, the two 41-45 entries look identical, don’t they?
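One way to check whether two labels that look the same are really identical is to print the code points of their characters. The sketch below assumes the difference lies in the dash character, a plain hyphen (U+002D) in one label and an en dash (U+2013) in the other, which is what the pasted output suggests. The `ages` Series is a made-up sample for illustration, not the real `combined_updated['age']` column.

```python
import pandas as pd

# Hypothetical sample mimicking the labels above: some ranges use a plain
# hyphen (U+002D), others an en dash (U+2013). Not the real dataset.
ages = pd.Series(["41-45", "41\u201345", "46-50", "46\u201350", None])

# Print the code points of each distinct label to expose look-alike characters.
for label in ages.dropna().unique():
    print(repr(label), [hex(ord(ch)) for ch in label])

# Replace the en dash with a plain hyphen so matching ranges collapse together.
normalized = ages.str.replace("\u2013", "-", regex=False)
counts = normalized.value_counts(dropna=False)
print(counts)
```

If the same check on the real column shows two different dash characters, adding a similar `.str.replace` step to the cleaning chain should merge the duplicated groups. Labels such as `56-older`, `56-60` and `61-older` are genuinely different categories and would still need a separate decision.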