# Non-unique values in value_counts()

I’m working with the original dataset and trying to push the analysis a bit further by looking at dissatisfaction across age groups.

I’m starting with a preliminary cleaning of the `age` column.

``````
combined_updated['age'] = (combined_updated['age']
    .str.replace('or', '-')
    .str.split(' ')
    .str.join('')
    .str.strip()
)
print(color.BOLD + "Values in the `age` column : " + color.END)
combined_updated['age'].value_counts(dropna = False)
``````

Output :

``````
Values in the `age` column :
51-55         71
NaN           55
41-45         48
41–45         45
46-50         42
36-40         41
46–50         39
26-30         35
21–25         33
36–40         32
26–30         32
31–35         32
56-older      29
31-35         29
21-25         29
56-60         26
61-older      23
20-younger    10
Name: age, dtype: int64
``````

I don’t understand why I have duplicates in the output. For example, the two 41-45 rows look identical, don’t they?


I solved it by copy-pasting the dashes from the output. I suspected that there might be different kinds of dashes (the output mixes plain hyphens `-` and en dashes `–`), and replacing them worked:

``````
combined_updated['age'] = (combined_updated['age']
    .str.replace('or', '-')   # "56 or older" -> "56 - older"
    .str.replace('–', '-')    # en dash (U+2013) -> plain hyphen
    .str.split(' ')           # split on spaces, then join to drop them
    .str.join('')
    .str.strip()
)

print(color.BOLD + "Values in the `age` column : " + color.END)
combined_updated['age'].value_counts(dropna = False)
``````

Output :

``````
Values in the `age` column :
41-45         93
46-50         81
36-40         73
51-55         71
26-30         67
21-25         62
31-35         61
NaN           55
56-older      29
56-60         26
61-older      23
20-younger    10
Name: age, dtype: int64
``````
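
For anyone hitting the same thing: instead of eyeballing the dashes, you can ask Python which characters are actually in the values. Below is a minimal sketch using only the standard library. The sample list and the `normalize_dashes` helper are made up for illustration (they are not part of the project code), and including the em dash in the mapping is my assumption; only the en dash showed up in my data.

```python
import unicodedata

# Made-up sample values standing in for the real `age` column.
ages = ["41-45", "41\u201345", "21\u201325", "56-older"]

# Collect every non-alphanumeric character and print its Unicode name.
odd_chars = sorted({ch for value in ages for ch in value if not ch.isalnum()})
for ch in odd_chars:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Map en dash and em dash to the plain hyphen-minus.
DASH_MAP = str.maketrans({"\u2013": "-", "\u2014": "-"})


def normalize_dashes(text):
    """Return text with en/em dashes replaced by '-' (hypothetical helper)."""
    return text.translate(DASH_MAP)


cleaned = [normalize_dashes(value) for value in ages]
print(cleaned)
```

On the real column, something like `combined_updated['age'].str.replace('[\u2013\u2014]', '-', regex=True)` should give the same result in a single call.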