pandas.Categorical


My question is:
After transforming categorical values to numeric codes, how do we find the category name that corresponds to each number? For example, how do we know that 4 means Private in the code below? I don't want to rely on print() and comparing the output by eye. Is it possible to write code that lines up the numbers with their category values? For example:
a – 0
b – 1
c – 2
d – 3

# workclass now holds integer codes; 4 is the code for ' Private'
private_incomes = income[income["workclass"] == 4]
public_incomes = income[income["workclass"] != 4]
print(private_incomes.shape)
print(public_incomes.shape)

Thank you!

On the 3rd screen, because we replace the data in income["workclass"] with col.codes, it's unfortunately not possible to recover the category names from income["workclass"] afterwards.

Ideally, in my opinion, the course would have had us create a separate column for the numeric codes while keeping the original names intact.
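A minimal sketch of that approach, assuming a small hypothetical income DataFrame with only a workclass column (the real course data has more rows and columns). The names stay in workclass and the codes go into a new column; the name workclass_code is purely illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the course's income data
income = pd.DataFrame({"workclass": [" Private", " State-gov", " Private", " ?"]})

# Build the Categorical, but leave the original names column untouched
col = pd.Categorical(income["workclass"])
income["workclass_code"] = col.codes  # hypothetical column name for the codes

print(income)
```

With both columns side by side, each row shows its name next to its code, so no separate lookup is needed.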

You can go back to the 3rd screen and use col.categories to get the categories. The output is:

Index([' ?', ' Federal-gov', ' Local-gov', ' Never-worked', ' Private',
       ' Self-emp-inc', ' Self-emp-not-inc', ' State-gov', ' Without-pay'],
      dtype='object')

Each category's position in this Index is its numeric code, so ' ?' is 0, ' Private' is 4, and so on.
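To do the comparison in code rather than by eye, here is a short sketch that assumes col is the pd.Categorical built on the 3rd screen (recreated here from a hypothetical sample). dict(enumerate(col.categories)) pairs each code with its name, col.categories.get_loc looks up the code for a given name, and pd.Categorical.from_codes turns codes back into names:

```python
import pandas as pd

# Hypothetical sample; on the real data col.categories matches the Index above
income = pd.DataFrame({"workclass": [" Private", " Federal-gov", " ?", " Private"]})
col = pd.Categorical(income["workclass"])
income["workclass"] = col.codes  # same overwrite as on the 3rd screen

# Code -> category name; for this sample: {0: ' ?', 1: ' Federal-gov', 2: ' Private'}
code_to_name = dict(enumerate(col.categories))
for code, name in code_to_name.items():
    print(code, "-", name)

# Category name -> code, so the filter does not depend on a hard-coded number
private_code = col.categories.get_loc(" Private")
private_incomes = income[income["workclass"] == private_code]
public_incomes = income[income["workclass"] != private_code]
print(private_incomes.shape)
print(public_incomes.shape)

# Rebuild the names from the codes, as long as col.categories is still available
names = pd.Categorical.from_codes(income["workclass"], categories=col.categories)
print(list(names))
```

On the course data, the loop would print the nine pairs from the Index above, from 0 for ' ?' through 8 for ' Without-pay'. Note that the category strings keep their leading space, so the lookup must use " Private", not "Private".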


Hi @the_doctor,

Thank you for your reply. It answers my question perfectly.

print(col.categories) does the magic!