Filling Categorical Data

Dear Support,

I am working on a dataset that has categorical data in some columns. Two columns have a large amount of missing data, and I am thinking about filling them because they will affect the analysis significantly, especially since the missing values make up more than 5% of the entire dataset. I am not sure which method is best for filling them, but I have thought of the following:

  1. Use the mode. This would give too much weight to one particular category.
  2. Divide the number of missing values in a column by the number of categories and assign that many missing values to each category, so they are spread equally.
  3. Randomly fill the missing values with values drawn from the column itself.
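To make the three options concrete, here is a rough sketch of how each could look in pandas. The helper names (`fill_with_mode`, `fill_equal_split`, `fill_random_observed`) and the small `demo` series are hypothetical and only for illustration. This is one possible reading of each option, not a recommended method.

```python
import numpy as np
import pandas as pd


def fill_with_mode(s: pd.Series) -> pd.Series:
    """Option 1: replace every missing value with the most frequent category."""
    return s.fillna(s.mode().iloc[0])


def fill_equal_split(s: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Option 2: spread the missing values as evenly as possible over all categories."""
    out = s.copy()
    mask = out.isna()
    cats = np.asarray(out.dropna().unique(), dtype=object)
    # Cycle through the categories until every missing row has a value,
    # then shuffle so the assignment does not depend on row order.
    fills = np.resize(cats, int(mask.sum()))
    rng.shuffle(fills)
    out.loc[mask] = fills
    return out


def fill_random_observed(s: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Option 3: draw each missing value at random from the observed values."""
    out = s.copy()
    mask = out.isna()
    observed = np.asarray(out.dropna(), dtype=object)
    # Sampling from the observed rows keeps the existing category
    # proportions roughly intact (in expectation).
    out.loc[mask] = rng.choice(observed, size=int(mask.sum()))
    return out


# Hypothetical toy data: 5 observed values, 4 missing.
rng = np.random.default_rng(0)
demo = pd.Series(
    ["Health", "Health", "Health", "IT", "Retail", None, None, None, None],
    name="category",
)

by_mode = fill_with_mode(demo)
by_split = fill_equal_split(demo, rng)
by_sample = fill_random_observed(demo, rng)
```

On the real data, each helper would be called on `cust_demg['category']`. Option 1 inflates the largest category, option 2 inflates the small categories relative to their observed share, and option 3 roughly preserves the observed proportions.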

Below are the category frequencies and the number of null values in the category column.

cust_demg['category'].value_counts()

Manufacturing         799
Financial Services    774
Health                602
Retail                358
Property              267
IT                    223
Entertainment         136
Argiculture           113
Telecommunications     72

cust_demg.isnull().sum()
category      656
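As a quick check on the numbers, the counts above can be turned into the share of missing values and the effect that options 1 and 2 would have. This is only arithmetic on the figures already shown. The variable names are illustrative, and the total row count is taken to be the non-null counts plus the null count.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "Manufacturing": 799,
    "Financial Services": 774,
    "Health": 602,
    "Retail": 358,
    "Property": 267,
    "IT": 223,
    "Entertainment": 136,
    "Argiculture": 113,
    "Telecommunications": 72,
})
n_missing = 656  # from isnull().sum()

n_total = int(counts.sum()) + n_missing   # 3344 observed + 656 missing
missing_share = n_missing / n_total       # fraction of rows with no category

# Option 1: share of the mode before and after a mode fill.
mode_share_before = counts.max() / counts.sum()
mode_share_after = (counts.max() + n_missing) / n_total

# Option 2: missing rows assigned to each category under an equal split.
per_category = n_missing / len(counts)

print(f"total rows: {n_total}")                        # 4000
print(f"missing share: {missing_share:.1%}")           # 16.4%
print(f"mode share before: {mode_share_before:.1%}")   # 23.9%
print(f"mode share after: {mode_share_after:.1%}")     # 36.4%
print(f"equal split per category: {per_category:.1f}") # 72.9
```

So about 16% of the rows are missing. A mode fill would raise Manufacturing from roughly 24% to roughly 36% of the column, while an equal split would roughly double Telecommunications (72 observed plus about 73 filled).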

Which of the options above is best, or is there another option I have not thought of?


Hi @ignatiusebigwai,

I believe this article will help you to identify the best approach to deal with this case:

Best,
Sahil

Sahil,

Thanks very much for your reply.
It is indeed helpful.

Regards,
Ignatius.
