What to do with numerical but discrete values? And with quality descriptions?

Hi, I am doing the guided project ‘Predicting House Sale Prices’, and I have two questions about featurization:

  1. Some columns have numerical but discrete values. Should I treat them the same as continuous numerical columns?
    disc_cols = ['Full Bath', 'Half Bath', 'Bedroom AbvGr', 'Kitchen AbvGr', 'TotRms AbvGrd', 'Fireplaces', 'Garage Cars']

  2. Other columns are text, but they actually encode a quality scale (bad, good, excellent…). Should I convert them to a numerical scale and then treat them as in question 1, or should I convert them to categorical and dummy-encode them?
    scale_cols = ['Exter Qual', 'Exter Cond', 'Bsmt Qual', 'Bsmt Cond', 'Bsmt Exposure', 'BsmtFin Type 1', 'BsmtFin Type 2', 'Heating QC', 'Kitchen Qual']
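
To make the two options in question 2 concrete, here is a minimal sketch on toy data. The quality codes (`Po`, `Fa`, `TA`, `Gd`, `Ex`) and the `quality_order` mapping are assumptions for illustration; check the actual labels in each column, since some of the columns listed may use a different set of labels.

```python
import pandas as pd

# Toy data: the labels below are assumed, not read from the project dataset.
df = pd.DataFrame({"Kitchen Qual": ["TA", "Gd", "Ex", "Fa"]})

# Option 1: map the labels to an ordered numerical scale.
# quality_order is a hypothetical mapping; adjust it to the labels you find.
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
ordinal = df["Kitchen Qual"].map(quality_order)

# Option 2: dummy-encode, one 0/1 column per label (the ordering is dropped).
dummies = pd.get_dummies(df["Kitchen Qual"], prefix="Kitchen Qual")

print(ordinal.tolist())
print(dummies.columns.tolist())
```

Option 1 keeps the ordering but assumes the steps between labels are evenly spaced; option 2 makes no such assumption but adds one column per label.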

I think we studied that in some Dataquest module, but I forgot and I cannot find it.


Hi ncirauqu,

I'm not 100% sure, but I would not treat the discrete variables as categorical, because you would lose information.

The disc_cols you mentioned above are counts measured on a ratio scale. Ratio-scale variables can be continuous or discrete; in both cases the size of the difference between two values is meaningful, so they can be used directly as numerical features.
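
As a rough illustration of the information that would be lost, here is a small sketch on made-up values (not taken from the project dataset) comparing the two treatments of a count column:

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "Full Bath": [1, 2, 2, 3],
    "Fireplaces": [0, 1, 0, 2],
})

# Kept as numbers: order and spacing survive (3 baths is 2 more than 1).
numeric = df[["Full Bath", "Fireplaces"]]

# Dummy-encoded: each count becomes a separate 0/1 column, and nothing
# tells the model that "Full Bath_3" is larger than "Full Bath_1".
dummies = pd.get_dummies(df["Full Bath"], prefix="Full Bath")

print(numeric.dtypes)
print(dummies.columns.tolist())
```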

I think the module this is from is:

I hope this helps.