Types of variables in Predicting Car Prices


I’d like to know if I’ve classified the variables into nominal, ordinal, continuous and discrete correctly (and then into numerical and categorical) in this guided project.

Screen Link: https://app.dataquest.io/m/155/guided-project%3A-predicting-car-prices/1/introduction-to-the-data-set

You can read more about the data set here.

import pandas as pd
url_dataset = 'https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data' 

columns_name = ['symboling','normalized_losses','make','fuel_type',

cars = pd.read_csv(url_dataset, names=columns_name)
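
Since the column list above is cut off, here is a minimal sketch of the loading step. The full list of names is an assumption: it follows the attribute order documented for the UCI imports-85 data set, written with the underscore naming used in this post (`city_mpg` and `highway_mpg` are guesses at that naming). Passing `na_values='?'` is also an assumption, based on the data set marking missing values with `?`. The sample row only imitates the file's format so the sketch runs offline.

```python
import io
import pandas as pd

# Assumed column names: attribute order from the UCI imports-85 documentation,
# with the underscore naming used in this post.
columns_name = [
    'symboling', 'normalized_losses', 'make', 'fuel_type', 'aspiration',
    'num_doors', 'body_style', 'drive_wheels', 'engine_location',
    'wheel_base', 'length', 'width', 'height', 'curb_weight', 'engine_type',
    'num_cylinders', 'engine_size', 'fuel_system', 'bore', 'stroke',
    'compression_ratio', 'horsepower', 'peak_rpm', 'city_mpg',
    'highway_mpg', 'price',
]

def load_cars(source):
    """Read the headerless file and treat '?' as a missing value (assumption)."""
    return pd.read_csv(source, names=columns_name, na_values='?')

# One row in the file's format, used here instead of the URL so this runs offline.
sample_row = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,"
    "48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495\n"
)
cars_sample = load_cars(io.StringIO(sample_row))
print(cars_sample.shape)
print(cars_sample[['normalized_losses', 'num_doors', 'num_cylinders']])
```

With the real file, `load_cars(url_dataset)` would be the equivalent call. Note that `num_doors` and `num_cylinders` are spelled out as words in the raw file, so they load as strings and would need a mapping before being treated as numbers.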

Do you agree with my classification?

nominal = ['make', 'fuel_type', 'aspiration', 'body_style', 'drive_wheels',
 'engine_location', 'engine_type', 'fuel_system']

ordinal = ['symboling']

continuous = ['normalized_losses', 'wheel_base', 'length', 'width',
 'height', 'curb_weight', 'engine_size', 'compression_ratio',
 'bore', 'stroke', 'horsepower', 'peak_rpm', 'price']

discrete = ['num_doors', 'num_cylinders']

I also classified them into numerical and categorical variables:

categorical = ['make', 'fuel_type', 'aspiration', 'body_style', 'drive_wheels',
 'engine_location', 'engine_type', 'fuel_system','symboling']

numerical = ['normalized_losses', 'wheel_base', 'length', 'width', 'height', 'curb_weight',
 'engine_size', 'compression_ratio', 'bore', 'stroke', 'horsepower',
 'peak_rpm', 'price', 'num_doors', 'num_cylinders']
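
One way to sanity-check the two groupings against each other is a quick set comparison. The sketch below only uses the lists from this post. The helper `find_unclassified` is hypothetical, and `city_mpg` and `highway_mpg` are assumed names for the data set's city-mpg and highway-mpg attributes, which do not appear in any of the lists.

```python
nominal = ['make', 'fuel_type', 'aspiration', 'body_style', 'drive_wheels',
           'engine_location', 'engine_type', 'fuel_system']
ordinal = ['symboling']
continuous = ['normalized_losses', 'wheel_base', 'length', 'width',
              'height', 'curb_weight', 'engine_size', 'compression_ratio',
              'bore', 'stroke', 'horsepower', 'peak_rpm', 'price']
discrete = ['num_doors', 'num_cylinders']

categorical = nominal + ordinal
numerical = continuous + discrete

# No column should be both categorical and numerical.
overlap = set(categorical) & set(numerical)
print(overlap)

def find_unclassified(all_columns, *groups):
    """Return the columns (in order) that appear in none of the groups."""
    classified = set().union(*groups)
    return [col for col in all_columns if col not in classified]

# Assumed names for two attributes of the data set that the lists leave out.
some_columns = ['symboling', 'price', 'city_mpg', 'highway_mpg']
missing = find_unclassified(some_columns, categorical, numerical)
print(missing)
```

With the loaded DataFrame, `find_unclassified(cars.columns, categorical, numerical)` would list every column that still needs a type.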

I think it is an important first step to know the types of variables we are going to use in a machine learning project.

What do you think? Are they well classified?

I have doubts about the symboling variable (risk factor).

Thank you in advance.


Hi @arredocana,

Your classification is OK, including treating symboling as ordinal. The only refinement is that fuel-type, aspiration, num-of-doors, and engine-location can be classified more precisely as dichotomous (categorical nominal variables with only two categories).
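
To illustrate both points, here is a small sketch in pandas. The DataFrame holds made-up rows purely for demonstration, and `dichotomous_columns` is a hypothetical helper rather than anything from the guided project. The symboling scale of -3 to +3 follows the data set's documentation.

```python
import pandas as pd

# Made-up rows, for illustration only (not taken from the data set).
sample = pd.DataFrame({
    'fuel_type': ['gas', 'diesel', 'gas', 'gas'],
    'aspiration': ['std', 'turbo', 'std', 'turbo'],
    'body_style': ['sedan', 'hatchback', 'wagon', 'sedan'],
    'symboling': [3, 1, -1, 0],
})

def dichotomous_columns(df, candidates):
    """Return the candidate columns that have exactly two distinct values."""
    return [col for col in candidates if df[col].nunique(dropna=True) == 2]

two_level = dichotomous_columns(sample, ['fuel_type', 'aspiration', 'body_style'])
print(two_level)

# Treat symboling as ordinal: an ordered categorical over the full -3..3 scale.
risk = pd.Categorical(sample['symboling'],
                      categories=list(range(-3, 4)), ordered=True)
print(risk.codes)              # position of each value on the ordered scale
print(risk.min(), risk.max())  # comparisons respect the declared order
```

On the full data, a column only shows up as dichotomous if exactly two distinct non-missing values are present, so it is worth running the check after the `?` placeholders have been converted to missing values.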