Guided Project: Predicting Car Prices - normalizing data

I was doing the guided project for Predicting Car Prices (Introduction to Machine Learning).

While normalizing the data, I see that the solution uses the method below:

numeric_cars = (numeric_cars - numeric_cars.min())/(numeric_cars.max() - numeric_cars.min())

whereas during practice we used:

numeric_cars = (numeric_cars - numeric_cars.mean()) / numeric_cars.std()

So why is it different, and what is the right approach for normalizing data?


The approach for normalizing data depends on the distribution of the data and on what you are trying to achieve.

\begin{align} \textit{Standard z-score}: &\quad z = \frac{X - \bar{x}}{\sigma} \\ \textit{Min-Max Feature Scaling}: &\quad X' = \frac{X - X_{min}}{X_{max} - X_{min}} \end{align}
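To see the difference between the two lines from the question, here is a minimal sketch that applies both formulas to a small hypothetical DataFrame. The column names and values are made up for illustration and are not taken from the project's dataset. Min-max scaling puts every column exactly between 0 and 1, while the z-score centres each column at 0 with a standard deviation of 1.

```python
import pandas as pd

# Hypothetical toy data standing in for numeric_cars (values are made up).
numeric_cars = pd.DataFrame({
    "horsepower": [48.0, 102.0, 111.0, 154.0, 288.0],
    "curb_weight": [1488.0, 2337.0, 2548.0, 2823.0, 4066.0],
})

# Min-max feature scaling: each column is mapped onto [0, 1].
min_max = (numeric_cars - numeric_cars.min()) / (numeric_cars.max() - numeric_cars.min())

# Standard z-score: each column ends up with mean 0 and standard deviation 1.
z_score = (numeric_cars - numeric_cars.mean()) / numeric_cars.std()

print(min_max.round(3))
print(z_score.round(3))
```

Both operations work column by column because pandas aligns the Series returned by `min()`, `max()`, `mean()` and `std()` with the DataFrame's columns.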

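The standard z-score standardizes the data using its mean and standard deviation (the population parameters when they are known, otherwise estimates from the sample, which is what `mean()` and `std()` compute). It works best when the data is approximately normally distributed. The result has mean 0 and standard deviation 1: the shape of the distribution is preserved and each value is expressed as a number of standard deviations from the mean, but the data is not squeezed into a fixed range such as [0, 1].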
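A quick sketch of what the z-score does in practice, using a hypothetical series of made-up values: the result has mean 0 and standard deviation 1, but values are not confined to [0, 1] and can be negative. Note that pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to the population version (`ddof=0`), so the z-scores differ slightly depending on which one you use.

```python
import pandas as pd

# Hypothetical values (made up); mean is 5, population standard deviation is 2.
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# pandas std() uses ddof=1 (sample standard deviation) by default.
z_sample = (s - s.mean()) / s.std()

# ddof=0 gives the population standard deviation instead.
z_pop = (s - s.mean()) / s.std(ddof=0)

print(z_sample.round(3).tolist())
print(z_pop.tolist())  # values fall outside [0, 1], including negatives
print("mean:", round(z_sample.mean(), 10), "std:", round(z_sample.std(), 10))
```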

If your data does not follow a normal (Gaussian) distribution, it is often better to rescale it to a specific range such as [0, 1]. Converting to a common range such as [0, 1] also makes values from different series easier to compare.

Min-max feature scaling is used when you want to squeeze the data into the range [0, 1]. It can also map the data onto any other range [a, b]:

\begin{align} \textit{Min-Max Feature Scaling to } [a, b]: &\quad X' = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}} \end{align}
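A minimal sketch of the general [a, b] min-max formula as a pandas helper. Note the `a +` offset, which shifts the rescaled values so they start at `a` instead of 0. The function name `min_max_scale` and the toy data are hypothetical, introduced here only for illustration; with the default `a=0, b=1` it reduces to the line used in the project solution.

```python
import pandas as pd

def min_max_scale(df: pd.DataFrame, a: float = 0.0, b: float = 1.0) -> pd.DataFrame:
    """Hypothetical helper: linearly rescale each column of df onto [a, b]."""
    col_min = df.min()
    # A constant column gives a range of 0 here, so its scaled values become NaN (0/0).
    col_range = df.max() - col_min
    return a + (df - col_min) * (b - a) / col_range

# Hypothetical toy data (values are made up).
cars = pd.DataFrame({"horsepower": [48.0, 111.0, 288.0], "price": [5118.0, 13495.0, 45400.0]})

print(min_max_scale(cars))             # default range [0, 1]
print(min_max_scale(cars, -1.0, 1.0))  # range [-1, 1]
```

Like the original formula, this divides by `X_max - X_min`, so you may want to drop or otherwise handle columns whose values are all equal before scaling.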
