Why did we add the price column back to the normalized_listings?

Screen Link:
https://app.dataquest.io/m/140/multivariate-k-nearest-neighbors/4/normalize-columns

My Code:

normalized_listings = (dc_listings - dc_listings.mean())/dc_listings.std()
normalized_listings['price'] = dc_listings['price']
normalized_listings.head(3)
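To see what those two lines do, here is a minimal, self-contained sketch on a tiny made-up DataFrame. The three rows and their values are invented for illustration and are not taken from the DC listings data. The first line z-scores every column, and the second overwrites the standardized price with the original values.

```python
import pandas as pd

# Hypothetical stand-in for dc_listings: three rows with made-up values
dc_listings = pd.DataFrame({
    "accommodates": [2, 4, 6],
    "maximum_nights": [4, 30, 1825],
    "price": [100.0, 150.0, 300.0],
})

# Z-score every column: subtract the column mean, divide by the column standard deviation
normalized_listings = (dc_listings - dc_listings.mean()) / dc_listings.std()

# Put the original, unscaled prices back so the target stays in its real units
normalized_listings["price"] = dc_listings["price"]

print(normalized_listings)
```

After this, accommodates and maximum_nights are centered on 0 with a standard deviation of 1, while price still holds the original values.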

What I expected to happen:
I thought we needed to normalize the target column (price) as well.


In this case, since the columns vary widely in scale (high vs. low variance from one column to the next), it is suggested to normalize them before performing any comparisons in the next few mission steps.

You may have noticed that while the accommodates, bedrooms, bathrooms, beds, and minimum_nights columns hover between 0 and 12 (at least in the first few rows), the values in the maximum_nights and number_of_reviews columns span much larger ranges. For example, the maximum_nights column has values as low as 4 and as high as 1825, even in the first few rows.
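Why the range matters: k-nearest neighbors compares listings by distance, so a column with a huge spread can swamp the others. The sketch below uses three hypothetical listings (values invented for illustration) and a plain Euclidean distance over two columns, before and after z-scoring.

```python
import numpy as np
import pandas as pd

# Hypothetical listings with two feature columns; values are made up
features = pd.DataFrame({
    "accommodates": [1, 8, 2],
    "maximum_nights": [30, 30, 1825],
})

def euclidean(a, b):
    """Euclidean distance between two rows (pandas Series)."""
    return float(np.sqrt(((a - b) ** 2).sum()))

# Raw features: a 7-guest difference adds 7, but the maximum_nights gap adds about 1795
raw_0_1 = euclidean(features.iloc[0], features.iloc[1])  # 7.0
raw_0_2 = euclidean(features.iloc[0], features.iloc[2])

# Z-scored features: both columns now contribute on a comparable scale
scaled = (features - features.mean()) / features.std()
scaled_0_1 = euclidean(scaled.iloc[0], scaled.iloc[1])
scaled_0_2 = euclidean(scaled.iloc[0], scaled.iloc[2])

print(raw_0_1, raw_0_2)
print(scaled_0_1, scaled_0_2)
```

With raw values, row 2 looks hundreds of times farther from row 0 than row 1 does, purely because of maximum_nights. After normalization the two distances are of similar size, so accommodates actually influences which listing counts as "nearest".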

I get that for the other columns. My question is specific to the target column, price. In the code, we run the normalization formula on all the columns, but later on (second line of the code) we replace the normalized values in the price column with the original values. Why is that?
Why don’t we normalize the target column?

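@ravirajkakati: Later on, in step 7, we will use the price column as the target. Everything besides price should be normalized, because only the feature columns go into the distance calculation that finds the nearest neighbors; price is what the model predicts, so it never needs to be on a comparable scale. That is why we assign the original price values back to the column: we want the predictions to be the actual cost of the Airbnb listing, not a standardized value.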
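A minimal sketch of that reasoning, assuming a hand-rolled one-feature k-nearest neighbors regressor (the data, the function name predict_price, and k=2 are all hypothetical). The prediction is the average price of the nearest listings, so keeping price in its original units means the prediction comes out in those units directly.

```python
import pandas as pd

# Hypothetical training data: one feature plus the raw price target
train = pd.DataFrame({
    "accommodates": [2, 2, 4, 6],
    "price": [90.0, 110.0, 160.0, 300.0],
})

# Normalize the feature only; price is left in its original units
feature_mean = train["accommodates"].mean()
feature_std = train["accommodates"].std()
train["accommodates_z"] = (train["accommodates"] - feature_mean) / feature_std

def predict_price(accommodates, k=2):
    """Average the price of the k nearest listings (k-nearest neighbors regression)."""
    query_z = (accommodates - feature_mean) / feature_std
    distances = (train["accommodates_z"] - query_z).abs()
    nearest = distances.nsmallest(k).index
    return float(train.loc[nearest, "price"].mean())

prediction = predict_price(2)  # mean price of the two 2-person listings

# For contrast: if price had been z-scored too, the same average would come out
# in standard-deviation units and would have to be converted back by hand
price_mean, price_std = train["price"].mean(), train["price"].std()
z_prediction = ((train.loc[[0, 1], "price"] - price_mean) / price_std).mean()
converted_back = z_prediction * price_std + price_mean

print(prediction, z_prediction, converted_back)
```

Normalizing the target would not change which neighbors are picked, since price is not part of the distance; it would only add an extra un-scaling step before the predictions mean anything.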

In step 7:

  • Use the fit method to specify the data we want the k-nearest neighbor model to use. Use the following parameters:
    • training data, feature columns: just the accommodates and bathrooms columns, in that order, from train_df.
    • training data, target column: the price column from train_df.
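A sketch of what that fit call could look like, assuming scikit-learn's KNeighborsRegressor as the model class (the step above only says "the fit method"). The train_df rows are invented, with the two features already normalized and price left raw; n_neighbors=2 is chosen only because the toy table has four rows.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical stand-in for train_df: normalized features, raw price target
train_df = pd.DataFrame({
    "accommodates": [-1.0, -1.0, 0.5, 1.5],
    "bathrooms": [-0.5, 0.5, -0.5, 0.5],
    "price": [90.0, 110.0, 160.0, 300.0],
})

# Assumed settings for this toy example; the mission may use different ones
knn = KNeighborsRegressor(n_neighbors=2, algorithm="brute")

# Feature columns in the order the step asks for; target stays in price units
knn.fit(train_df[["accommodates", "bathrooms"]], train_df["price"])

# A new listing, expressed in the same normalized feature units
query = pd.DataFrame({"accommodates": [-1.0], "bathrooms": [0.0]})
predicted_price = knn.predict(query)[0]
print(predicted_price)
```

Because price was never normalized, predicted_price is directly a price (here the average of the two nearest rows, 90.0 and 110.0).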

Right, gotcha!! Thank you :nerd_face:


No worries @ravirajkakati, happy to help!