normalized_listings = (dc_listings - dc_listings.mean())/dc_listings.std()
normalized_listings['price'] = dc_listings['price']
What I expected to happen:
I thought we needed to normalise the target column (price) as well.
In this case, since the data may vary widely in scale from column to column (high vs. low variance), it is suggested to normalize it before performing any comparisons in the next few mission steps.
You may have noticed that while the minimum_nights columns hover between 0 and 12 (at least in the first few rows), the values in the number_of_reviews columns span much larger ranges. For example, the maximum_nights column has values as low as 4 and as high as 1825, in the first few rows itself.
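To see why the scale difference matters for a distance-based comparison, here is a minimal sketch in plain Python. The three listings are made up for illustration (they are not rows from dc_listings), and zscores and share_of_first are hypothetical helper names. It measures how much of the squared Euclidean distance between two listings comes from minimum_nights, before and after standardising both columns.

```python
import statistics

# Hypothetical listings; the values are made up for illustration.
minimum_nights = [1, 12, 1]
maximum_nights = [4, 30, 1825]


def zscores(values):
    """Standardise a column: subtract the mean, divide by the sample std."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std, the same default as pandas .std()
    return [(v - mean) / std for v in values]


def share_of_first(col_a, col_b, i, j):
    """Fraction of the squared Euclidean distance between rows i and j due to col_a."""
    da = (col_a[i] - col_a[j]) ** 2
    db = (col_b[i] - col_b[j]) ** 2
    return da / (da + db)


# Compare the second and third listings (indices 1 and 2).
raw_share = share_of_first(minimum_nights, maximum_nights, 1, 2)
scaled_share = share_of_first(zscores(minimum_nights), zscores(maximum_nights), 1, 2)

print(f"minimum_nights share of distance, raw:    {raw_share:.5f}")
print(f"minimum_nights share of distance, scaled: {scaled_share:.2f}")
```

On the raw values, maximum_nights accounts for almost all of the distance, so minimum_nights barely influences which listings count as "near". After standardising, the two columns contribute on a comparable footing.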
I get that for the other columns. My question is specific to the target column, which is price. In the code, we run the normalization formula on all the columns. But later on (in the second line of the code), we replace the normalized values in the price column with the original values. Why is that?
Why don’t we normalize the target column?
@ravirajkakati: Later on, in step 7, we will use the price column. Everything else besides price should be normalized. That is why we assign the original price values back to the column: we want the actual cost of the Airbnb apartment (for accuracy).
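To make that concrete, here is a small sketch of the two lines from the question run on a toy DataFrame. The data is hypothetical and only stands in for dc_listings. The reasoning, as I understand it: the feature columns go into the distance calculation, so they need a common scale, while price is never part of the distance. A k-nearest neighbor regressor only averages the neighbours' prices, so leaving price in its original units keeps predictions and error metrics in dollars.

```python
import pandas as pd

# Hypothetical stand-in for dc_listings; the values are made up.
dc_listings = pd.DataFrame({
    "accommodates": [2, 4, 1, 6],
    "bathrooms": [1.0, 2.0, 1.0, 3.0],
    "price": [80.0, 150.0, 60.0, 300.0],
})

# Line 1 from the question: z-score every column, price included.
normalized_listings = (dc_listings - dc_listings.mean()) / dc_listings.std()

# Keep a copy of the z-scored price only to show what gets overwritten.
zscored_price = normalized_listings["price"].copy()

# Line 2 from the question: put the original dollar prices back.
normalized_listings["price"] = dc_listings["price"]

print(zscored_price.round(2).tolist())         # unitless z-scores
print(normalized_listings["price"].tolist())   # dollars again
print(normalized_listings)
```

The feature columns end up with mean 0 and standard deviation 1, while price is exactly what it was in dc_listings.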
In Step 7,
- Use the fit method to specify the data we want the k-nearest neighbor model to use. Use the following parameters:
- training data, feature columns: just the bathrooms columns, in that order, from
- training data, target column: the price column from
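For reference, a rough sketch of what that fit call can look like with scikit-learn's KNeighborsRegressor. Several things here are my assumptions and not taken from the mission text: the toy data, the choice of accommodates as the first feature column (the quoted instruction is cut off), n_neighbors=2, and using normalized_listings as the source DataFrame.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical stand-in for dc_listings; the values are made up.
dc_listings = pd.DataFrame({
    "accommodates": [2, 4, 1, 6, 3, 2],
    "bathrooms": [1.0, 2.0, 1.0, 3.0, 1.5, 1.0],
    "price": [80.0, 150.0, 60.0, 300.0, 120.0, 90.0],
})

# Normalize everything, then restore price in its original units.
normalized_listings = (dc_listings - dc_listings.mean()) / dc_listings.std()
normalized_listings["price"] = dc_listings["price"]

# Assumption: "accommodates" is a guess at the first feature column.
feature_cols = ["accommodates", "bathrooms"]

knn = KNeighborsRegressor(n_neighbors=2, algorithm="brute")
# Features are normalized; the target stays in dollars.
knn.fit(normalized_listings[feature_cols], normalized_listings["price"])

# Each prediction is the mean price of the 2 nearest training listings.
predictions = knn.predict(normalized_listings[feature_cols])
print(predictions)
```

Because the target was left unnormalized, the predictions come out directly as prices and can be compared against the real price column without converting back.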
Right, gotcha!! Thank you
No worries @ravirajkakati, happy to help!