# Lambda within lambda

My Code:

``````
import numpy as np

# train_df and test_df are assumed to be already-loaded DataFrames
# with "bathrooms" and "price" columns.
def predict_price(new_listing):
    temp_df = train_df.copy()
    # Absolute difference in bathrooms between each training row and the new listing
    temp_df['distance'] = temp_df['bathrooms'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    # Average price of the 5 closest training rows
    nearest_neighbors_prices = temp_df.iloc[0:5]['price']
    predicted_price = nearest_neighbors_prices.mean()
    return predicted_price

test_df["predicted_price"] = test_df["bathrooms"].apply(lambda x: predict_price(x))
test_df["squared_error"] = (test_df["predicted_price"] - test_df["price"]) ** 2
mse = test_df["squared_error"].mean()
print(mse)
``````

What I expected to happen:
I understand that we're supposed to use `test_df["predicted_price"]=test_df["bathrooms"].apply(lambda x:predict_price(x))` to predict the price of the listings in the testing set.

But when I go back to the code, I see that within the `predict_price` function we have the following important piece of code: `temp_df['distance'] = temp_df['bathrooms'].apply(lambda x: np.abs(x - new_listing))`

Here we are using the train section of the data set, which has more rows than the test section. How are those compatible?
I'd appreciate a small walkthrough of what happens at, say, 2 specific rows of the testing dataset, following what the code is doing.


You use the data in the train dataframe to predict the house price for points in the test dataframe. But how is this done?

`apply` works element-wise: it takes one value at a time from `test_df['bathrooms']`. The value in one cell of the bathrooms column is the `x` in `apply(lambda x:predict_price(x))`.

That single value `x` is passed to the `predict_price` function, where it is called `new_listing`. So inside the function, `new_listing` is one number (the bathroom count of one test listing), not a whole column.
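To see that the function receives one cell at a time, here is a minimal sketch. The `toy_test` frame and the `record` function are made up for illustration; `record` just stands in for `predict_price` and notes what it is given.

```python
import pandas as pd

# Hypothetical toy test set; the values are invented for illustration.
toy_test = pd.DataFrame({"bathrooms": [1.0, 2.5, 1.0]})

seen = []

def record(new_listing):
    # Stand-in for predict_price: store the argument so we can inspect it.
    seen.append(new_listing)
    return new_listing

# apply calls record once per cell of the column.
toy_test["bathrooms"].apply(record)

print(seen)  # one scalar per test row, in row order
```

Each call gets a single number, which is why the size of the test set never has to match the size of the train set.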

The idea of `nearest neighbors` is to use the prices of training houses with a similar number of bathrooms to predict the price of a new house.

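Inside the function, we create a new column called `distance` in `temp_df`, which is a copy of `train_df`. We fill this column with the absolute difference between the number of bathrooms in each train row and the number of bathrooms of the one test listing (`new_listing`). Because a single number is compared against every row of the train data, the train and test sets do not need to have the same number of rows. Then we sort the dataframe so that the rows with the smallest difference are at the top. Smallest difference means they are the closest neighbors.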

Next, we take the first five rows (the five nearest neighbors) and return the mean of their prices. That mean is the predicted price for this one test listing, and `apply` stores it in the matching row of `test_df['predicted_price']`. The whole process then repeats for the next row of the test data.
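Since you asked for a walkthrough with two specific test rows, here is a small sketch. The data is invented (8 train rows, 2 test rows) so the numbers are easy to follow by hand; only the `predict_price` logic comes from your post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 training listings, 2 test listings.
train_df = pd.DataFrame({
    "bathrooms": [1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 4.0],
    "price": [80.0, 90.0, 100.0, 120.0, 130.0, 150.0, 200.0, 300.0],
})
test_df = pd.DataFrame({"bathrooms": [1.0, 3.0], "price": [100.0, 170.0]})

def predict_price(new_listing):
    temp_df = train_df.copy()
    # Distance from every training row to this one test listing
    temp_df["distance"] = temp_df["bathrooms"].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values("distance")
    # Mean price of the 5 closest training rows
    return temp_df.iloc[0:5]["price"].mean()

# Test row 0 (1.0 bathrooms): distances are 0, 0, 0.5, 1, 1, 1.5, 2, 3.
# The 5 nearest prices are 80, 90, 100, 120, 130, so the mean is 104.0.
# Test row 1 (3.0 bathrooms): distances are 2, 2, 1.5, 1, 1, 0.5, 0, 1.
# The 5 nearest prices are 200, 150, 120, 130, 300, so the mean is 180.0.
test_df["predicted_price"] = test_df["bathrooms"].apply(predict_price)
print(test_df)
```

`predict_price` runs twice here, once per test row, and each run scans all 8 training rows. One caveat: when several training rows tie on distance right at the cut-off for the top five, which of them gets picked depends on the sort order; the toy data above avoids that case.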

Finally, for each row in the test data, we compute the squared difference between the predicted price and the actual price of the house. The mean of these squared errors is the MSE that gets printed.
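As a small sketch of that last step, with made-up actual and predicted prices for two listings (the `results` frame is hypothetical):

```python
import pandas as pd

# Hypothetical actual vs. predicted prices for two test listings.
results = pd.DataFrame({
    "price": [100.0, 170.0],
    "predicted_price": [104.0, 180.0],
})

# Squared error per row: (104 - 100)^2 = 16 and (180 - 170)^2 = 100
results["squared_error"] = (results["predicted_price"] - results["price"]) ** 2

# Mean squared error: (16 + 100) / 2 = 58.0
mse = results["squared_error"].mean()
print(mse)
```

Squaring already makes every error non-negative, so the `np.abs` in the original line is not needed, although it does no harm.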

Hope it helps!
