Lambda within lambda

My Code:

import numpy as np

# assumes train_df and test_df are already-loaded pandas DataFrames
def predict_price(new_listing):
    temp_df = train_df.copy()
    temp_df['distance'] = temp_df['bathrooms'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    # prices of the 5 closest training listings
    nearest_neighbors_prices = temp_df.iloc[0:5]['price']
    predicted_price = nearest_neighbors_prices.mean()
    return predicted_price

test_df["predicted_price"] = test_df["bathrooms"].apply(lambda x: predict_price(x))
test_df["squared_error"] = np.abs(test_df["predicted_price"] - test_df["price"]) ** 2
mse = test_df["squared_error"].mean()
print(mse)

What I expected to happen:
I understand that we’re supposed to use test_df["predicted_price"] = test_df["bathrooms"].apply(lambda x: predict_price(x)) to predict the price of each listing in the test set.

But when I go back to predict_price, I see this important line inside it: temp_df['distance'] = temp_df['bathrooms'].apply(lambda x: np.abs(x - new_listing))

Here we are using the training portion of the dataset, which has more rows than the test portion. How are those compatible?
I’d appreciate a short walkthrough that takes, say, two specific rows of the test dataset and follows what the code does with them.

You use the data in train_df to predict the price of each listing in test_df. But how is this done?

Series.apply works element-wise: it takes one value at a time from test_df['bathrooms']. The value in a single cell of the bathrooms column is the x in apply(lambda x: predict_price(x)).

That x is passed to predict_price, where it becomes new_listing. So inside the function, new_listing is a single number (the bathroom count of one test listing), not a whole column.
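
A minimal sketch of the outer apply, using a made-up two-row test set and a hypothetical stand-in function fake_predict_price that only records the value it receives (it is not the real predict_price). It shows that the lambda is called once per test row, each time with a single number:

```python
import pandas as pd

# Hypothetical two-row test set, made up for illustration
test_df = pd.DataFrame({"bathrooms": [1.0, 2.0], "price": [100.0, 180.0]})

calls = []  # every value that apply() hands to the function, in order

def fake_predict_price(new_listing):
    # Hypothetical stand-in: record the single value, return a dummy prediction
    calls.append(new_listing)
    return new_listing * 10

result = test_df["bathrooms"].apply(lambda x: fake_predict_price(x))
print(calls)            # one scalar per test row
print(result.tolist())  # one prediction per test row
```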

The idea behind nearest neighbors is to use the prices of houses with a similar number of bathrooms to predict the price of a new house.

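In the line temp_df['distance'] = temp_df['bathrooms'].apply(lambda x: np.abs(x - new_listing)), we create a new column called distance in a copy of train_df. This inner lambda runs once per training row: x is the bathroom count of that training listing, and the cell is filled with the absolute difference between x and new_listing. Because new_listing is one fixed value, the row counts of train_df and test_df never have to match; every training row is simply compared against the same test value. Then we sort by distance so the rows with the smallest difference come first. The smallest differences are the nearest neighbors.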
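
A minimal sketch of a single call, on hypothetical training data (eight made-up listings). With new_listing fixed at 3.0, the inner lambda produces one distance per training row, so the result has as many rows as train_df no matter how many rows the test set has:

```python
import numpy as np
import pandas as pd

# Hypothetical training data: 8 made-up listings
train_df = pd.DataFrame({
    "bathrooms": [1.0, 1.0, 1.0, 1.5, 2.0, 3.0, 3.0, 4.0],
    "price": [80.0, 100.0, 120.0, 150.0, 200.0, 300.0, 340.0, 400.0],
})

new_listing = 3.0  # ONE value taken from test_df['bathrooms']

temp_df = train_df.copy()
# Inner lambda: x walks over the training rows, new_listing stays fixed
temp_df["distance"] = temp_df["bathrooms"].apply(lambda x: np.abs(x - new_listing))
temp_df = temp_df.sort_values("distance")
print(temp_df)  # 8 rows, closest training listings at the top
```

Rows with equal distances (for example the two 3-bathroom listings) may appear in either order after sorting, because the default sort is not guaranteed to be stable.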

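Next, temp_df.iloc[0:5]['price'] takes the first five rows after sorting (the five nearest neighbors), and the mean of their prices becomes the prediction. That value is returned and stored in the predicted_price cell of that particular test row. In other words, the price of each test listing is predicted from the training data.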
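
A minimal end-to-end sketch for two test rows, on made-up data (the bathroom counts and prices are hypothetical, not from the real dataset). predict_price is the function from the question with comments added:

```python
import numpy as np
import pandas as pd

# Hypothetical data for the walkthrough
train_df = pd.DataFrame({
    "bathrooms": [1.0, 1.0, 1.0, 1.5, 2.0, 3.0, 3.0, 4.0],
    "price": [80.0, 100.0, 120.0, 150.0, 200.0, 300.0, 340.0, 400.0],
})
test_df = pd.DataFrame({"bathrooms": [1.0, 3.0], "price": [140.0, 260.0]})

def predict_price(new_listing):
    temp_df = train_df.copy()
    # Absolute difference between EVERY training row and this one test value
    temp_df["distance"] = temp_df["bathrooms"].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values("distance")
    # Mean price of the 5 closest training listings
    nearest_neighbors_prices = temp_df.iloc[0:5]["price"]
    return nearest_neighbors_prices.mean()

# Outer apply: predict_price runs once per test row
test_df["predicted_price"] = test_df["bathrooms"].apply(lambda x: predict_price(x))
print(test_df)
```

Test row 0 (1 bathroom): the distances to the 8 training rows are 0, 0, 0, 0.5, 1, 2, 2, 3. The five closest have prices 80, 100, 120, 150 and 200, so the prediction is 650 / 5 = 130.0.
Test row 1 (3 bathrooms): the distances are 2, 2, 2, 1.5, 1, 0, 0, 1. The five closest have prices 300, 340, 200, 400 and 150, so the prediction is 1390 / 5 = 278.0.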

Finally, for each test listing we take the difference between the predicted price and the actual price, square it, and average these squared errors over the whole test set to get the mean squared error (mse).
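
A minimal sketch of the error calculation, using hypothetical predicted and actual prices for two test listings:

```python
import pandas as pd

# Hypothetical actual and predicted prices for two test listings
test_df = pd.DataFrame({
    "price": [140.0, 260.0],
    "predicted_price": [130.0, 278.0],
})

# Squaring already removes the sign, so np.abs is not strictly needed here
test_df["squared_error"] = (test_df["predicted_price"] - test_df["price"]) ** 2
mse = test_df["squared_error"].mean()
print(mse)  # 212.0
```

The errors are -10 and 18, the squared errors are 100 and 324, and their mean is 212.0.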

Hope it helps!
