# Applying a function that uses the training data to the test dataset

In Machine Learning Introduction → Machine Learning Fundamentals → Evaluating Model Performance, the function that is defined copies the training dataset into a temporary dataframe (`temp_df`):

```python
import numpy as np

# train_df is the training dataframe created earlier in the lesson
def predict_price(new_listing):
    temp_df = train_df.copy()  # copy of the training set (the line in question)
    temp_df['distance'] = temp_df['accommodates'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    nearest_neighbors_prices = temp_df.iloc[0:5]['price']
    predicted_price = nearest_neighbors_prices.mean()
    return predicted_price
```

This function is then applied to the test dataframe. I don’t get this at all. If I apply this function to `test_df`, the results will come from `train_df`. Can someone please explain this? Thanks!

You are trying to predict the price for each row in the `test_df` given the `accommodates` feature. The predicted price is the average of the prices of the nearest neighbours. Those nearest neighbours are found in relation to data we already have - the training set, `train_df`.

To find those neighbours, the function first calculates how far each training listing is from the new listing on the `accommodates` feature:

```python
temp_df['distance'] = temp_df['accommodates'].apply(lambda x: np.abs(x - new_listing))
```

`new_listing` corresponds to a value from the `accommodates` column of `test_df`. The line above calculates the absolute difference between that value and the `accommodates` value of every row in `train_df`.
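To make the distance step concrete, here is a small sketch with made-up numbers. The tiny `train_df` and the value `new_listing = 3` are assumptions for illustration, not the course data, and the subtraction is written in vectorized form, which should give the same result as the `apply` version.

```python
import numpy as np
import pandas as pd

# Hypothetical training data, invented for illustration only.
train_df = pd.DataFrame({
    'accommodates': [1, 2, 3, 4, 6],
    'price': [50.0, 80.0, 100.0, 120.0, 200.0],
})

new_listing = 3  # one `accommodates` value taken from a test row

# Absolute difference between each training listing and the new listing.
distance = np.abs(train_df['accommodates'] - new_listing)
print(distance.tolist())  # [2, 1, 0, 1, 3]
```

The smaller the distance, the more similar that training listing is to the test listing.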

We then sort the dataframe by that `distance` column, select the first 5 rows, which are our nearest neighbours, and find the average price for those neighbours.
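The sort-and-average step can be sketched on its own like this. The `distance` and `price` values are invented for illustration and were chosen so that there is no tie for fifth place.

```python
import pandas as pd

# Hypothetical training rows with a precomputed `distance` column (invented values).
temp_df = pd.DataFrame({
    'distance': [2, 0, 1, 4, 1, 0, 3],
    'price': [60.0, 100.0, 90.0, 300.0, 110.0, 120.0, 250.0],
})

# Sort so the closest listings come first, then keep the five closest.
nearest = temp_df.sort_values('distance').iloc[0:5]

# The prediction is the mean price of those five neighbours.
predicted_price = nearest['price'].mean()
print(predicted_price)  # 96.0
```

The two most distant listings (prices 250 and 300) are left out, so they have no influence on the prediction.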

That averaged price is our predicted price for one row of the test set. Since we are using the `apply()` method on the `accommodates` column of the test set, the above repeats for every row in the test set. So the inputs come from `test_df`, while the neighbours and their prices always come from `train_df`.
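Putting it together, a minimal end-to-end sketch might look like the following. Both dataframes are tiny, made-up stand-ins for the course data, and the `predicted_price` column name is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only.
train_df = pd.DataFrame({
    'accommodates': [1, 2, 3, 3, 4, 5, 6],
    'price': [50.0, 70.0, 90.0, 110.0, 120.0, 160.0, 200.0],
})
test_df = pd.DataFrame({'accommodates': [2, 6]})

def predict_price(new_listing):
    # The neighbours are always looked up in the training data.
    temp_df = train_df.copy()
    temp_df['distance'] = temp_df['accommodates'].apply(lambda x: np.abs(x - new_listing))
    temp_df = temp_df.sort_values('distance')
    return temp_df.iloc[0:5]['price'].mean()

# Each test value is passed in as `new_listing`, one row at a time.
test_df['predicted_price'] = test_df['accommodates'].apply(predict_price)
print(test_df['predicted_price'].tolist())  # [88.0, 136.0]
```

Each test row gets its own prediction because its `accommodates` value changes which training rows are closest, even though every price being averaged comes from `train_df`.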
