Code mistake and question on randomizing

My Code:
# Brought along the changes we made to the `dc_listings` Dataframe.
dc_listings = pd.read_csv('dc_airbnb.csv')
stripped_commas = dc_listings['price'].str.replace(',', '')
stripped_dollars = stripped_commas.str.replace('$', '')
dc_listings['price'] = stripped_dollars.astype('float')
dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]

def predict_price(new_listing):
    temp_df = dc_listings.copy()
    ## Complete the function.
    temp_df["distance"]=temp_df["accommodates"].apply(lambda x:np.abs(x-new_listing))
    temp_df=temp_df.sort_values("distance")
    nearest_neighbors=temp_df["price"].head()
    predicted_price=nearest_neighbors.mean()
    return(predict_price)

acc_one = predict_price(1)
acc_two = predict_price(2)
acc_four = predict_price(4)

print(acc_one)
print(acc_two)
print(acc_four)

What I expected to happen:

What actually happened:
It says acc_two is not defined, and the same for the others.

Also, why do we use loc when doing
dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]
What would happen if we didn't use loc? It seems that iloc also works. What's the difference, and why does that also work?

What was the point of np.random.seed(1)? Was it ever used? Could we have used 3 instead of 1?


Hi malickke2

Your code is correct, but it seems like the platform expects us to use the variable name mean_price to assign the mean price to. You can check the variable inspector and see that it doesn’t list the acc_one, acc_two, acc_four variables. Hence, the error. So, use mean_price instead of predicted_price in your code and check whether it works.

We use loc[] to return a new DataFrame with the rows in the shuffled order.
loc[] is a label-based selection method: we pass the index labels (or column names) of the rows or columns we want to select.
iloc[] is a position-based selection method: we pass integer positions to select specific rows or columns.

We can use whichever of the two suits our requirement. Both work here because the DataFrame still has its default integer index (0 to len(dc_listings) - 1), so each row's label equals its position. Without loc or iloc, dc_listings[...] with an array of integers would be treated as a column selection rather than a row selection.
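To make the difference concrete, here is a small sketch. The toy DataFrame and the fixed order array are made up for illustration (they stand in for dc_listings and the random permutation). It shows that loc and iloc agree on a fresh DataFrame but stop agreeing once the rows have been shuffled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for dc_listings.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})

# A fixed "shuffle" so the result is easy to follow by hand.
order = np.array([2, 0, 3, 1])

# With the default 0..n-1 index, labels and positions coincide,
# so loc and iloc return the same rows.
by_label = df.loc[order]
by_position = df.iloc[order]
same_on_fresh_frame = by_label.equals(by_position)

# After shuffling, the index labels are [2, 0, 3, 1], so they no longer
# match the positions and the two methods give different results.
shuffled = by_label
loc_again = shuffled.loc[order]    # looks up labels 2, 0, 3, 1
iloc_again = shuffled.iloc[order]  # takes positions 2, 0, 3, 1

print(same_on_fresh_frame)
print(loc_again["price"].tolist())
print(iloc_again["price"].tolist())
```

So the two are interchangeable only while the index is still the default one.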

The seed() method initializes the pseudorandom number generator.
The generator needs a starting number (a seed value) from which it derives its sequence of random numbers.
The DQ platform expects us to pass 1 for answer-checking purposes.
Calling seed(1) first makes np.random.permutation(len(dc_listings)) produce the same shuffled array as the one used by DQ. Again, this is to validate the answer.

Hope it's clear now.
Thanks.

Ok.
So on the same page they have print(dc_listings[dc_listings["distance"] == 0]["accommodates"])
When I try print(dc_listings.loc[dc_listings["distance"] == 0]["accommodates"]) I get the same thing. Why is that? Could it be that a Series is just a trivial DataFrame?
However, dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))] needs the loc.

For the seed method, what would a number besides 1 mean?

The dc_listings DataFrame doesn't have a column named distance in it.
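On the question of why the filter works with and without loc: a boolean Series inside the brackets is treated as a row mask in both cases. A minimal sketch, assuming a made-up frame that already has a distance column (the values are hypothetical, not taken from dc_airbnb.csv):

```python
import pandas as pd

# Hypothetical data; the distance column is assumed to exist already.
df = pd.DataFrame({
    "accommodates": [1, 3, 3, 5],
    "distance": [2, 0, 0, 2],
})

mask = df["distance"] == 0  # boolean Series, one True/False per row

plain = df[mask]["accommodates"]         # plain brackets with a boolean mask
with_loc = df.loc[mask]["accommodates"]  # the same mask through loc
one_step = df.loc[mask, "accommodates"]  # rows and column in a single loc call

print(plain.equals(with_loc), with_loc.equals(one_step))
```

Plain brackets only behave this way for boolean masks (and slices). An array of integers, like the output of np.random.permutation, is read as a list of column names, which is why the shuffle needs loc or iloc.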

np.random.permutation(len(dc_listings)) will return a permuted range. Each time you run this piece of code you'll get a different sequence.
seed() seeds the generator. When you call seed() before using the permutation() method, the permuted sequence will be the same every time you run the code below:

np.random.seed(3)
np.random.permutation(len(dc_listings))

You can pass any number to seed(). The number you pass will be the initial value used by the pseudorandom number generator.
When you use 1 in the seed() method, the sequence generated is the same as the one used by DQ.
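A short sketch of that behaviour, using a small made-up length n in place of len(dc_listings): reseeding with the same number reproduces the same permutation, while a different seed is just a different (but equally valid) starting point.

```python
import numpy as np

n = 10  # hypothetical stand-in for len(dc_listings)

np.random.seed(3)
first = np.random.permutation(n)

np.random.seed(3)  # same seed again
second = np.random.permutation(n)

np.random.seed(1)  # a different seed gives another starting point
other = np.random.permutation(n)

print(first)
print(second)  # identical to first
print(other)   # still a shuffle of 0..n-1
```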

Your return statement says predict_price instead of predicted_price. Can you check and confirm?
I cannot pinpoint any other error in the code.
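For reference, here is a sketch of the function with the return fixed. The listings frame and the k parameter are made up for illustration so that the snippet runs without dc_airbnb.csv. Returning predict_price hands back the function object itself rather than a number, which is why the result was not what the exercise expected.

```python
import pandas as pd

# Hypothetical data standing in for dc_listings.
listings = pd.DataFrame({
    "accommodates": [1, 2, 2, 4, 4, 6],
    "price": [50.0, 80.0, 90.0, 150.0, 170.0, 300.0],
})

def predict_price(new_listing, k=5):
    temp_df = listings.copy()
    # Distance is the absolute difference in the number of guests.
    temp_df["distance"] = (temp_df["accommodates"] - new_listing).abs()
    temp_df = temp_df.sort_values("distance")
    # Average the prices of the k closest listings.
    nearest_neighbors = temp_df["price"].head(k)
    predicted_price = nearest_neighbors.mean()
    return predicted_price  # the value, not the function name

acc_two = predict_price(2)
print(acc_two)
```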