# Brought along the changes we made to the `dc_listings` Dataframe.
dc_listings = pd.read_csv('dc_airbnb.csv')
stripped_commas = dc_listings['price'].str.replace(',', '')
stripped_dollars = stripped_commas.str.replace('$', '')
dc_listings['price'] = stripped_dollars.astype('float')
dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]
temp_df = dc_listings.copy()
## Complete the function.
acc_one = predict_price(1)
acc_two = predict_price(2)
acc_four = predict_price(4)
What I expected to happen:
What actually happened:
It says `acc_two` is not defined, and the same for the others.
Also, why do we use `loc` when doing
`dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]`?
What would happen if we didn't use `loc`? It seems that `iloc` also works. What's the difference, and why does that also work?
What was the point of `np.random.seed(1)`? Was it ever used? Could we have used 3 instead of 1?
Your code is correct, but it seems the platform expects the mean price to be assigned to a variable named `mean_price`. If you check the variable inspector, you'll see that it doesn't list variables such as `acc_four`, hence the error. So use `mean_price` instead of `predicted_price` in your code and check whether it works.
We use `loc` to return a new DataFrame containing the rows in the shuffled order.
`loc` is a label-based selection method, which means we pass the labels (names) of the rows or columns we want to select.
`iloc` is an integer-position-based selection method, which means we pass integer positions to select specific rows or columns.
We can use whichever of the two suits our requirement. Both work here because the DataFrame still has the default integer index (0, 1, 2, ...), so each row's label is the same as its position.
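To make the difference concrete, here is a small sketch. It uses a toy DataFrame with made-up prices in place of `dc_listings`, and a fixed ordering in place of the random permutation, so those values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for dc_listings (hypothetical values), with the default 0..n-1 index.
df = pd.DataFrame({"price": [100.0, 80.0, 120.0, 95.0]})

# Fixed ordering standing in for np.random.permutation(len(df)).
order = np.array([2, 0, 3, 1])

# Labels equal positions here, so loc and iloc pick the same rows in the same order.
by_label = df.loc[order]
by_position = df.iloc[order]
same_with_default_index = by_label.equals(by_position)

# After shuffling, the index is [2, 0, 3, 1], so labels and positions no longer match.
shuffled = by_label
first_by_label = shuffled.loc[0, "price"]      # row whose label is 0
first_by_position = shuffled.iloc[0]["price"]  # row sitting in the first position (label 2)
```

So the two only agree while the index is still 0, 1, 2, and so on. Once the rows have been shuffled, `loc[0]` and `iloc[0]` can point at different rows.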
`seed()` is used to initialize the random number generator.
The random number generator we use here needs a starting number (a seed value) in order to generate random numbers.
The DQ platform expects us to pass 1 for answer-checking purposes.
Calling `seed(1)` here makes `np.random.permutation(len(dc_listings))` produce the same array of shuffled numbers as the one used by DQ. Again, this is to validate the answer.
Hope it's clear now.
So on the same page they have `print(dc_listings[dc_listings["distance"] == 0]["accommodates"])`.
When I try `print(dc_listings.loc[dc_listings["distance"] == 0]["accommodates"])` I get the same thing. Why is that? Could it be that a Series is just a trivial DataFrame?
However, `dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]` needs the `loc`.
For the seed method, what would a number besides 1 mean?
`dc_listings` doesn't have a column named `distance` in it.
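On why plain brackets and `loc` give the same result with a condition but not with the permutation, the sketch below may help. It assumes a toy DataFrame where a `distance` column has already been added (the values are made up), and a fixed ordering in place of the random permutation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the "distance" column is assumed to come from an earlier step.
df = pd.DataFrame({"accommodates": [2, 4, 1], "distance": [0, 2, 0]})

mask = df["distance"] == 0  # boolean Series, one True/False per row

# With a boolean mask, plain [] and .loc[] both filter rows.
via_brackets = df[mask]["accommodates"]
via_loc = df.loc[mask]["accommodates"]
same_result = via_brackets.equals(via_loc)

# With an array of integers, plain [] looks for *columns* with those names instead.
order = np.array([2, 0, 1])
try:
    df[order]
    brackets_worked = True
except KeyError:
    brackets_worked = False

# .loc treats the same integers as row labels, so this reorders the rows.
rows_via_loc = df.loc[order]
```

In other words, a boolean mask is a special case that plain brackets apply to rows, while a list or array of integers passed to plain brackets is read as column names. That is why the shuffle needs `loc` (or `iloc`).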
`np.random.permutation(len(dc_listings))` will return a permuted range. Each time you run this piece of code you'll get a different sequence.
`seed()` will seed the generator. When you call `seed()` before using the `permutation()` method, the permuted sequence will be the same every time you run the code.
You can pass any non-negative integer (up to 2**32 - 1) to `seed()`. The number you pass is used to initialize the pseudorandom number generator.
When you use 1 in `seed()`, the sequence generated is the same as the one used by DQ.
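A minimal sketch of this behaviour, using a permutation of 5 items rather than `len(dc_listings)` to keep it short (the size and the seed value 3 are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(1)
first = np.random.permutation(5)

# Re-seeding with the same value restarts the generator, so we get the same sequence again.
np.random.seed(1)
second = np.random.permutation(5)

# Another seed is just as valid; it gives its own fixed, repeatable sequence.
np.random.seed(3)
third_a = np.random.permutation(5)
np.random.seed(3)
third_b = np.random.permutation(5)

same_for_seed_1 = bool((first == second).all())
same_for_seed_3 = bool((third_a == third_b).all())
```

So 3 would work just as well for reproducibility. The only reason to use 1 is that the answer checker was built with the shuffle that seed 1 produces.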
Your return statement says `predict_price` instead of `predicted_price`. Can you check and confirm?
I cannot pinpoint any other error in the code.