Can someone please explain why I am getting different values for MSE and RMSE?
I am following along with the lesson in Jupyter Notebook on my computer so I can freely experiment.
The DQ answer is:
I am getting in Jupyter on my computer:
I am having trouble figuring out why. I’m thinking it might be in the data cleaning steps? I kind of carried over the data cleaning steps from the previous mission so maybe something is off?
I am even using the same code as the answer. I’ve attached the Jupyter Notebook below.
Thank you for your time
DQ SKLearn Different Answers.ipynb (6.8 KB)
OK, I think I just answered my own question:
I forgot to include np.random.seed(1), so when I randomized the dataset using dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))], it created a different randomized version, and thus the 5 nearest neighbors were different. Is this correct?
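To check my own reasoning, here is a minimal sketch using only NumPy. The value of n_rows is a made-up stand-in for len(dc_listings), and I am assuming that without an explicit seed NumPy starts its generator from unpredictable state, which would explain the different shuffle on my machine.

```python
import numpy as np

n_rows = 8  # hypothetical stand-in for len(dc_listings)

# Seeded: resetting the seed before each call reproduces the same order.
np.random.seed(1)
order_a = np.random.permutation(n_rows)
np.random.seed(1)
order_b = np.random.permutation(n_rows)
same_with_seed = np.array_equal(order_a, order_b)

# No reseed here: this draw continues from wherever the generator
# currently is, so it will usually differ from order_a.
order_c = np.random.permutation(n_rows)

print(same_with_seed)  # True
print(order_a, order_c)
```

Each of these index arrays is what would be passed to dc_listings.loc[...], so a different order means different rows end up in the train and test splits, and the MSE and RMSE change with them.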
I have new questions now:
How does running np.random.seed(1) feed into dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]? Does np.random.permutation() somehow know what was passed into np.random.seed()?
So essentially we always want to randomize our dataset before passing it into Scikit-Learn?
How does this work in practice in terms of reproducibility? Do you want to call np.random.seed() every time?
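From what I can tell so far (not an authoritative answer), np.random.seed() resets a hidden generator shared by the np.random functions, and np.random.permutation() draws from that same generator rather than receiving the seed directly. A small sketch of that idea follows, along with np.random.default_rng() as one possible alternative that keeps the generator in a local variable. The variable names are my own.

```python
import numpy as np

# np.random.seed() resets the shared, module-level generator.
np.random.seed(1)
first_call = np.random.permutation(10)
second_call = np.random.permutation(10)  # continues from the advanced state

# Reseeding rewinds the generator, so the first draw is replayed.
np.random.seed(1)
replay = np.random.permutation(10)
replays_first = np.array_equal(first_call, replay)

# Alternative: a local generator object that carries its own seed.
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)
local_match = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))

print(replays_first, local_match)  # True True
```

If this is right, the seed only needs to be set once before the random calls, as long as the code then runs in the same order each time. In a notebook, re-running a single cell without reseeding would continue from the advanced state and give a different shuffle.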