Hey
I have the same problem as the other guys here - the plot is different than expected. I don’t have their issue with plotting in a loop, though. Where is my error?
import matplotlib.pyplot as plt

# houses is assumed to be the DataFrame already loaded for the exercise
mean = houses['SalePrice'].mean()
sample_size = 5
sampling_errors = {}
for i in range(0, 101):
    # Draw a reproducible sample and record its error against the population mean
    sample = houses['SalePrice'].sample(sample_size, random_state=i)
    sampling_error = mean - sample.mean()
    sampling_errors[sample_size] = sampling_error
    sample_size += 29

plt.scatter(list(sampling_errors.keys()), list(sampling_errors.values()))
plt.axhline(0)
plt.axvline(2930)
plt.xlabel('Sample size')
plt.ylabel('Sampling error')
Thanks in advance
Hi @MaksymKarazieiev,
Your solution is good, but DQ’s validation method reports a difference between the plots.
I tested it, compared it to mine, and this is the issue:
- You are using a dictionary instead of, for example, a list
The key difference is that a dictionary in Python is not sorted by key. In versions before 3.7 it does not guarantee any order at all, and from 3.7 onward it only keeps insertion order. A list, on the other hand, keeps its elements in the order you append them on every version. It seems that the “plot check” in the system compares the points in that order somehow.
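To see the ordering behaviour for yourself, here is a minimal sketch with made-up keys and values (not taken from the exercise data). It assumes Python 3.7 or later, where a dict iterates in insertion order:

```python
# Made-up keys, inserted out of order on purpose
errors = {63: -1200.0, 5: 8400.0, 34: 3100.0}

insertion_order = list(errors)           # order the keys were added (Python 3.7+)
sorted_pairs = sorted(errors.items())    # (key, value) pairs sorted by key
xs, ys = zip(*sorted_pairs)              # split back into x and y sequences
```

Here xs and ys stay aligned, so they can be passed straight to a plotting call.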
You then have 2 options:
- Use a list for indexes and another for values and then plot
- Use an ordered dictionary:
Add the following import:
from collections import OrderedDict
And before plotting:
sampling_errors = OrderedDict(sorted(sampling_errors.items()))
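For the first option, a rough sketch of the two-list version could look like the code below. It uses only the standard library, and the prices list is a randomly generated stand-in for houses['SalePrice'] (an assumption so that the snippet runs on its own), so the numbers will not match the exercise:

```python
import random
from statistics import mean

# Hypothetical stand-in for houses['SalePrice']; NOT the real dataset
rng = random.Random(0)
prices = [rng.randint(50_000, 500_000) for _ in range(2930)]
population_mean = mean(prices)

sample_sizes = []      # x values, in the order they are generated
sampling_errors = []   # y values, aligned by index with sample_sizes

sample_size = 5
for i in range(101):
    # Seed per iteration, mirroring random_state=i in the original code
    sample = random.Random(i).sample(prices, sample_size)
    sample_sizes.append(sample_size)
    sampling_errors.append(population_mean - mean(sample))
    sample_size += 29

# plt.scatter(sample_sizes, sampling_errors) would then plot the pairs in order
```

With the real data you would keep houses['SalePrice'].sample(sample_size, random_state=i) inside the loop and only swap the dictionary for the two lists.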
Thank you @fedepereira for your time and explanation!