My plot is different from DQ’s

Hey

I have the same problem as others here: my plot is different from the expected one. I don’t have their issue with plotting in a loop, though. Where is my error?

import matplotlib.pyplot as plt 
mean = houses['SalePrice'].mean()

sample_size = 5
sampling_errors = {}

for i in range(0, 101):
    sample = houses['SalePrice'].sample(sample_size, random_state=i)
    sampling_error = mean - sample.mean()
    sampling_errors[sample_size] = sampling_error
    sample_size += 29
    

plt.scatter(list(sampling_errors.keys()), list(sampling_errors.values()))    
plt.axhline(0)

plt.axvline(2930)

plt.xlabel('Sample size')
plt.ylabel('Sampling error')

Thanks in advance

Hi @MaksymKarazieiev,
your solution is good, but DQ’s validation method detects a difference between the plots.
I tested it, compared it to mine, and this is the issue:

  • You are using a dictionary instead of, for example, a list

The key difference here is that a dictionary in Python is not sorted by key, and before Python 3.7 it did not guarantee any order at all (since 3.7 it keeps insertion order). A list, on the other hand, always keeps its order. It seems that the “plot check” in the system relies on the values being in order somehow.

You then have 2 options:

  1. Use a list for indexes and another for values and then plot
  2. Use an ordered dictionary:

Add the following import:

from collections import OrderedDict

And before plotting:

sampling_errors = OrderedDict(sorted(sampling_errors.items())) 
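
For option 1 (two lists instead of a dictionary), a rough sketch could look like the following. The function names and the stand-in `prices` Series are made up for illustration; in the exercise you would pass `houses['SalePrice']` instead. I have not checked this against the DQ validator, so treat it as a starting point only.

```python
import pandas as pd


def collect_sampling_errors(prices, n_samples=101, start_size=5, step=29):
    """Return (sample_sizes, sampling_errors) as two parallel lists."""
    population_mean = prices.mean()
    sample_sizes = []
    sampling_errors = []
    sample_size = start_size
    for i in range(n_samples):
        # Same sampling call as in the question, seeded by the loop index
        sample = prices.sample(sample_size, random_state=i)
        sample_sizes.append(sample_size)
        sampling_errors.append(population_mean - sample.mean())
        sample_size += step
    return sample_sizes, sampling_errors


def plot_sampling_errors(sample_sizes, sampling_errors, population_size):
    """Scatter plot of sampling error against sample size."""
    # Imported here so the data step above runs without matplotlib
    import matplotlib.pyplot as plt

    plt.scatter(sample_sizes, sampling_errors)
    plt.axhline(0)
    plt.axvline(population_size)
    plt.xlabel('Sample size')
    plt.ylabel('Sampling error')


# Hypothetical stand-in for houses['SalePrice'] (2930 rows of dummy values)
prices = pd.Series(range(2930), dtype=float)
sizes, errors = collect_sampling_errors(prices)
```

Calling `plot_sampling_errors(sizes, errors, len(prices))` then draws the plot. Because both lists are appended to in the same loop, the i-th size always lines up with the i-th error, and the order is the order of the loop.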

Thank you @fedepereira for your time and explanation!
