I do not understand the z-score function

https://app.dataquest.io/m/309/z-scores/4/locating-values-in-different-distributions

The exercise above asks: find out the location for which $200,000 has the z-score closest to 0.

I do not see where the code computes the z-score that is closest to zero. Could it please be pointed out?

Also, I do not understand the loop here. Why was a dictionary created with {} when the code then uses […] for a list? I have not seen () inside a list before, as in [(…)(…)], although I guess these are key-value pairs?


def z_score(value, array, bessel = 0):
    mean = sum(array) / len(array)
    
    from numpy import std
    st_dev = std(array, ddof = bessel)
    
    distance = value - mean
    z = distance / st_dev
    
    return z
# Segment the data by location
north_ames = houses[houses['Neighborhood'] == 'NAmes']
clg_creek = houses[houses['Neighborhood'] == 'CollgCr']
old_town = houses[houses['Neighborhood'] == 'OldTown']
edwards = houses[houses['Neighborhood'] == 'Edwards']
somerset = houses[houses['Neighborhood'] == 'Somerst']

# Find the z-score for 200000 for every location
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                     (old_town, 'OldTown'), (edwards, 'Edwards'),
                     (somerset, 'Somerst')]:
    
    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)

# Find the location with the z-score closest to 0
print(z_by_location)
best_investment = 'College Creek'

Hello @jamesberentsen,

I used this for the calculation:

best_investment = min(z_scores, key= lambda k: abs(z_scores[k]))
best_investment = 'College Creek'

College Creek is abbreviated as 'CollgCr' in the Neighborhood column. So I guess they looked for the z-score closest to zero in the z_by_location dictionary and assigned that neighborhood to best_investment.


Hello monorienaghogho,

Thanks, I can see how you would get that now from looking at the min() function in your calculation, as applied to the dictionary.

{'NAmes': 1.7239665910370237, 'CollgCr': -0.03334366282705464, 'OldTown': 1.7183080926865524, 'Edwards': 1.443576193848941, 'Somerst': -0.5186390646965722}

One more thing, please.
I do not understand the part below. I see that new dataframes are created, and I think each one extracts only the rows where houses['Neighborhood'] equals the given name ('NAmes' and so on), for all five neighborhoods.
north_ames = houses[houses['Neighborhood'] == 'NAmes']

However, I do not get this part. Is this a list of lists: [(…, '…'), (…, '…')]?
It creates a dictionary here:
z_by_location = {}
then loops and adds the z-score here:
z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                      bessel = 0)
but I cannot connect the two seemingly different data structures: the declaration of a dictionary, and then the switch to a list.

Could you please explain this syntax: [(…, '…'), (…, '…')]

z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                     (old_town, 'OldTown'), (edwards, 'Edwards'),
                     (somerset, 'Somerst')]:

Hello @jamesberentsen, this is how this works with what you posted:

The z_score function calculates the z-score of a value relative to the numerical data supplied to it. In this case we want the z-score for different neighborhoods, so we create a separate dataframe for each neighborhood and pass its SalePrice column to the z_score function.

def z_score(value, array, bessel = 0):
    mean = sum(array) / len(array)
    
    from numpy import std
    st_dev = std(array, ddof = bessel)
    
    distance = value - mean
    z = distance / st_dev
    
    return z
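
To see what the function returns on a small input, here is a minimal sketch of the same calculation. It is not the course's code: it swaps numpy's std for statistics.pstdev from the standard library (the population standard deviation, which should match ddof = 0), and both the name z_score_plain and the prices are made up for illustration.

```python
from statistics import mean, pstdev

def z_score_plain(value, values):
    """How many population standard deviations `value` lies from the mean."""
    return (value - mean(values)) / pstdev(values)

# Made-up prices, not taken from the houses dataset
prices = [100000, 150000, 200000, 250000, 300000]

print(z_score_plain(200000, prices))  # 0.0, because 200000 is exactly the mean
print(z_score_plain(300000, prices))  # positive, because 300000 is above the mean
```

A z-score close to 0 therefore means the value is close to the average price for that group.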

Here we create each neighborhood dataframe

north_ames = houses[houses['Neighborhood'] == 'NAmes']
clg_creek = houses[houses['Neighborhood'] == 'CollgCr']
old_town = houses[houses['Neighborhood'] == 'OldTown']
edwards = houses[houses['Neighborhood'] == 'Edwards']
somerset = houses[houses['Neighborhood'] == 'Somerst']

Here you create an empty dictionary z_by_location to store the z-scores for the different neighborhoods.

for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'), ...] Here we create a list of tuples. Each tuple pairs a dataframe (under the variable name we saved it as) with the name of that neighborhood in the data. The first iteration picks (north_ames, 'NAmes') from the list.

data, neighborhood breaks (north_ames, 'NAmes') into its parts: data takes north_ames and neighborhood takes 'NAmes'. This works the same way as a, b = (3, 2), where a equals 3 and b equals 2.

z_by_location[neighborhood] = z_score(200000, data['SalePrice'], bessel = 0)
Here you calculate the z-score with data['SalePrice'] and save the value in the dictionary under the neighborhood name, z_by_location[neighborhood]. So the list only drives the loop, while the dictionary collects the results.

z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                     (old_town, 'OldTown'), (edwards, 'Edwards'),
                     (somerset, 'Somerst')]:
    
    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)
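
The unpacking in the loop above can be tried on its own. This is only a sketch: plain lists of made-up prices stand in for the real dataframes, and it stores a simple mean instead of a z-score so that it runs without the houses data.

```python
# Stand-in data: made-up prices in plain lists instead of the real dataframes
north_ames = [120000, 140000]
clg_creek = [190000, 210000]

# A list (square brackets) whose items are tuples (round brackets)
pairs = [(north_ames, 'NAmes'), (clg_creek, 'CollgCr')]

mean_by_location = {}
for data, neighborhood in pairs:
    # Each tuple is unpacked: data gets the first item, neighborhood the second
    mean_by_location[neighborhood] = sum(data) / len(data)

print(mean_by_location)  # {'NAmes': 130000.0, 'CollgCr': 200000.0}
```

The tuples are not key-value pairs by themselves; they become keys and values only because the loop body writes them into the dictionary.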

See how I solved it:

neighborhoods = ['NAmes', 'CollgCr', 'OldTown', 'Edwards', 'Somerst']
z_scores = {}

for neighborhood in neighborhoods:
    neg_data = houses.loc[houses['Neighborhood'] == neighborhood, 'SalePrice']
    z_s = z_score(200000, neg_data)
    z_scores[neighborhood] = z_s
    
best_investment = min(z_scores, key= lambda k: abs(z_scores[k]))
best_investment = 'College Creek'

Thanks again,
I just wondered.


I am trying to print out the minimum item from the dictionary by adapting code from this Stack Overflow link, but I get an error:
#d.items()
#[(320, 1), (321, 0), (322, 3)]
# find the minimum by comparing the second element of each tuple
y=min(d.items(), key=lambda x: x[1]) 
print(y)
-----------------------------------------end of code


{'NAmes': 1.7239665910370237, 'CollgCr': -0.03334366282705464, 'OldTown': 1.7183080926865524, 'Edwards': 1.443576193848941, 'Somerst': -0.5186390646965722}
<class 'float'>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-91-d7719486e987> in <module>
     37 #[(320, 1), (321, 0), (322, 3)]
     38 # find the minimum by comparing the second element of each tuple
---> 39 y=min(d.items(), key=lambda x: x[1])
     40 print(y)

AttributeError: 'float' object has no attribute 'items'
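
The traceback says that d was a float when .items() was called, so d most likely held a single z-score rather than the whole dictionary. The sketch below reproduces that under the assumption that d was assigned one value from the dictionary, and then runs the same min() call on the dictionary printed above.

```python
z_by_location = {
    'NAmes': 1.7239665910370237,
    'CollgCr': -0.03334366282705464,
    'OldTown': 1.7183080926865524,
    'Edwards': 1.443576193848941,
    'Somerst': -0.5186390646965722,
}

d = z_by_location['NAmes']  # assumption: d ended up holding one float
try:
    d.items()
except AttributeError as err:
    print(err)  # 'float' object has no attribute 'items'

# The same call works once it is given the dictionary itself
y = min(z_by_location.items(), key=lambda x: x[1])
print(y)  # ('Somerst', -0.5186390646965722)
```

Note that this returns the smallest raw value, which is the most negative z-score and not necessarily the one closest to 0.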

Try this

item = [(320, 1), (321, 0), (322, 3)]

max(dict(item), key=dict(item).get)

Output: 322

Hi,

Sorry, I think I was not that clear.
I meant that I was trying to get the minimum z-score printed out for the dictionary here:

z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                     (old_town, 'OldTown'), (edwards, 'Edwards'),
                     (somerset, 'Somerst')]:
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                     (old_town, 'OldTown'), (edwards, 'Edwards'),
                     (somerset, 'Somerst')]:
    
    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)

min(z_by_location, key=lambda k: abs(z_by_location[k])) 

You cannot simply use min(z_by_location, key=z_by_location.get), because it makes no provision for abs. It does return a value, but not the right one: it picks the most negative z-score as the minimum instead of the one closest to 0.
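
To make the difference concrete, here is a short sketch that runs both calls on the z-scores printed earlier in this thread. The variable names by_raw_value and by_distance are made up for illustration.

```python
z_by_location = {
    'NAmes': 1.7239665910370237,
    'CollgCr': -0.03334366282705464,
    'OldTown': 1.7183080926865524,
    'Edwards': 1.443576193848941,
    'Somerst': -0.5186390646965722,
}

# Compares the raw values, so the most negative z-score wins
by_raw_value = min(z_by_location, key=z_by_location.get)

# Compares distances from 0, so the z-score closest to 0 wins
by_distance = min(z_by_location, key=lambda k: abs(z_by_location[k]))

print(by_raw_value)  # Somerst
print(by_distance)   # CollgCr
```

Both calls iterate over the dictionary's keys, so each returns a neighborhood name rather than a z-score.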


Hi monorienaghogho,

Many thanks.

Regards,
James
