# Z-score function: I do not understand the code

https://app.dataquest.io/m/309/z-scores/4/locating-values-in-different-distributions

The exercise above asks: find the location for which \$200,000 has the z-score closest to 0.

I do not see where the code computes the z-score that is closest to zero. Could it please be pointed out?

Also, I do not understand the loop over the list here. Why is a dictionary created with `{}` when the code then uses `[…]` for a list? I have also not seen `()` inside a list before, as in `[(…), (…)]`, although I guess these are key-value pairs?

``````
def z_score(value, array, bessel = 0):
    mean = sum(array) / len(array)

    from numpy import std
    st_dev = std(array, ddof = bessel)

    distance = value - mean
    z = distance / st_dev

    return z

# Segment the data by location
north_ames = houses[houses['Neighborhood'] == 'NAmes']
clg_creek = houses[houses['Neighborhood'] == 'CollgCr']
old_town = houses[houses['Neighborhood'] == 'OldTown']
edwards = houses[houses['Neighborhood'] == 'Edwards']
somerset = houses[houses['Neighborhood'] == 'Somerst']

# Find the z-score for 200000 for every location
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                           (old_town, 'OldTown'), (edwards, 'Edwards'),
                           (somerset, 'Somerst')]:

    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)

# Find the location with the z-score closest to 0
print(z_by_location)
best_investment = 'College Creek'
``````

Hello @jamesberentsen,

I used this for the calculation:

``````best_investment = min(z_scores, key= lambda k: abs(z_scores[k]))
best_investment = 'College Creek'
``````

`College Creek` is abbreviated as `'CollgCr'` in the `Neighborhood` column. So I guess they looked in the `z_by_location` dictionary for the z-score closest to 0 and assigned that location to `best_investment`.

Hello monorienaghogho,

Thanks, I can now see how you would get that from the `min()` function in your calculation, applied to the dictionary:

`{'NAmes': 1.7239665910370237, 'CollgCr': -0.03334366282705464, 'OldTown': 1.7183080926865524, 'Edwards': 1.443576193848941, 'Somerst': -0.5186390646965722}`

I do not understand the part below. I see that new dataframes are created, and I think each one extracts only those rows where `houses['Neighborhood']` equals the given name (for example `'NAmes'`), for all five neighborhoods:

`north_ames = houses[houses['Neighborhood'] == 'NAmes']`

However, I do not get this part. Is this a list of lists: `[(…, '…'), (…, '…')]`?

It creates a dictionary here:

`z_by_location = {}`

then loops and adds the z-score here:

`z_by_location[neighborhood] = z_score(200000, data['SalePrice'], bessel = 0)`

but I cannot connect the two seemingly different data structures: the declaration of a dictionary and then the switch to a list.

Could you please explain the syntax `[(…, '…'), (…, '…')]` in this loop?

`for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'), (old_town, 'OldTown'), (edwards, 'Edwards'), (somerset, 'Somerst')]:`

Hello @jamesberentsen, this is how this works with what you posted:

The `z_score` function calculates the z-score of a value relative to the numerical data supplied to it. In this case we want the z-score of \$200,000 within each neighborhood, so we must create a separate dataframe for each neighborhood and pass its `SalePrice` column to the `z_score` function.

``````
def z_score(value, array, bessel = 0):
    mean = sum(array) / len(array)

    from numpy import std
    st_dev = std(array, ddof = bessel)

    distance = value - mean
    z = distance / st_dev

    return z
``````
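
To see what the function returns, here is a minimal pure-Python sketch of the same calculation on a toy list. The name `z_score_sketch` and the toy data are made up for illustration, and `statistics.pstdev` / `statistics.stdev` stand in for `numpy.std` with `ddof=0` / `ddof=1` (only those two cases are handled here).

```python
from statistics import mean, pstdev, stdev

def z_score_sketch(value, values, bessel=0):
    """Hypothetical stand-in for z_score that avoids numpy."""
    # pstdev divides by n (like ddof=0); stdev divides by n - 1 (like ddof=1)
    st_dev = stdev(values) if bessel else pstdev(values)
    return (value - mean(values)) / st_dev

toy = [1, 2, 3, 4, 5]                        # made-up data, mean is 3
z_population = z_score_sketch(5, toy)            # 2 / sqrt(2), about 1.414
z_sample = z_score_sketch(5, toy, bessel=1)      # 2 / sqrt(2.5), about 1.265
z_at_mean = z_score_sketch(3, toy)               # 0.0, the value equals the mean
print(z_population, z_sample, z_at_mean)
```

A z-score of 0 means the value sits exactly at the mean, which is why the exercise looks for the neighborhood whose z-score for 200000 is closest to 0.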

Here we create a dataframe for each neighborhood:

``````north_ames = houses[houses['Neighborhood'] == 'NAmes']
clg_creek = houses[houses['Neighborhood'] == 'CollgCr']
old_town = houses[houses['Neighborhood'] == 'OldTown']
edwards = houses[houses['Neighborhood'] == 'Edwards']
somerset = houses[houses['Neighborhood'] == 'Somerst']
``````

Here you create an empty dictionary, `z_by_location`, to store the z-scores of the different neighborhoods.

`for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'), ...]`: here we loop over a list of tuples. Each tuple pairs the variable holding a neighborhood's dataframe with the name of that neighborhood in the data. The first iteration picks `(north_ames, 'NAmes')` from the list.

`data, neighborhood` unpacks `(north_ames, 'NAmes')` into its parts: `data` takes `north_ames` and `neighborhood` takes `'NAmes'`. It is the same as `a, b = (3, 2)`, where `a` equals 3 and `b` equals 2.

`z_by_location[neighborhood] = z_score(200000, data['SalePrice'], bessel = 0)`
Here you calculate the z-score from `data['SalePrice']` and save the value in the dictionary under the neighborhood name, `z_by_location[neighborhood]`. The list only supplies the items to loop over; the dictionary is where the results are stored.

``````
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                           (old_town, 'OldTown'), (edwards, 'Edwards'),
                           (somerset, 'Somerst')]:

    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)
``````
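
The same loop pattern can be tried without the housing data. This is a small sketch where plain lists stand in for the dataframes; the names `prices_a`, `prices_b`, and `averages` are invented for illustration.

```python
# Made-up stand-ins for two neighborhood dataframes
prices_a = [100, 200, 300]
prices_b = [10, 20]

averages = {}  # results go here, keyed by name
# A list of (data, name) tuples: the list is only what we loop over
for data, name in [(prices_a, 'A'), (prices_b, 'B')]:
    # Each tuple is unpacked into `data` and `name` on every iteration
    averages[name] = sum(data) / len(data)

print(averages)  # {'A': 200.0, 'B': 15.0}
```

The tuples are not key-value pairs by themselves; they only become dictionary entries because the loop body assigns `averages[name] = ...`.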

See how I solved it:

``````
neighborhoods = ['NAmes', 'CollgCr', 'OldTown', 'Edwards', 'Somerst']
z_scores = {}

for neighborhood in neighborhoods:
    neg_data = houses.loc[houses['Neighborhood'] == neighborhood, 'SalePrice']
    z_s = z_score(200000, neg_data)
    z_scores[neighborhood] = z_s

best_investment = min(z_scores, key=lambda k: abs(z_scores[k]))
best_investment = 'College Creek'
``````

Thanks again,
I just wondered.

I am trying to print the minimum item from the dictionary by adapting code from this Stack Overflow link, but I get an error:
``````#d.items()
#[(320, 1), (321, 0), (322, 3)]
# find the minimum by comparing the second element of each tuple
y=min(d.items(), key=lambda x: x)
print(y)
-----------------------------------------end of code

{'NAmes': 1.7239665910370237, 'CollgCr': -0.03334366282705464, 'OldTown': 1.7183080926865524, 'Edwards': 1.443576193848941, 'Somerst': -0.5186390646965722}
<class 'float'>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-91-d7719486e987> in <module>
37 #[(320, 1), (321, 0), (322, 3)]
38 # find the minimum by comparing the second element of each tuple
---> 39 y=min(d.items(), key=lambda x: x)
40 print(y)

AttributeError: 'float' object has no attribute 'items'``````

Try this

``````
item = [(320, 1), (321, 0), (322, 3)]

# Key with the largest value
max(dict(item), key=dict(item).get)

# Output: 322
``````
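
For what it's worth, the traceback says that `d` was bound to a float rather than a dictionary when `d.items()` ran, and a float has no `.items()` method. Below is a hedged sketch of what the snippet seems to be aiming for, using the toy pairs from its comment. It assumes the intended key was the second element of each tuple (`x[1]`), as the comment suggests; the names `smallest_item` and `not_a_dict` are made up.

```python
d = {320: 1, 321: 0, 322: 3}   # the toy pairs from the comment, as a dict

# d.items() yields (key, value) tuples; compare them by the value, x[1]
smallest_item = min(d.items(), key=lambda x: x[1])
print(smallest_item)  # (321, 0)

# Calling .items() on a float reproduces the error from the traceback
not_a_dict = 1.72
try:
    not_a_dict.items()
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```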

Hi,

Sorry, I think I was not that clear.
I meant I was trying to print the minimum z-score from the dictionary built here:

``````
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                           (old_town, 'OldTown'), (edwards, 'Edwards'),
                           (somerset, 'Somerst')]:
``````

``````
z_by_location = {}
for data, neighborhood in [(north_ames, 'NAmes'), (clg_creek, 'CollgCr'),
                           (old_town, 'OldTown'), (edwards, 'Edwards'),
                           (somerset, 'Somerst')]:

    z_by_location[neighborhood] = z_score(200000, data['SalePrice'],
                                          bessel = 0)

min(z_by_location, key=lambda k: abs(z_by_location[k]))
``````

You cannot use `min(z_by_location, key=z_by_location.get)` here because it makes no provision for `abs`. It does return a value, but not the right one: it picks the most negative z-score as the minimum (here `'Somerst'`, at about -0.52) rather than the one closest to 0.
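
A quick check with the values printed earlier in this thread shows the difference. The dictionary literal below is copied from that output; the variable names `by_value`, `by_distance`, and `closest_item` are just for illustration.

```python
z_by_location = {'NAmes': 1.7239665910370237, 'CollgCr': -0.03334366282705464,
                 'OldTown': 1.7183080926865524, 'Edwards': 1.443576193848941,
                 'Somerst': -0.5186390646965722}

# Plain minimum: picks the most negative z-score
by_value = min(z_by_location, key=z_by_location.get)

# Minimum of the absolute value: picks the z-score closest to 0
by_distance = min(z_by_location, key=lambda k: abs(z_by_location[k]))

# Same idea, but returning the (name, z-score) pair instead of just the name
closest_item = min(z_by_location.items(), key=lambda item: abs(item[1]))

print(by_value)      # Somerst
print(by_distance)   # CollgCr
print(closest_item)  # ('CollgCr', -0.03334366282705464)
```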

Many thanks.

Regards,
James
