All of my averages are coming out to 5000. I know that's wrong, but I can't figure out why.

My Code:

aggregate_means = {}
brand_freq = {}
prices_by_index = autos_brands.loc[:,'price']

#print(autos_brands.loc[2])
for row in autos_brands['brand']:
    add_row = 0
    if row in aggregate_means:
        aggregate_means[row]+=prices_by_index.iloc[add_row]
        brand_freq[row]+=1
    else:
        aggregate_means[row]=prices_by_index.iloc[add_row]
        brand_freq[row]=1
    add_row+=1
aggregate_sums = aggregate_means
for key in aggregate_sums:
    aggregate_means[key] = aggregate_sums[key]/brand_freq[key]

What I expected to happen:
I checked autos_brands.shape and confirmed that autos_brands['price'] holds the same price values as the corresponding rows in autos. I expected to get the average price for each brand.

What actually happened:
I got 5000 for every brand’s average:

print(aggregate_means)
print(brand_freq)
print(prices_by_index)

Results in the following:

{'peugeot': 5000.0, 'bmw': 5000.0, 'volkswagen': 5000.0, 'smart': 5000.0, 'ford': 5000.0, 'seat': 5000.0, 'renault': 5000.0, 'mercedes_benz': 5000.0, 'audi': 5000.0, 'opel': 5000.0, 'mazda': 5000.0, 'toyota': 5000.0, 'nissan': 5000.0, 'fiat': 5000.0, 'skoda': 5000.0, 'citroen': 5000.0}
{'peugeot': 1413, 'bmw': 5201, 'volkswagen': 10157, 'smart': 684, 'ford': 3330, 'seat': 888, 'renault': 2272, 'mercedes_benz': 4586, 'audi': 4118, 'opel': 5155, 'mazda': 730, 'toyota': 609, 'nissan': 734, 'fiat': 1232, 'skoda': 772, 'citroen': 674}
0        5000.000000
1        8500.000000
2        8990.000000
3        4350.000000
4        1350.000000
            ...     
49995   24900.000000
49996    1980.000000
49997   13200.000000
49998   22900.000000
49999    1250.000000
Name: price, Length: 42555, dtype: float64

autos_brands was created by trimming unwanted brands from my dataset, but I’ve matched the prices back to the original autos dataset to make sure I haven’t made any assignment errors along the way.

Hi @BunterTheMage,

Could you please first share the exact code you used to create autos_brands?

I actually just realized that the major mistake here is setting add_row (which I was using to track the row's position) to 0 inside the loop. It was reset on every iteration, so I only ever read the price from the first row, which is 5000. My mistake! Once I fixed that, everything worked normally again 🙂
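
For anyone who lands here with the same symptom, here is a minimal sketch of one way to write the loop without a manual counter. It is not the exact fix used above (that was simply to stop resetting add_row inside the loop). The tiny DataFrame and its values are made up for illustration, and the name groupby_means is just a placeholder; only autos_brands, brand, price, aggregate_means and brand_freq come from the thread.

```python
import pandas as pd

# Made-up stand-in for autos_brands (illustrative values only)
autos_brands = pd.DataFrame({
    'brand': ['bmw', 'ford', 'bmw', 'ford', 'audi'],
    'price': [5000.0, 8500.0, 9000.0, 1500.0, 4350.0],
})

aggregate_sums = {}
brand_freq = {}

# zip pairs each brand with the price from the same row,
# so there is no position counter to reset by accident
for brand, price in zip(autos_brands['brand'], autos_brands['price']):
    aggregate_sums[brand] = aggregate_sums.get(brand, 0) + price
    brand_freq[brand] = brand_freq.get(brand, 0) + 1

# Build the means in a separate dict so the sums stay available
aggregate_means = {brand: aggregate_sums[brand] / brand_freq[brand]
                   for brand in aggregate_sums}
print(aggregate_means)

# The same per-brand averages using pandas' built-in grouping
groupby_means = autos_brands.groupby('brand')['price'].mean().to_dict()
print(groupby_means)
```

Both approaches should give the same per-brand averages. The groupby version is shorter, while the explicit loop is closer to the original code and easier to step through when debugging.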

For reference though,

boolean_brand = autos['brand'].value_counts(normalize=True) > 0.010
brand_list_clean = brand_list[boolean_brand==True]
relevant_brands = []
for row in car_brands:
    if row in brand_list_clean:
        relevant_brands.append(row)
print(relevant_brands)
autos_brands = pd.DataFrame()
autos_brands = autos.loc[autos['brand'].isin(relevant_brands)]

That’s how I formed the dataframe autos_brands
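
As a side note, the same filtering can probably be done without the intermediate loop. This is only a sketch under assumptions: the toy autos data, the brand_share and threshold names, and the 0.15 cutoff are made up for illustration (the code above uses 0.010), while value_counts(normalize=True) and isin are the same pandas calls already used in the thread.

```python
import pandas as pd

# Made-up stand-in for the autos dataset (illustrative values only)
autos = pd.DataFrame({
    'brand': ['bmw'] * 5 + ['ford'] * 4 + ['lada'],
    'price': [5000.0, 8500.0, 8990.0, 4350.0, 1350.0,
              2000.0, 3000.0, 1980.0, 1250.0, 900.0],
})

# Share of listings per brand; the index holds the brand names
brand_share = autos['brand'].value_counts(normalize=True)

# 0.15 is chosen only so the toy data has a brand to drop
threshold = 0.15
relevant_brands = brand_share[brand_share > threshold].index

# Keep only the rows whose brand passed the threshold
autos_brands = autos[autos['brand'].isin(relevant_brands)]
print(list(relevant_brands))
print(autos_brands.shape)
```

Note that the filtered DataFrame keeps the original index labels, which is why the printed price Series above runs up to 49999 even though its length is 42555. Positional access therefore needs .iloc rather than .loc.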


Ah, yes, exactly, now I see the issue too. It's great that you found and fixed it! 😊
