Ebay Car Sales Guided Project P7 (Alternate Answer)

I got stuck on this screen because I don't understand how pandas and aggregation interact, and I have come across a much simpler way to answer the question. Is there a reason Dataquest has us answer it the other way? (I still don't understand how the provided answer works, in particular the detailed workings of the loop.)

autos_brand = autos[autos.brand.isin(["volkswagen", "bmw", "mercedes_benz", "audi", "opel", "ford"])]
autos_brand.groupby("brand").price.mean()

Hi @genesix32,

Are you saying that you find it difficult to understand this code?

brand_mean_prices = {}

for brand in common_brands:
    brand_only = autos[autos["brand"] == brand]
    mean_price = brand_only["price"].mean()
    brand_mean_prices[brand] = int(mean_price)

brand_mean_prices

The reason we are not using groupby here is that we haven’t introduced this topic yet. Students without any previous Pandas learning experience would only get to know about it when they reach this mission screen:

https://app.dataquest.io/m/343/data-aggregation/4/the-groupby-operation
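For comparison, here is a minimal sketch suggesting that the loop and the groupby approach give the same numbers. The small autos DataFrame and the two-brand common_brands list are made up for illustration; they are not the real project data.

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "ford", "ford", "audi"],
    "price": [8000, 9000, 3000, 4000, 9500],
})
common_brands = ["bmw", "ford"]  # illustrative subset of brands

# Loop approach: filter one brand at a time and store its mean price.
loop_means = {}
for brand in common_brands:
    brand_only = autos[autos["brand"] == brand]
    loop_means[brand] = int(brand_only["price"].mean())

# groupby approach: keep only the brands of interest, then aggregate once.
subset = autos[autos["brand"].isin(common_brands)]
groupby_means = subset.groupby("brand")["price"].mean().astype(int).to_dict()

print(loop_means)
print(groupby_means)
```

Both dictionaries should hold the same brand-to-mean mapping, so the choice between the two is mostly about which tools have been introduced so far.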

Best,
Sahil

Yes I would appreciate it if you could explain how each line works, thanks!

Hi @genesix32,

for brand in common_brands:

The common_brands variable contains a list (actually an index object) of the following values:

['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford']
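How common_brands was created is not shown in this thread. The sketch below assumes it was taken from the index of a value_counts() result, which would explain why it is an Index object rather than a plain list. The toy data and the threshold of 2 are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real autos DataFrame is much larger.
autos = pd.DataFrame({"brand": ["bmw", "bmw", "ford", "audi", "ford", "bmw"]})

# Assumption: common_brands comes from the index of a value_counts() result,
# keeping only brands that appear often enough (threshold is illustrative).
brand_counts = autos["brand"].value_counts()
common_brands = brand_counts[brand_counts >= 2].index

# Looping over an Index yields its values one by one, just like a list.
for brand in common_brands:
    print(brand)
```

Either way, the for loop behaves the same as it would with a regular Python list of brand names.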

On each iteration, the loop variable brand holds one of the above values.

Suppose, we are in the first iteration, then this line:

brand_only = autos[autos["brand"] == brand]

will become:

brand_only = autos[autos["brand"] == 'volkswagen']

In the above line, we keep only the rows whose brand is equal to volkswagen, and we assign the resulting set of volkswagen rows to the brand_only variable. The next step is to find the mean price of all those volkswagen cars.
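To see the filtering step in isolation, here is a minimal sketch on a made-up four-row DataFrame (the data and the mask variable name are only for illustration). The comparison produces a Series of True/False values, and indexing with it keeps the True rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["volkswagen", "bmw", "volkswagen", "ford"],
    "price": [5000, 8000, 6000, 3500],
})

# The comparison returns one True/False value per row.
mask = autos["brand"] == "volkswagen"
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
brand_only = autos[mask]
print(brand_only)
```

Writing autos[autos["brand"] == "volkswagen"] does both steps in one line.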

mean_price = brand_only["price"].mean()

Since the brand_only variable only contains volkswagen cars, we can just call the .mean() method on the price column.

brand_mean_prices[brand] = int(mean_price)

In the first iteration, this becomes:

brand_mean_prices['volkswagen'] = int(5402.410261610221)

The last step in each iteration is to create a key in the brand_mean_prices dictionary using the brand variable and assign it the value in the mean_price variable (after converting it to an integer).

Thus, after the first iteration ends, brand_mean_prices dictionary, which was initially an empty dictionary, will become this:

{'volkswagen': 5402}

and once the for loop finishes, it will become something like this:

{'volkswagen': 5402,
 'bmw': 8332,
 'opel': 2975,
 'mercedes_benz': 8628,
 'audi': 9336,
 'ford': 3749}
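The dictionary step can also be tried on its own. This small sketch reuses the volkswagen mean from above; the 5402.9 value is made up to show that int() drops the decimal part rather than rounding.

```python
# Start with an empty dictionary, as in the original code.
brand_mean_prices = {}

# Assigning to a key that does not exist yet creates it.
mean_price = 5402.410261610221
brand_mean_prices["volkswagen"] = int(mean_price)
print(brand_mean_prices)

# int() truncates toward zero; round() rounds to the nearest integer.
print(int(5402.9))
print(round(5402.9))
```

If rounded prices are preferred, round(mean_price) could be used in place of int(mean_price).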

I hope this has helped you to understand the code. If you have any questions on it, feel free to let me know. I would be happy to help you.

Best,
Sahil

Hi Sahil, it would be great if you could check whether I understood your explanation correctly. I appreciate you looking through my messy question!

So for brand_only = autos[autos["brand"] == brand], each iteration stores more than one row of data for that particular brand? (So there will be multiple rows of cars for each of the brands volkswagen, bmw, opel, mercedes_benz, audi, and ford.)

What I am confused about is whether each iteration adds its new rows to brand_only or, as I assume, the new iteration replaces the previous one. (Does assigning a new object to the variable add to what is already there, or replace it?)

Hi @genesix32,

On each iteration, only one brand is selected, and all rows for that brand are assigned to the brand_only variable. In the next iteration of the loop, another brand is selected, and the rows for that brand replace the existing rows in the brand_only variable. Nothing is appended. Thus, each iteration of the loop only processes rows of a single brand.
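A quick way to check this yourself is to record how many rows brand_only holds on each pass. The toy data and the rows_per_iteration dictionary below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["bmw", "ford", "bmw", "ford", "ford"],
    "price": [8000, 3000, 9000, 4000, 3500],
})

rows_per_iteration = {}
for brand in ["bmw", "ford"]:
    # Assignment rebinds brand_only; the previous brand's rows are not kept.
    brand_only = autos[autos["brand"] == brand]
    rows_per_iteration[brand] = len(brand_only)

print(rows_per_iteration)

# After the loop, brand_only holds only the rows of the last brand.
print(brand_only["brand"].unique().tolist())
```

If the rows were being appended, the second count would include the bmw rows as well, which it does not.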

Best,
Sahil