Finding most common brand/model combination

Hi,

I am trying to work through the extra tasks on the Ebay Car Sale guided project.

I am trying to find the most common brand/model combinations. However, I am having trouble building the dictionary I want to end up with: one that maps each model to its value_counts count.

Firstly, I find the most common brands and I arbitrarily pick the top 10.

autos['brand'].value_counts()

The most common brands are:

  • Volkswagen
  • BMW
  • Opel
  • Mercedes
  • Audi
  • Ford
  • Renault
  • Peugeot
  • Fiat
  • Seat

I then look at each brand’s most popular models, starting with VW, using the statement below:

vw_pop = autos.loc[autos['brand'] == 'volkswagen', 'model'].value_counts(normalize=True)

From here, I pick the VW models with a relative frequency above 1% [amongst the VWs], convert them to percentages, and assign the result to a variable.

vw_top_pop = vw_pop[vw_pop >0.01]*100

I then take the index of the top models (the model names) as follows:

vw_top_models = vw_top_pop.index

I set up an empty dict:

vw_dict = {}

I loop through the top models and add each one to vw_dict as a key, with its count as the value:

for i in vw_top_models:
    sel_rows = autos[autos['model'] == i]
    vw_counts = sel_row['model'].count()
    vw_dict[i] = vw_counts

However, the dictionary doesn’t have the accurate value_counts of the models and I just don’t know how to resolve this!! SOS! How do I do this??

Many thanks,

BQ


If the result isn’t accurate in a loop, don’t start with the loop.
Pick one particular brand and one particular model, and get its count with your code, without a loop.
Repeat for the same brand with a second model. Once you have two verified results, the inner loop will most likely work.
The outer loop (over brands) can be tested by simply aggregating over all the models of each brand, which keeps the more complex inner-loop code out of the way. Once the outer loop works, replace that simple aggregation with your own inner-loop logic.
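
As a rough sketch of that "no loop first" check: the snippet below counts a single brand/model pair directly and cross-checks it against value_counts on the brand-filtered rows. The tiny autos table is made up for illustration (it is not the real dataset), and the names is_vw_golf, golf_count and vw_counts are my own.

```python
import pandas as pd

# Made-up stand-in for the real `autos` DataFrame (illustrative only).
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen",
              "bmw", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "andere",
              "3er", "3er", "andere", "andere"],
})

# Count one brand/model pair directly, without any loop.
is_vw_golf = (autos["brand"] == "volkswagen") & (autos["model"] == "golf")
golf_count = int(is_vw_golf.sum())

# Cross-check against value_counts on the Volkswagen rows only.
vw_counts = autos.loc[autos["brand"] == "volkswagen", "model"].value_counts()

print(golf_count, vw_counts["golf"])
```

If the two numbers agree for a couple of models, the same filter can go inside a loop with more confidence.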

The code is very hard to read. You can wrap a code block in triple backticks (```), each set on its own line, so that it keeps its formatting across lines.

Where did sel_row come from, when it was sel_rows just before?
Why is sel_rows created from autos rather than from the Volkswagen-only data behind vw_pop? If different brands have models with the same name, filtering on the model alone also counts the other brands' cars, which overcounts for both brands.
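
To make the overcounting concrete, here is a minimal sketch on made-up data (not the real dataset) in which the model name "andere" appears under three brands. Counting in the full table gives 3 for that model, while counting within the Volkswagen rows gives 1. The names vw, overcounted and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the real `autos` DataFrame (illustrative only).
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen",
              "bmw", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "andere",
              "3er", "3er", "andere", "andere"],
})

# Restrict to one brand first, then work only with those rows.
vw = autos[autos["brand"] == "volkswagen"]
vw_pop = vw["model"].value_counts(normalize=True)
vw_top_models = vw_pop[vw_pop > 0.01].index

# Filtering the full table on model alone also picks up other brands.
overcounted = {m: int((autos["model"] == m).sum()) for m in vw_top_models}

# Counting within the Volkswagen rows gives the intended numbers,
# and value_counts already yields the dict without an explicit loop.
vw_dict = vw["model"].value_counts().loc[vw_top_models].to_dict()

print(overcounted)
print(vw_dict)
```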

Once you fix this, you can try groupby https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html.
autos.groupby('brand').size() gives the same counts as autos['brand'].value_counts() (assuming no NaN), with the advantage that you can group by more than one column, which is exactly your use case here. That saves writing two loops. Then you can try filtering the two-level index you get from grouping on two columns to keep only the top 10 brands.
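
A small sketch of that idea, again on made-up data rather than the real dataset: group on both columns, then keep only the rows whose brand is among the most common ones. Here the "top 10" is shrunk to a top 2 so the filter has a visible effect; top_n, pair_counts and top_pairs are names chosen for this example.

```python
import pandas as pd

# Made-up stand-in for the real `autos` DataFrame (illustrative only).
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen",
              "bmw", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "andere",
              "3er", "3er", "andere", "andere"],
})

# Per-brand counts two ways; the counts agree, only the ordering differs.
brand_counts = autos["brand"].value_counts()   # sorted by count
brand_sizes = autos.groupby("brand").size()    # sorted by brand name

# Grouping on two columns gives one count per brand/model pair,
# indexed by a two-level (brand, model) index.
pair_counts = autos.groupby(["brand", "model"]).size()

# Keep only pairs whose brand is among the top N brands.
top_n = 2  # would be 10 on the real data
top_brands = brand_counts.head(top_n).index
in_top = pair_counts.index.get_level_values("brand").isin(top_brands)
top_pairs = pair_counts[in_top].sort_values(ascending=False)

print(top_pairs)
```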
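Assuming you need normalize=True, value_counts provides that more conveniently. With groupby, you have to chain a transform yourself, along the lines of df.groupby('col').transform(lambda x: (x - x.mean()) / x.std()). Note that this particular lambda standardizes each group; for relative frequencies you would divide each count by its group total instead.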
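
For what it's worth, one way to get within-brand relative frequencies from groupby is to divide each pair count by its brand total. The sketch below does this with transform("sum") on made-up data and compares the Volkswagen slice with value_counts(normalize=True); the names pair_share and brand_totals are assumptions for the example.

```python
import pandas as pd

# Made-up stand-in for the real `autos` DataFrame (illustrative only).
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen",
              "bmw", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "andere",
              "3er", "3er", "andere", "andere"],
})

# Count every brand/model pair.
pair_counts = autos.groupby(["brand", "model"]).size()

# Broadcast each brand's total back onto its pairs, then divide.
brand_totals = pair_counts.groupby(level="brand").transform("sum")
pair_share = pair_counts / brand_totals

# The same shares for one brand via value_counts(normalize=True).
vw_pop = autos.loc[autos["brand"] == "volkswagen", "model"].value_counts(normalize=True)

print(pair_share["volkswagen"])
print(vw_pop)
```

The groupby version computes the shares for every brand at once, whereas value_counts(normalize=True) needs one call per brand.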
