I am trying to work through the extra tasks on the Ebay Car Sale guided project.
I want to find the most common brand/model combinations. However, I am having trouble adding the value_counts results to a dictionary. I am trying to end up with a dictionary that holds the models and their value_counts.
Firstly, I find the most common brands and I arbitrarily pick the top 10.
autos['brand'].value_counts()
The most common brands are:
Volkswagen
BMW
Opel
Mercedes
Audi
Ford
Renault
Peugeot
Fiat
Seat
I then look at each brand’s most popular models, starting with VW, using the statement below:
If the result is not accurate inside a loop, don’t start with the loop.
Pick one particular brand and one particular model, and find the count using your code without any loop.
Repeat for the same brand with a second model. Once you have two accurate results, the inner loop will most likely work.
The outer loop can be tested on its own by using a simple aggregation over all the models in place of the complex inner-loop code. Once the outer loop works, substitute your own logic for that aggregation.
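The check-before-looping steps above might look roughly like this. It is only a sketch: the tiny frame stands in for the real autos data, and the rows and the `count_pair` helper are made up for illustration. Only the `brand` and `model` column names come from the thread.

```python
import pandas as pd

# Hypothetical stand-in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "polo", "3er", "andere", "andere"],
})


def count_pair(df, brand, model):
    """Count rows matching one brand AND one model, with no loop involved."""
    mask = (df["brand"] == brand) & (df["model"] == model)
    return int(mask.sum())


# Step 1: verify a single brand/model pair by hand.
golf_count = count_pair(autos, "volkswagen", "golf")

# Step 2: verify a second model of the same brand.
polo_count = count_pair(autos, "volkswagen", "polo")

print(golf_count, polo_count)
```

If both numbers match what you see when eyeballing the filtered rows, the same expression is a reasonable candidate for the body of the inner loop.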
This is very hard to read. You can wrap a code block in triple backticks (```), each set on its own line, to retain its formatting across lines.
Where did sel_row come from, when previously it was sel_rows?
Why is sel_rows created from autos rather than vw_pop? If different brands have models with the same name, this will cause overcounting for both brands.
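To illustrate the overcounting, here is a minimal sketch on made-up data in which the model name "andere" appears under three brands. The frame, `brand_rows` and `brand_models` are hypothetical names, not the variables from the original post. Filtering to one brand before counting also gives the brand-to-model-counts dictionary the question asks for.

```python
import pandas as pd

# Hypothetical stand-in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "andere", "3er", "andere", "andere"],
})

# Counting a model over the whole frame mixes all brands together.
overcounted = int((autos["model"] == "andere").sum())

# Restricting to one brand first keeps each count brand-specific.
top_brands = autos["brand"].value_counts().head(2).index  # top 2 here, top 10 in the project
brand_models = {}
for brand in top_brands:
    brand_rows = autos[autos["brand"] == brand]
    brand_models[brand] = brand_rows["model"].value_counts().to_dict()

print(overcounted)
print(brand_models)
```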
Once you fix this, you can try groupby https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html. autos.groupby('brand').size() gives the same counts as autos['brand'].value_counts() (assuming no NaN, and in a different order), with the advantage that you can group by more than one column, which is exactly your use case here. That saves writing two loops. Then you can try filtering the 2-level index you get from grouping by two columns to keep only the top 10 brands.
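A rough sketch of the two-column groupby, again on a made-up frame (the rows and the names `pair_counts` and `top_pairs` are illustrative assumptions). It keeps the top 2 brands instead of the top 10 so the toy data stays small.

```python
import pandas as pd

# Hypothetical stand-in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen", "bmw", "bmw", "bmw", "opel"],
    "model": ["golf", "golf", "polo", "3er", "3er", "5er", "corsa"],
})

# One-column groupby: same counts as value_counts, ordered by brand name.
by_brand = autos.groupby("brand").size()
same_counts = by_brand.to_dict() == autos["brand"].value_counts().to_dict()

# Two-column groupby: one count per (brand, model) pair, no loops needed.
pair_counts = autos.groupby(["brand", "model"]).size()

# Keep only the pairs whose brand level is among the most common brands.
top_brands = autos["brand"].value_counts().head(2).index
brand_level = pair_counts.index.get_level_values("brand")
top_pairs = pair_counts[brand_level.isin(top_brands)]

print(top_pairs.sort_values(ascending=False))
```

The result is a Series with a (brand, model) MultiIndex, so `top_pairs["volkswagen"]` selects the model counts for a single brand.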
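If you need proportions (normalize=True), value_counts provides that more conveniently. With groupby you have to compute them yourself, for example by dividing the result of size() by its sum(). Note that a chained transform such as df.groupby('col').transform(lambda x: (x - x.mean()) / x.std()) standardizes values within each group, which is not the same as the relative frequencies that normalize=True returns.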
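For comparison, a small sketch of both routes to proportions on made-up data. Dividing the group sizes by their total is one assumed way to reproduce what normalize=True does, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for the real autos DataFrame.
autos = pd.DataFrame({
    "brand": ["volkswagen", "volkswagen", "volkswagen", "bmw", "bmw", "opel"],
})

# value_counts: proportions in one call.
shares_vc = autos["brand"].value_counts(normalize=True)

# groupby: count first, then divide by the total yourself.
sizes = autos.groupby("brand").size()
shares_gb = sizes / sizes.sum()

print(shares_vc)
print(shares_gb)
```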