@vroomvroom,
I think I now understand what you are trying to do. To format your code properly, wrap it in a pair of triple backticks (```), and please also specify the mission you are referring to.
I'll briefly explain the line above and this mission, hoping that it answers all your questions:
Instruction 1:
- Explore the unique values in the brand column, and decide on which brands you want to aggregate by.
- You might want to select the top 20, or you might want to select those that have over a certain percentage of the total values (e.g. > 5%).
Solution 1:
The value_counts() function returns a Series containing the counts of unique values: the unique values form its index, and the count of each unique value is its value. Note that when we pass the normalize=True parameter, it returns proportions (fractions between 0 and 1) instead of counts.
The unique() function, by contrast, only returns the unique elements of an array, without counting them.
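To make the difference concrete, here is a minimal sketch on a tiny made-up Series (the name `brand_col` and its values are assumptions for illustration, not the real `autos` data):

```python
import pandas as pd

# Toy data, invented for illustration (not the real `autos` dataset)
brand_col = pd.Series(['bmw', 'audi', 'bmw', 'ford', 'bmw', 'audi'])

counts = brand_col.value_counts()                # counts per unique value, sorted descending
shares = brand_col.value_counts(normalize=True)  # proportions between 0 and 1
uniques = brand_col.unique()                     # distinct values, in order of appearance

print(counts)
print(shares)
print(uniques)
```

Here `counts['bmw']` is 3 and `shares['bmw']` is 0.5, while `uniques` only lists the three brand names.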
Now, let’s aggregate by the brands that account for more than 5% of the total values.
Code 1: (read the comments)
# `brands` is a Series: its index holds the unique values in the `brand` column of the `autos` dataframe, and its values are each brand's share of all rows (a proportion between 0 and 1)
brands = autos['brand'].value_counts(normalize=True)
# keep only the index labels (brand names) whose share is greater than 0.05
significant_brands = brands[brands > 0.05].index
# print the brands that account for more than 0.05 (5%) of the total values
print(significant_brands)
Output: Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')
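If you would rather follow the other suggestion in the instruction and pick the top N brands instead of using a percentage cutoff, a rough sketch could look like this. The dataframe `toy_autos` and the value of `top_n` are assumptions made up for the example; with the real data you would use `autos` and 20.

```python
import pandas as pd

# Toy stand-in for the `autos` dataframe (invented values)
toy_autos = pd.DataFrame(
    {'brand': ['bmw', 'audi', 'bmw', 'ford', 'bmw', 'audi', 'opel']}
)

top_n = 2  # hypothetical cutoff; the instruction suggests 20 for the real data

# value_counts() sorts by count in descending order, so head(top_n)
# keeps the top_n most frequent brands
top_brands = toy_autos['brand'].value_counts().head(top_n).index
print(top_brands)
```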
Instruction 2:
- Create an empty dictionary to hold your aggregate data.
- Loop over your selected brands, and assign the mean price to the dictionary, with the brand name as the key.
- Print your dictionary of aggregate data, and write a paragraph analyzing the results.
Solution 2:
To aggregate data with a loop, the steps are:
- Identify the unique values we want to aggregate by
- Create an empty dictionary to store our aggregate data
- Loop over the unique values, and for each:
  - Subset the dataframe by the unique value
  - Calculate the mean of whichever column we’re interested in
  - Assign the mean to the dictionary as the value, with the unique value as the key.
In Code 1, we already identified the unique values we want to aggregate by. The next step is to aggregate.
Code 2:
brand_mean_prices = {}  # create an empty dictionary
for brand in significant_brands:  # loop over `significant_brands`, which holds the brands we want to aggregate by
    selected_brand = autos[autos['brand'] == brand]  # subset the `autos` dataframe to the rows for this brand
    price_mean = selected_brand['price'].mean()  # calculate the mean of the `price` column for the subset
    brand_mean_prices[brand] = int(price_mean)  # key: the brand name; value: its mean price, truncated to an integer
print(brand_mean_prices)  # print the dictionary
Output: {'volkswagen': 5402, 'bmw': 8332, 'opel': 2975, 'mercedes_benz': 8628, 'audi': 9336, 'ford': 3749}
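If you want to double-check a result from the loop, one option (not what the mission asks for, just a cross-check) is to compare it with pandas' `groupby`. The sketch below uses a made-up `toy_autos` dataframe and a made-up `selected` list, so the numbers have nothing to do with the real output above:

```python
import pandas as pd

# Toy stand-in for the `autos` dataframe (invented values)
toy_autos = pd.DataFrame({
    'brand': ['bmw', 'bmw', 'audi', 'audi', 'ford'],
    'price': [8000, 9000, 9500, 10500, 3500],
})
selected = ['bmw', 'audi']  # hypothetical list of brands to aggregate by

# Loop approach, same steps as Code 2
loop_means = {}
for brand in selected:
    subset = toy_autos[toy_autos['brand'] == brand]
    loop_means[brand] = int(subset['price'].mean())

# groupby approach: keep only the selected brands, then take the mean price per brand
grouped = (
    toy_autos[toy_autos['brand'].isin(selected)]
    .groupby('brand')['price']
    .mean()
)
groupby_means = {brand: int(mean) for brand, mean in grouped.items()}

print(loop_means)
print(groupby_means)
```

Both dictionaries should hold the same brand/mean pairs, although the key order may differ because `groupby` sorts the brands alphabetically.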
You can adapt the code above by replacing the variable names with the ones you have already defined in your own code, then check the output.
I tried to explain this as simply as possible. I hope it helps!