I’ve been cleaning the column holding power values(originaly ‘powerPS’, I’ve called it ‘power’), the abnormally high values were easy to clean. But… There are around 4000 cars listed with 0 (zero) power. To drop that many rows would be a bit of a waste of data…
I’ve filled them in using simple mask with average power number for the whole database
mask_0 = autos['power'] == 0
autos['power'] = autos['power'].mask(mask_0, autos['power'].mean())
I’d like to do it in a more precise way, instead of putting in same average value for all of the missing values. I’d like to enter an average value for each brand.
name_power = autos.groupby('brand').mean()['power']
Been toying with the idea a few hrs and hit a wall, how can I do it? The above table has all the necessary data, tried loops, masks etc. But couldnt figure it out today.
Apparently I can’t delete my own posts, in that case I’ll just post my solution:
brand_names = autos["brand"].value_counts().index.to_list()
for car in brand_names:
mask_car = (autos['power'] == 0) & (autos['brand'] == car)
mean_power = autos[(autos['brand'] == car) & (autos['power'] != 0)]['power'].mean()
autos['power'] = autos['power'].mask(mask_car, mean_power)
You can delete your own posts but I’m glad you answered your own question and shared it with the community instead! For future reference, try clicking on the three little dots below your message to see the option to delete.
Also, just to clean up your code a bit, you could have defined
brand_names using a more direct pandas function:
brand_names = autos['brand'].unique()
Congrats for figuring this out and thanks for sharing!
thanks for the tweak,
think I’ve made a mistake: tried deleting the topic(not the post).
If you are the author of the topic and you delete your post, the topic will automatically be deleted after 24hrs.