
Get rid of outliers

Hi, I want to start a simple analysis with a dataset I found on Kaggle. For this I want to get rid of outliers. To do this I want to delete all entries that are below the first quartile (the 0.25 quantile) or above the third quartile (the 0.75 quantile). Like:
no_outliers = np.quantile(df_diabetes, .25, axis = 1)
Which displays all values above .25
I have no idea how to get all values between .25 and .75, and then how to assign them to a new dataset.
And to be honest, I did not fully grasp the axis argument in this case. I set it to 1 because I want to take all entries into account and not only those of a specific column. But that is far from correct, I suppose.
Thank you in advance

Hi @info137 and welcome to the community!

Without looking at the dataset myself, perhaps pandas.Series.between(left, right, inclusive='both') is the method you're looking for? It returns a boolean Series indicating whether each value in the Series lies between left and right. You could then use this as a mask on your original DataFrame to remove outliers.
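
Here is a minimal sketch of that idea. I haven't seen the actual data, so the small DataFrame and its column names ("Glucose", "BMI") are made-up stand-ins for df_diabetes. It assumes pandas 1.3 or newer, where inclusive accepts the string 'both'. On the axis question: DataFrame.quantile uses axis=0 by default, which gives one quantile per column, whereas axis=1 would compute one per row, which is probably not what you want here.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data; values and column names are assumptions.
df_diabetes = pd.DataFrame({
    "Glucose": [85, 90, 95, 100, 105, 110, 300],
    "BMI": [30.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0],
})

# One quantile per column (axis=0 is the default).
lower = df_diabetes.quantile(0.25)
upper = df_diabetes.quantile(0.75)

# Single column: boolean mask from Series.between, then filter the rows.
glucose_mask = df_diabetes["Glucose"].between(
    lower["Glucose"], upper["Glucose"], inclusive="both"
)
no_outliers_glucose = df_diabetes[glucose_mask]

# All columns at once: keep a row only if every column is within its own range.
# Comparing a DataFrame with a Series aligns the Series index with the columns.
all_mask = ((df_diabetes >= lower) & (df_diabetes <= upper)).all(axis=1)
no_outliers_all = df_diabetes[all_mask]

print(no_outliers_glucose)
print(no_outliers_all)
```

Both results are new DataFrames, so the original df_diabetes stays untouched. Keep in mind that filtering between the 0.25 and 0.75 quantiles keeps only about the middle half of each column, so you may want wider bounds depending on your data.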
