Why can't I use df['column'] = df['column'].between(xx, xx) to eliminate outliers?

Hi! I wanted to exclude outliers by keeping a certain range of values in a column. Here is my code:

autos['registration_year'] = autos['registration_year'].between(1900,2016)

There is no error, but the column turned out to have only 2 unique values.

I looked at the solution, and the code is:

autos = autos[autos["registration_year"].between(1900,2016)]

Why doesn't my previous code work?

Hi @stellayou1126. autos['registration_year'].between(1900,2016) returns a boolean series:

0        True
1        True
2        True
3        True
4        True
         ... 
49995    True
49996    True
49997    True
49998    True
49999    True
Name: registration_year, Length: 48565, dtype: bool

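So instead of filtering out the values outside the range, your assignment replaces autos['registration_year'] with that boolean series of True/False values. That is why the column ends up with only 2 unique values, and no rows are removed.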
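
To see this on a small scale, here is a minimal sketch. The toy dataframe and the name `overwritten` are made up for illustration and are not part of the real dataset:

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos dataframe
autos = pd.DataFrame({"registration_year": [1000, 1999, 2005, 2016, 9999]})

# Same pattern as the original attempt: assign the result of .between()
# back to the column. Work on a copy so the original stays intact.
overwritten = autos.copy()
overwritten["registration_year"] = overwritten["registration_year"].between(1900, 2016)

print(overwritten["registration_year"].tolist())   # [False, True, True, True, False]
print(overwritten["registration_year"].nunique())  # 2
print(len(overwritten))                            # 5, no rows were dropped
```

The years are gone and only True/False remain, while the number of rows is unchanged.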

What the solution code does is take the boolean series and use it as a mask for the whole dataframe: only the rows where the mask is True are kept, so every row whose 'registration_year' falls outside the range is excluded from autos.
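
Here is a minimal sketch of the masking approach on made-up data. The toy dataframe, the extra `name` column, and the variable names `mask` and `filtered` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos dataframe
autos = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "registration_year": [1000, 1999, 2005, 2016, 9999],
})

# Boolean series: True where the year is within 1900..2016 (both ends inclusive)
mask = autos["registration_year"].between(1900, 2016)

# Indexing the dataframe with the mask keeps only the rows where it is True
filtered = autos[mask]

print(filtered["registration_year"].tolist())  # [1999, 2005, 2016]
print(filtered.index.tolist())                 # [1, 2, 3]
```

Note that the surviving rows keep their original index labels. If you want a fresh 0..n-1 index afterwards, `filtered.reset_index(drop=True)` is one option.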
