Hi! I wanted to exclude outliers by keeping a certain range of values in a column. Here is my code:
autos['registration_year'] = autos['registration_year'].between(1900,2016)
There is no error, but the column ended up with only 2 unique values.
I looked at the solution, and the code is:
autos = autos[autos["registration_year"].between(1900,2016)]
I wonder why my code doesn't work.
autos['registration_year'].between(1900,2016) returns a boolean Series, as the last line of its printout shows:
Name: registration_year, Length: 48565, dtype: bool
So instead of filtering out the rows outside the range, we're overwriting autos['registration_year'] with a Series of True/False values. That is why the column ends up with only 2 unique values.
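A minimal sketch that reproduces the problem, assuming a small hypothetical DataFrame in place of the real autos dataset (the years below are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the real autos dataset
autos = pd.DataFrame({"registration_year": [1000, 1995, 2005, 2016, 2018, 9999]})

# between() returns one True/False per row (both bounds inclusive by default)
mask = autos["registration_year"].between(1900, 2016)
print(mask.dtype)  # bool

# Assigning the mask back to the column overwrites the years with booleans;
# no rows are removed, the column just loses its original values
broken = autos.copy()
broken["registration_year"] = mask
print(broken["registration_year"].unique())
print(len(broken) == len(autos))  # True: same number of rows as before
```

The row count never changes, and the only values left in the column are True and False.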
What the solution code does is take the boolean Series and use it as a mask on the whole dataframe, so that any row where the 'registration_year' check returns False is excluded from the result, and that filtered dataframe is then assigned back to autos.
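Here is a minimal sketch of the masking approach, again using hypothetical toy data (the "price" column is made up just to show that whole rows are dropped, not only the year values):

```python
import pandas as pd

# Hypothetical toy data; "price" is an illustrative extra column
autos = pd.DataFrame({
    "registration_year": [1000, 1995, 2005, 2016, 2018, 9999],
    "price": [500, 1200, 3000, 8000, 9000, 700],
})

# Build the boolean mask from one column...
mask = autos["registration_year"].between(1900, 2016)

# ...and use it to index the whole DataFrame: only rows where mask is True are kept
filtered = autos[mask]

# Equivalent explicit comparison, since between() includes both bounds by default
year = autos["registration_year"]
filtered_explicit = autos[(year >= 1900) & (year <= 2016)]

print(filtered)
```

Note that the original index labels are kept (rows 1, 2 and 3 here); call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.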