I am having trouble identifying what the outliers are. Could someone please explain this to me? Below is the screenshot.
Please try to follow “Introducing guidelines for all technical questions in our Community” so that your questions are precise and concise. This helps community members understand the question and answer it efficiently.
Since I am not sure whether you are unfamiliar with “outliers” in general or only in this case, I will start from the basics.
Take a sample of 5 numbers:
1, 3, 4, 2, 5. They are close enough. They don’t vary much from each other.
Let’s say we add a 6th number to this:
1, 3, 4, 2, 5, 8. They are still close to each other.
But let’s just add another number to this group:
1, 3, 4, 2, 5, 8, 137. Now the last number is too far off. If you plot this on a number line it will look something like this:
(Please don’t take the scale or the figure seriously, as I don’t have a dedicated drawing tool.)
Notice that even with this dummy scale, the line is too short to reach the data point 137. In this case, 137 is the outlier: it lies far away from most of the data points and from the median of the dataset. It would also be an outlier if it were -137. The direction doesn’t matter. The magnitude of the difference does.
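To make the "far from the median" idea concrete, here is a minimal sketch in plain Python using the same sample numbers. Measuring the absolute distance from the median is just one simple way to illustrate the point, and the variable names are my own; it is not necessarily the method your mission uses.

```python
from statistics import median

# Sample data from the example above
data = [1, 3, 4, 2, 5, 8, 137]

# The median is the middle value of the sorted data
center = median(data)

# Absolute distance of each point from the median (direction is ignored)
distances = {x: abs(x - center) for x in data}

# The point with the largest distance is the most extreme one
most_extreme = max(distances, key=distances.get)

print(center)        # 4
print(distances)     # 137 is 133 away, every other point is at most 4 away
print(most_extreme)  # 137
```

If you replace 137 with -137, its distance from the median is still far larger than that of any other point, which is why the direction doesn't matter.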
The mission instructions tell you how to find outliers in that specific column. The method described in the screen-shot is the
It provides a minimum boundary (x) and a maximum boundary (y). All data points between the boundaries, including the boundaries themselves, are selected. All other values are treated as outliers and are filtered out.
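The boundary idea can be sketched in plain Python as follows. The boundary values 1 and 8 are hypothetical, chosen only to fit the sample numbers above; your mission will give you its own boundaries for its own column.

```python
# Sample data from the example above
data = [1, 3, 4, 2, 5, 8, 137]

# Hypothetical boundaries, chosen only for this illustration
lower, upper = 1, 8

# Keep values between the boundaries, including the boundaries themselves
kept = [x for x in data if lower <= x <= upper]

# Everything else is treated as an outlier and filtered out
outliers = [x for x in data if not (lower <= x <= upper)]

print(kept)      # [1, 3, 4, 2, 5, 8]
print(outliers)  # [137]
```

Note that 1 and 8 are kept because the comparison is inclusive on both ends.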
You may refer to the official doc here