Removing Outliers Criteria


Why are outliers being removed arbitrarily instead of with statistical methods? In general, the outlier bounds are defined as follows:

  • Upper bound: Q3 + 1.5 * IQR
  • Lower bound: Q1 - 1.5 * IQR

Instead, what I’m seeing is that people are removing values based on their personal criteria.
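
For reference, the two bounds listed above could be computed roughly like this. It is only a sketch: the `prices` list is made-up sample data, the helper name `iqr_bounds` is hypothetical, and the quartiles use the standard library's "inclusive" interpolation (other tools may compute quartiles slightly differently).

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) as Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up sample data (not taken from any real project).
prices = [1200, 1500, 1800, 2100, 2500, 3000, 3500, 99999]

lower, upper = iqr_bounds(prices)

# Values outside [lower, upper] are flagged as outliers.
outliers = [p for p in prices if p < lower or p > upper]
kept = [p for p in prices if lower <= p <= upper]

print(lower, upper)  # -375.0 5225.0
print(outliers)      # [99999]
```

With this sample, only the 99999 entry falls outside the bounds, so the cut-off comes from the data rather than from a hand-picked threshold.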

Hi @santoshector421:

You tagged a guided project. May I know which particular project you are referring to? If there is a mission link, please include it as per these guidelines.

If there are several outliers (>7), I will usually stick with this. For data with fewer outliers, I would suggest removing them if those data points are not really critical. In a mission-critical project, where you have to deploy the model or process the data to build a model, I would instead replace the outliers with a central tendency value (mean, median, or mode) computed without the outliers, after conducting my EDA, since the data points may matter.
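
A rough sketch of the replacement idea, assuming the outliers are detected with the 1.5 * IQR bounds from the question and that the median is the chosen central value. The reply does not specify either choice, and the function name and sample data here are hypothetical.

```python
from statistics import median, quantiles


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside the IQR bounds with the median of the rest."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # Compute the fill value from the non-outlier points only, so the
    # extreme values do not distort it.
    inliers = [v for v in values if lower <= v <= upper]
    fill = median(inliers)

    return [v if lower <= v <= upper else fill for v in values]


# Made-up sample data (not taken from any real project).
prices = [1200, 1500, 1800, 2100, 2500, 3000, 3500, 99999]

cleaned = replace_outliers_with_median(prices)
print(cleaned)  # [1200, 1500, 1800, 2100, 2500, 3000, 3500, 2100]
```

The row count stays the same, which is the point of replacing rather than dropping: every record is still available to the model. Swapping `median` for `mean` or `mode` from the same module would give the other two variants mentioned above.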

The eBay car sales data project.