Removing Values from a DataFrame (eBay car sales project)

I’m working on the guided project “Exploring Ebay Car Sales Data” in the introductory Pandas and Numpy course.

I’m at the step of removing outlier data from the columns “Price” and “Odometer”.

In “removing” the data, do I actually want to remove those entire rows from the dataframe? Or should I convert the values in the relevant columns to NaN values? I’m afraid removing thousands of rows due to a single faulty datapoint might skew other aspects of the analysis.

I feel like the instructions aren’t totally clear at this point (Screen #4 of the mission). Any thoughts?

You’ll cover additional methods for imputing faulty data later in the path. Even if you convert the values to NaN, you’ll likely still want to drop those rows, since you’ll be aggregating values in these columns.
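To make the trade-off concrete, here is a minimal sketch of both options side by side. The toy data, the column names (`price`, `odometer_km`), and the 100 to 350,000 bounds are all assumptions for illustration, not values taken from the project instructions.

```python
import pandas as pd

# Hypothetical toy data standing in for the autos dataset.
autos = pd.DataFrame({
    "price": [0, 500, 3500, 12000, 99999999],
    "odometer_km": [150000, 5000, 125000, 70000, 150000],
})

# Assumed plausibility bounds for price; pick your own after exploring the column.
price_ok = autos["price"].between(100, 350000)

# Option 1: drop every row whose price falls outside the bounds.
dropped = autos[price_ok]

# Option 2: keep all rows, but replace out-of-range prices with NaN.
masked = autos.copy()
masked["price"] = masked["price"].where(price_ok)

# mean() skips NaN by default, so the price average is the same either way.
print(dropped["price"].mean(), masked["price"].mean())

# The two options differ for the other columns: Option 2 still counts the
# odometer readings from rows whose price was masked.
print(dropped["odometer_km"].mean(), masked["odometer_km"].mean())
```

Since pandas aggregations such as `mean()` ignore NaN by default, the two approaches only diverge once you aggregate a column other than the one you cleaned. That is the point to weigh when deciding which one fits your analysis.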

In any case, there isn’t one strict way to work through guided projects, and there is no right or wrong answer in a real-world project like this. Do what you feel is most logical, examine the results, and iterate if they’re not to your liking.
