My Guided Project (eBay Car Sales)

Hi everyone… :rose: :rose:

This is my attempt at the Exploring eBay Car Sales Data guided project.
I would be grateful to receive your feedback on my project. :rose: :rose:

Exploring eBay Car Sales Data.ipynb (106.5 KB)

Click here to view the jupyter notebook file in a new tab


Hi @zainab_ali_alamer,

Great job on your project! It looks well-structured and well-commented, and the introduction and conclusion are detailed, informative, and to the point. Putting the studied resources at the end of the project was a good idea. I also liked that you checked the dataset for duplicated rows.

Here are some suggestions from my side:

  • When mentioning pieces of code, functions, methods, or column names in markdown, it’s better to wrap them in backticks (`). For example, `nr_of_pictures` instead of nr_of_pictures. This gives them more emphasis and makes the text easier to read.
  • About the range of car prices: it may be worth searching for some more information online. When I was doing this project, I found a couple of forums, unfortunately in Russian, where people were discussing exactly those 0-priced cars on German eBay. It seems they really do exist: cars that are simply given away for free, for particular reasons, of course. The cars priced at $1 don’t look any less alarming either :blush: The same goes for the upper limit, which could probably be increased (some “retro” cars can have very high prices). That said, the 0-priced and extremely expensive cars represent a very small percentage of the data, so simply dropping them is a reasonable approach.
  • Instead of dropping the minimum and maximum outlier ranges separately, you can first define your ultimate lower and upper limits and then use the `between()` method to cut off all the values outside this range (re-assigning the result back to your main dataframe).
  • The `registration_year` range: here the lower limit can be increased. Even though the first cars appeared in 1885, it is very unlikely that any of those first cars are represented in this dataset :smirk: Also, looking at the (very few) rows earlier than 1910, we can see that the vast majority of columns in those entries have missing values, so dropping these rows seems a good idea.
  • The code cell [49]: written this way, you are not actually substituting those values with NaN. Instead, you should use `np.nan`:

```python
import numpy as np

autos.loc[autos['registration_month'] == 0, 'registration_month'] = np.nan
autos['registration_month'].value_counts(normalize=True, dropna=False)
```
  • The markdown cells after code cells [5], [11], and [20]: it would be good to use markdown bullet lists here for more emphasis.
  • For the sake of consistency, it’s better to put all the subheadings in title case (each word starting with a capital letter).
  • The code cells [27] and [36]: I would remove the comments that describe technical details of how those functions and attributes work.
  • The `round()` function can be applied to `brand_by_mile`.
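To make the `between()` and `np.nan` suggestions concrete, here is a minimal sketch. The sample dataframe and the price/year limits below are invented for illustration and are not taken from your project; substitute the limits you settle on after exploring the data:

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for the real `autos` dataframe;
# column names match the project, but the values are made up.
autos = pd.DataFrame({
    "price": [0, 1, 5000, 12000, 9_999_999],
    "registration_year": [1985, 1900, 2005, 2016, 2016],
    "registration_month": [3, 5, 7, 0, 12],
})

# Define the ultimate limits once, then cut off everything outside them
# with between(), re-assigning the result back to the main dataframe.
autos = autos[autos["price"].between(500, 350_000)].copy()
autos = autos[autos["registration_year"].between(1910, 2016)].copy()

# Replace the placeholder month 0 with a real missing value (np.nan);
# cast to float first so the integer column can hold NaN.
autos["registration_month"] = autos["registration_month"].astype("float")
autos.loc[autos["registration_month"] == 0, "registration_month"] = np.nan
```

This way all the outlier filtering lives in one place, and downstream methods such as `value_counts(dropna=False)` will report the missing months correctly.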

Hope my suggestions were helpful.
Once again, congratulations on doing a great project! :partying_face:

I had lost hope of getting any feedback on my project. :sob:
I can’t thank you enough for your detailed notes and useful feedback. :sob: :rose:
THANK YOU Elena_Kosourova :rose: :rose: :rose:


That’s great @zainab_ali_alamer, I’m glad that my feedback was helpful! :star_struck: