Refurbished project: Analyzing used car sales on E-bay Germany

Hi DQers!

I re-did this project on analysing car sales on E-bay Kleinanzeigen. The old one did not have any plots.

This one has a couple of plots. The highlight is the last plot that shows the difference in price of repaired vs. damaged cars. When I originally did this project, I really wanted to see this plot, but did not have the know-how to bring it out.

Any feedback is welcome!

Last page of project: Learn data science with Python and R projects

Project file:
Analyzing Used Car Sales on E-bay Germany.ipynb (533.4 KB)

Hi @jesmaxavier, congratulations on refurbishing your project :) I liked the overall structure of the project (it’s neat and clean) and also the docstrings of the functions; they are a real game-changer for anyone who reads the code. I also liked the visualizations and their colors.

A few suggestions from my side:

  • Clarify the questions you want to answer in the introduction. More questions can arise as you dig into the data, so you can add them as you progress in the project. This will really help the reader get an idea of what this project will be about
  • Link the data set you’re using
  • Put all imports in one code cell; it gives an overview of the libraries you’ll be using in the project. It’s also better to place %matplotlib inline (and all other magic functions) below the imports
  • I noticed that you are using matplotlib only once, in [25] as matplotlib.pyplot. Just use plt there and you won’t need that import
  • "The data is clearly from a German audience." Shouldn’t it be "for a German audience"?
  • You are using docstrings, but you should stick to one style of describing the functions. For example, sometimes you write string and sometimes str (i.e., [6] and [7]); use just one of them. I advise you to use str, as it’s the official name of this data type in Python
  • After [13], you describe the issues you found in the columns but do not support the conclusions with any data (should you maybe show the summary statistics of each column?). Also, consider using value_counts() from pandas :slight_smile:
  • Not a suggestion but a shout-out for great and precise criteria of what’s considered a car in What is a Vehicle?.
  • In [16] it’s not necessary to assign price to a separate variable
  • "These listings include the words ausschlachten or schlachten (literally means butchering)." You are the first project creator I have seen notice this issue. Well done!
  • "It must also be noted that anything between $1-$200 does not seem to be a valid price for a car and are therefore not being considered." You have some issues with bold text here
  • Also you are escaping the $ character and it’s displayed with \. Are you able to solve this problem?
  • The plot Price($) of Repaired vs. Damaged Vehicles is a bit overloaded with information. You could maybe separate it into two plots and make the legend more human-readable :)
  • Make the text bold in the conclusions to highlight the most significant insights of the project
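
As a rough illustration of the suggestion to back up column observations with data, here is a minimal sketch that gathers all imports in one place and summarizes a column with value_counts(). The sample rows and the helper name summarize_column are hypothetical and are not taken from the notebook.

```python
# All imports live together at the top of the notebook.
import pandas as pd
# In a notebook, magics such as %matplotlib inline would follow the imports.

# Hypothetical sample rows; the real dataset has far more rows and columns.
autos = pd.DataFrame({
    "price": [0, 150, 4500, 12345678, 8990, 4500],
    "gearbox": ["manuell", "automatik", "manuell", None, "manuell", "manuell"],
})


def summarize_column(df, column, top=5):
    """Return basic evidence about one column (hypothetical helper).

    Args:
        df (pandas.DataFrame): the table to inspect.
        column (str): name of the column to summarize.
        top (int): how many of the most frequent values to keep.

    Returns:
        dict: missing count, unique count and the most frequent values.
    """
    series = df[column]
    return {
        "missing": int(series.isna().sum()),
        "unique": int(series.nunique()),
        # dropna=False keeps missing values visible in the counts.
        "top_values": series.value_counts(dropna=False).head(top),
    }


gearbox_summary = summarize_column(autos, "gearbox")
price_counts = autos["price"].value_counts()
```

Displaying gearbox_summary["top_values"] or autos["price"].describe() right after stating an issue lets the reader see the numbers behind the claim.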

That’s it for me! Happy coding and good luck with your next projects @jesmaxavier.

P.S. I believe this project can compete in the Community Champion competition, @Elena_Kosourova.

That’s absolutely true, Artur, the project of @jesmaxavier is once again a perfect candidate! :star_struck: Also, @artur.sannikov96, great job on such cool and informative feedback :heavy_heart_exclamation:

@artur.sannikov96 thank you very much for such an in-depth review. It is really helpful.

Clarify the questions you want to answer in the introduction. More questions can arise as you dig into the data, so you can add them as you progress in the project. This will really help the reader get an idea of what this project will be about.

  • I’ve often struggled with this. Do you tell the reader up front all the questions you will be exploring while the dataset is still dirty? I mean, the questions we want to explore should logically come up after we have done some analysis on clean data; otherwise the questions themselves could be wrong, right? I would appreciate your thoughts on this.
    I’ve wondered about this because in DQ we are frequently directed and shown what the issues in the data might be, but in the real world this is not the case.

Link the data set you’re using

  • I was unsure about doing this because, in case the reader wants to verify the dataset, they wouldn’t find the errors that we found, as many of them were introduced by DQ. One of the funny ones was the price, $12345678. I believe there are 3 listings with this price.
  • Done :+1:
  • In the column_details function, I do use the value_counts function. When creating this function, I was hoping it would show that I’ve taken the effort to verify each column individually (which is why I did not explore each column separately). You’ve shown that it does not come across as expected. I’ll work on it.

It must also be noted that anything between $1-$200 does not seem to be a valid price for a car and are therefore not being considered. - You have some issues with bold text here

  • Fixed :+1:

Also you are escaping the $ character and it’s displayed with \. Are you able to solve this problem?

  • Yes, funnily enough, with another escape character. Not sure how that works, but it works!
**anything between \\$1-\\$200 does not seem to be a valid price for a car and are therefore not being considered.**

And you get this
anything between $1-$200 does not seem to be a valid price for a car and are therefore not being considered.
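
For context, Jupyter Markdown cells render text between dollar signs as LaTeX math, which is why a literal `$` needs a backslash in front of it. The helper below is a minimal sketch for preparing such text in Python; the name escape_dollars and its escape parameter are hypothetical. A single backslash is the default, and the parameter allows for renderers that, as reported above, needed two.

```python
import re


def escape_dollars(text: str, escape: str = "\\") -> str:
    """Prefix every '$' that is not already preceded by a backslash.

    `escape` is the string placed in front of each dollar sign. One
    backslash is the default; pass two if the renderer consumes one.
    """
    # The negative lookbehind skips dollar signs that are already escaped.
    # A function is used as the replacement so that backslashes in
    # `escape` are inserted literally.
    return re.sub(r"(?<!\\)\$", lambda match: escape + "$", text)


sentence = "anything between $1-$200 does not seem to be a valid price"
escaped_once = escape_dollars(sentence)
escaped_twice = escape_dollars(sentence, escape="\\\\")
```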

The plot Price($) of Repaired vs. Damaged Vehicles is a bit clogged with the information. You could maybe separate it into two plots and make the legend more human-readable:)

  • These were originally separate plots. But I just had to have them together to check for anomalies, and there was one: the brand Lancia. So I left it as is, in case someone is curious to find out why.
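
A tabular check can complement the combined plot when looking for such anomalies. The sketch below is only an illustration under stated assumptions: the column names brand, unrepaired_damage and price, the sample rows, and the definition of an anomaly (a brand whose damaged cars have a higher mean price than its undamaged ones) are all hypothetical, since the thread does not say what the Lancia anomaly was.

```python
import pandas as pd

# Hypothetical listings; brand names and prices are made up.
listings = pd.DataFrame({
    "brand": ["brand_a", "brand_a", "brand_a", "brand_b", "brand_b", "brand_b"],
    "unrepaired_damage": ["no", "no", "yes", "no", "yes", "yes"],
    "price": [6000, 8000, 3000, 2000, 2500, 3500],
})

# One row per brand, one column per damage status, mean price in the cells.
mean_price = listings.pivot_table(
    index="brand",
    columns="unrepaired_damage",
    values="price",
    aggfunc="mean",
)

# Assumed anomaly rule: damaged cars cost more on average than undamaged ones.
anomalies = mean_price[mean_price["yes"] > mean_price["no"]]
```

Brands that show up in anomalies are the ones worth a closer look in the plot.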

Make the text bold in the conclusions to highlight the most significant insights of the project

  • Again, another point of confusion. Since my main points have been highlighted throughout the project, I was wondering whether it was appropriate to highlight them again in the conclusion.

Thanks again for taking the time to review! Have a good one :beers:

This is true. I guess the best solution here would be to write those questions at the beginning once you’ve finished the whole cleaning process. A reader should know what the questions are anyway. Many will just read those and the conclusions with the answers to the questions.

You could say a couple of words about what errors you found. It’s always a good idea to leave references to the data you are using.