Guided Project: Exploring eBay Car Sales Data - a complete project with the additional tasks finished

Hi. I just finished my third guided project; it took 7-10 days. I’ve done all the tasks I was asked to do, and sometimes more (when I felt that something should be investigated further or analyzed better). Any feedback is welcome! :slight_smile:

I’d like to thank all the DQ members who helped me with this project, especially Elena Kosourova :slight_smile:

Guided Project: Exploring Ebay Car Sales Data

Guided Project Exploring Ebay Car Sales Data.ipynb (161.5 KB)

Click here to view the Jupyter notebook file in a new tab

P.S. I combined two of the additional exercises into one block:

“Identify categorical data that uses german words, translate them and map the values to their English counterparts” and “See if there are particular keywords in the name column that you can extract as new columns”.
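For readers curious what such a combined block might look like, here is a minimal pandas sketch. It is not the notebook’s actual code: the toy dataframe, the translation dictionaries, and the keyword list (Klima, TÜV, Bastler) are illustrative assumptions, not values checked against the real dataset.

```python
import pandas as pd

# Toy stand-in for the eBay `autos` dataframe (hypothetical values).
autos = pd.DataFrame({
    "name": ["VW_Golf_Klima_TÜV_neu", "BMW_320d_Automatik", "Opel_Corsa_Bastler"],
    "gearbox": ["manuell", "automatik", "manuell"],
    "unrepaired_damage": ["nein", "ja", None],
})

# Translation dictionaries: German value -> English value (assumed mappings).
translations = {
    "gearbox": {"manuell": "manual", "automatik": "automatic"},
    "unrepaired_damage": {"nein": "no", "ja": "yes"},
}

for col, mapping in translations.items():
    # Series.map() with a dict returns NaN for any value not in the dict,
    # so missing values stay missing.
    autos[col] = autos[col].map(mapping)

# Flag keywords found in `name` as new boolean columns (case-insensitive,
# plain substring match). The keywords are illustrative assumptions.
keywords = {"has_klima": "klima", "has_tuev": "tüv", "for_tinkerer": "bastler"}
for new_col, word in keywords.items():
    autos[new_col] = autos["name"].str.lower().str.contains(word, regex=False)

print(autos)
```

One design note: because map() turns every unlisted value into NaN, it’s worth checking each column’s unique() values first so that no German word is silently dropped.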

P.P.S. The version of the project posted here is the latest one. Everything discussed below has been incorporated, so this is the final version.


Hi @drill_n_bass,

Thanks for sharing your project! And I also noticed that you mentioned me in it, thanks a lot!!! :grinning: I’m glad that my suggestions were useful! :partying_face:

Your project looks very nice, well-structured, well-commented where necessary, with a thorough analysis and interesting insights. Also, good emphasis of the main ideas and intermediate conclusions (using italic and bold). And finally it’s not only me who thinks that it’s not so strange to have free cars on eBay! :money_mouth_face:

Here are some suggestions from my side, hope they will be interesting:

  • While you wrote intermediate conclusions after the different sections of your project, it’s still good practice to also write a general conclusion at the end, summarizing the most important results.
  • Please add a link to the original dataset in the introduction (or to the modified version, since you mentioned that the original is no longer available).
  • A good idea is to use backticks when mentioning column names in markdown cells. Then they will be more eye-catching.
  • I would avoid giving technical details in markdown cells that describe different methods and how they work (for example, right before the code cells [4], [10], and [21]). The reader can always find them in the documentation (hopefully the latest edition :stuck_out_tongue_winking_eye:). Alternatively, you can add these technical details as short comments in the code cells.
  • Now about code comments. Your code is rather well commented, but I would place each comment on its own line before the code it describes (i.e., not inline at the end of the line), for example in the code cells [21], [29], and some others. If a comment is really short (1-3 words), it’s also fine to put it inline.
  • It’s better to remove all the commented-out code from the project (since you don’t use it anymore), like in the code cell [30].
  • The code cells [21]-[23]: you can use print('\n') to visually separate all the printed outputs.
  • About between() (it seems that you had doubts about it in the code cell [16]): by default, it includes both limits.
  • I noticed you tend to write (print()) instead of print(). It’s not an issue, but the external parentheses are redundant, so it’s better to remove them.
  • The code cell [33]: I wouldn’t convert datetime data into int.
  • The code cell [31]: to check the “suspicious” columns that can potentially contain German words, it’s better to call unique() on those columns than to display the whole autos dataframe.
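A minimal sketch of the unique() check from the last point, assuming a toy stand-in for autos (the column names seller and offer_type and their values are illustrative assumptions, not taken from the notebook):

```python
import pandas as pd

# Toy stand-in for the `autos` dataframe (values are assumptions for illustration).
autos = pd.DataFrame({
    "seller": ["privat", "privat", "gewerblich"],
    "offer_type": ["Angebot", "Angebot", "Angebot"],
    "price": [5000, 0, 1200],
})

# Columns suspected of holding German words.
suspicious_cols = ["seller", "offer_type"]

# unique() lists each distinct value once, which is much easier to scan
# than printing the whole dataframe.
distinct = {col: sorted(autos[col].unique()) for col in suspicious_cols}
for col, values in distinct.items():
    print(col, values)
```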

That’s all from my side. Good job indeed!
Happy learning and good luck with your further projects! :slightly_smiling_face:


I will update everything you mentioned in my initial post. I appreciate your help, thank you! :smile:
I don’t understand what you meant here:

About this:

Yes, I did that sometimes to keep the lines from getting too long (according to PEP 8, I should split a long line over several rows). Quote from this site: “When you’re using line continuations to keep lines to under 79 characters, it is useful to use indentation to improve readability. It allows the reader to distinguish between two lines of code and a single line of code that spans two lines. There are two styles of indentation you can use.”
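The two continuation styles that quote refers to (PEP 8 describes alignment with the opening delimiter and a hanging indent) can be sketched on a method chain like the one in the notebook. The date strings below are made-up sample values. Note that the wrapping parentheses are needed here because, unlike inside print(...), there is no enclosing bracket to continue the line.

```python
import pandas as pd

# Made-up sample values in the same format as the `date_crawled` column.
autos = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56", "2016-03-26 18:57:24"],
})

# Style 1: continuation lines aligned with the opening delimiter.
counts_aligned = (autos["date_crawled"].str[:10]
                  .value_counts(normalize=True, dropna=False)
                  .sort_index(ascending=True))

# Style 2: hanging indent (nothing follows the opening parenthesis on the first line).
counts_hanging = (
    autos["date_crawled"].str[:10]
    .value_counts(normalize=True, dropna=False)
    .sort_index(ascending=True)
)

print(counts_hanging)
```

Both styles produce the same result; the choice is purely about readability.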

You are right, I missed some German words! Now it’s perfect (I had to update quite a lot to get it right).


Hi @drill_n_bass,

I’m glad that my feedback was useful! :slightly_smiling_face:

About print(). The idea of splitting your code into several lines for better readability is great; I do so too. However, in the case of a print() call, you can still split your code over several lines without the external parentheses, because Python already allows line continuation inside print()’s own parentheses. For example, this code from the code cell [21], with the external parentheses:

(print('val_desc_date_crawled_sorted:', '\n', 
       autos['date_crawled'].str[:10]
 .value_counts(normalize=True, dropna=False)
 .sort_index(ascending=True))
)

is the same as this, without the external parentheses:

print('val_desc_date_crawled_sorted:', '\n', 
       autos['date_crawled'].str[:10]
 .value_counts(normalize=True, dropna=False)
 .sort_index(ascending=True))

Of course, it’s not an error, and whether or not you use the external parentheses is not a big deal here :blush: Still, it’s always a good idea to remove unnecessary elements when writing code.

Now about between(). In the code cell [16] you could directly use between(0,10000000) to preserve the values within these limits (i.e., the values from 0 to 10000000 will be preserved, while all the values outside this range will be excluded). By the way, now that I think about it, maybe you were going to exclude the car priced at $99999999? Then the piece of code above should be between(0,99999998), i.e., the price of $99999999 will be excluded in this case.
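Here is a small sketch of both options with made-up prices (the series is hypothetical, not the real price column):

```python
import pandas as pd

# Hypothetical prices, including the extremes discussed in this thread.
price = pd.Series([0, 500, 10000000, 20000000, 99999999])

# between() keeps both endpoints by default, so 0 and 10000000 survive.
kept = price[price.between(0, 10000000)]

# Lowering the upper bound to 99999998 keeps everything except the $99999999 listing.
kept_without_max = price[price.between(0, 99999998)]

print(kept.tolist())              # [0, 500, 10000000]
print(kept_without_max.tolist())  # [0, 500, 10000000, 20000000]
```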

The range was set so that the maximum limit is 10,000,000 USD. I believe that’s a possible price for a car… wait! Update: almost 19,000,000 USD. OK, I will raise it to 20,000,000 USD. Quote: “A new Bugatti costs from 1.7 million USD for the cheapest model, a Bugatti Veyron, to upwards of 18.7 million USD for a Bugatti La Voiture Noire, the current most expensive model on the market.” Source link

The output seems to be correct:

Noted, upgraded, thank you! :slight_smile:

I will upload the final version of my project once you let me know what you think about this between() issue.

Now this is the final version of my post. Sorry for the edits :P

Hi @drill_n_bass,

Ah, so the value 10000000 is included, OK. But then you can still use between(0,10000000) anyway, because both 0 and 10000000 will be included in the range to keep: the between() function has an inclusive parameter whose default includes both the lower and upper limits (inclusive=True in older pandas versions, inclusive="both" since pandas 1.3), unless you choose to exclude them (inclusive=False in older versions, inclusive="neither" in newer ones). I just mean there is no need to subtract 1 from 0 or add 1 to 10000000 :blush:
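A quick way to see the effect of inclusive, assuming pandas 1.3 or later (where it takes the strings "both", "neither", "left", and "right"); the three-value series is just a toy example:

```python
import pandas as pd

s = pd.Series([0, 5, 10])

# Default (inclusive="both"): both limits are kept.
both = s[s.between(0, 10)].tolist()
# Exclude both limits.
neither = s[s.between(0, 10, inclusive="neither")].tolist()
# Keep only the lower limit.
left = s[s.between(0, 10, inclusive="left")].tolist()

print(both, neither, left)  # [0, 5, 10] [5] [0, 5]
```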

Wait, but if you want to keep both 0 and 10000000 in the dataframe, then you don’t need this between() at all, since these values are already the minimum and maximum of the price column! :cowboy_hat_face: I only realized it now.

Now I understand what you meant! :slight_smile:
So:

This is a new mystery! :joy: How is that possible? I don’t need the code block I mentioned above? O_o

Isn’t it the only code in the notebook that cleans outliers in the price column?

I mean, of course you need to remove outliers; it’s just that in this case you decided not to treat the anomalous values as outliers :wink: Usually people treat exactly these values as outliers for this column: 0 (which I don’t agree with) and 10000000 (which I also have doubts about). That is, there are no other anomalous values. You can check this by running autos['price'].max() and autos['price'].min() (before running between(), of course).

Anyway, it’s quite OK not to remove these “outliers” in this case, since they could easily be real values.
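The min/max check suggested above can be sketched like this; the series is a hypothetical stand-in for autos['price'] after it has been converted to numbers. If the filter limits coincide with the minimum and maximum, the boolean mask is all True, so between() removes nothing.

```python
import pandas as pd

# Hypothetical stand-in for autos['price'] (already numeric).
price = pd.Series([0, 350, 1200, 10000000])

# Inspect the extremes before deciding on filter limits.
lowest, highest = price.min(), price.max()
print(lowest, highest)

# With limits equal to the min and max, every row passes the filter.
mask = price.between(lowest, highest)
print(mask.all())
```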

OK, now I know why I couldn’t understand you. I didn’t check what the “highest outlier” actually was. Without any empirical check, I assumed there would be outliers like $100,000,000 or even higher… :stuck_out_tongue:
Anyway, this project is finally finished. It felt like a “never-ending story” :roll_eyes: :joy:

I updated it in my first post (in case anyone needs it).

Oh, these DQ guided projects are becoming more and more never-ending, I would say! :grinning:

Ok, great, then now I’ll render your latest notebook version in nbviewer (otherwise the old notebook still remains there).


Hi mate!

I really like your project! I’ve just finished the same one, and it was cool to see how you did the additional tasks. I did them a little bit differently, so maybe you would be interested in seeing another approach to the same problem.

my project

Anyway, good job, and hope to see more of your projects here!

cheers!

Hi Basti :slight_smile:
Sure, I will this weekend. The more perspectives we explore, the more flexible our craft becomes.
