Guided Project: Exploring Ebay Car Sales Data + Extra Steps!

Hi!

I’m happy to share my 3rd project from Dataquest.
I did some extra tasks in this project, so any feedback is very welcome, especially on the “Find the most common brand/model combinations” task. I would like to know how to present this data better, or whether there’s a better method for it.

I really enjoyed working on this project and learned a lot, so I really hope you guys like it!

Thank you!

Edit: Thanks to @drill_n_bass for the help, I added your code to my project!

eBay - Project.ipynb (157.7 KB)



Hi @Basti

Thank you for sharing another project on Exploring Ebay Car Sales Data. I am very pleased with your project. This time you have gone further and uncovered wonderful hidden information about the most common brand and model in your last cell, [72]; I really enjoyed reading through the outputs. As a reader, it taught me some facts about car brands and models, so thumbs up for this, buddy. Overall everything is well done: the introduction, the aim, the explanations in the markdown cells and the conclusion are informative and well organized.

Just a suggestion: in the cell where you convert the column names to snake_case, I think it would be better to give the new list a name other than `columns`; `autos.columns = columns` is a bit confusing.
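To illustrate that suggestion, here is a minimal sketch that builds the snake_case names with a regular expression and stores them under a descriptive name. The helper `camel_to_snake`, the variable `snake_case_columns` and the toy column list are hypothetical; the notebook itself may use a hand-written list instead.

```python
import re

import pandas as pd


def camel_to_snake(name: str) -> str:
    """Convert a camelCase name such as 'dateCrawled' to 'date_crawled' (hypothetical helper)."""
    # Insert an underscore before each uppercase letter that follows a lowercase letter or digit,
    # then lowercase the whole string
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Toy stand-in for the real `autos` DataFrame; only the column names matter here
autos = pd.DataFrame(columns=["dateCrawled", "name", "seller", "yearOfRegistration"])

# A descriptive variable name avoids the confusing `autos.columns = columns`
snake_case_columns = [camel_to_snake(col) for col in autos.columns]
autos.columns = snake_case_columns
print(list(autos.columns))
```

If some columns should get names that differ from the mechanical conversion, a hand-written list (or a dict passed to `autos.rename(columns=...)`) is still the simpler choice.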

Otherwise, congratulations on the good work!

Happy learning.

Hi Brayan,

Thanks again for your feedback. I really appreciate it!


Hi Basti, I’ve seen that you follow PEP 8 in this project. That’s great!
I have a few suggestions that you may find useful:

  1. Find the line “The data dictionary provided with data is as follows:” and, instead of typing the column names as you did, do something like this:
  • `dateCrawled` - When this ad was first crawled. All field-values are taken from this date.
  • `name` - Name of the car.
  • `seller` - Whether the seller is private or a dealer.
    etc.

Wrapping the names in backticks (`) makes them more visible and eye-catching.

  2. Please find the part “We can see that some cars are getting more expensive when damaged, lets have a closer look at them.”: most cars are actually less expensive when damaged ;). Only one brand seems to be the opposite (probably some hidden factors created this anomaly; it’s worth checking too :slight_smile:).

  3. I think cell [72] could be reshaped: instead of printing the data, you could create a new DataFrame for it with the columns [Brand, Model, Top model]. That would save a lot of space.
    One instance of your output has 4 lines of unneeded data.
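Point 2 can be checked per brand with a pivot table of mean prices. This is a minimal sketch: the column names `brand`, `price` and `unrepaired_damage` (with values `'yes'`/`'no'`) are assumptions about the cleaned dataset, and the toy prices are made up.

```python
import pandas as pd

# Toy stand-in for the cleaned `autos` DataFrame (column names and values are assumptions)
autos = pd.DataFrame({
    "brand": ["audi", "audi", "bmw", "bmw", "fiat", "fiat"],
    "unrepaired_damage": ["no", "yes", "no", "yes", "no", "yes"],
    "price": [10000, 6000, 12000, 7000, 2000, 2500],
})

# Mean price per brand, split by damage status: one row per brand, one column per status
mean_price = autos.pivot_table(index="brand", columns="unrepaired_damage",
                               values="price", aggfunc="mean")

# Brands whose damaged cars are, on average, MORE expensive than their undamaged ones
anomalies = mean_price[mean_price["yes"] > mean_price["no"]]
print(anomalies)
```

The same filter works with `index="model"` to list individual models that break the pattern.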

Final thoughts: I like your project too. I can see that you put a lot of effort into showing how much you have learned. I’m sure every new project will be better and better! :slight_smile:

P.S. I may have missed some errors because it’s 2 am on a Friday night, so I hope other members of the community will give some feedback too. :wink:

Thanks for your feedback!

  1. Cool!

  2. That was my conclusion. I checked, and only 7 models are somehow more expensive when damaged; overall, damaged cars are 52% cheaper. It’s all in my project :wink:

  3. Yeah, but that’s the problem: I don’t know how to extract just this useful data. I want to present it better, but couldn’t find a solution.

OK. I may have overlooked it. :stuck_out_tongue:

I’ll try to look into it today if no one beats me to a solution.


Hi! Very clear code indeed!

I just have one question about the dataset itself: the one you linked is the original, with 370,000 entries instead of the stated 50,000, and the data is already somewhat clean (for example, price is already an integer)… Apologies if my question is dumb, but how is that possible? :slight_smile:

Hi,
the answer is in the next line of my project :stuck_out_tongue:

A few modifications have been made from the original dataset:

- 50,000 data points were sampled from the full dataset
- The dataset has been modified by Dataquest to be less clean.
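For readers wondering how a 50,000-row sample can be drawn from the roughly 370,000-row original, a minimal pandas sketch is below. Dataquest's actual procedure (its random seed, and the changes that made the data "less clean") isn't described in this thread, so `sample_listings` and the seed are hypothetical.

```python
import pandas as pd


def sample_listings(full_df: pd.DataFrame, n: int, seed: int = 1) -> pd.DataFrame:
    """Return n rows chosen at random, without replacement (hypothetical helper)."""
    # random_state makes the sample reproducible; the seed Dataquest used is unknown
    return full_df.sample(n=n, random_state=seed).reset_index(drop=True)


# Toy stand-in: 370 rows sampled down to 50 (instead of ~370,000 down to 50,000)
full = pd.DataFrame({"price": range(370)})
sample = sample_listings(full, n=50)
print(len(sample))
```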

Thanks for the answer! But my point is: where is this modified dataset? The one you linked is not modified. I am trying to do the project myself, but I can’t find the modified dataset. :slight_smile:

Ah, OK. I uploaded it for you: https://file.io/HZJfTWERmZAb

Awesome!!! Thanks :slight_smile:

I had to change the syntax of this part of your code:

```python
top_model = b.value_counts('model').head(1)  # find the most common model
top_model_pct = b.value_counts('model', normalize=True).head(1)  # percentages
```

… On the DQ platform and in Jupyter Notebook, it threw an error:
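The error message itself isn't reproduced here. One plausible cause, assuming those environments ran an older pandas: `DataFrame.value_counts` was only added in pandas 1.1.0, so `b.value_counts('model')` fails on earlier versions, while calling `value_counts` on the `model` column (a Series) works everywhere. A minimal sketch on made-up rows for one brand:

```python
import pandas as pd

# Toy rows for one brand (stand-in for `b = autos[autos['brand'] == x]`)
b = pd.DataFrame({"brand": ["volkswagen"] * 4,
                  "model": ["golf", "golf", "polo", "passat"]})

# DataFrame.value_counts only exists from pandas 1.1.0 onwards
print("DataFrame.value_counts available:", hasattr(pd.DataFrame, "value_counts"))

# Series.value_counts on the selected column works on old and new pandas alike
top_model = b["model"].value_counts().head(1)                    # most common model and its count
top_model_pct = b["model"].value_counts(normalize=True).head(1)  # its share of the brand's rows

print(top_model.index[0], top_model.iloc[0], top_model_pct.iloc[0])  # golf 2 0.5
```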

So, I did this almost from scratch. Full code:

```python
import pandas as pd

# `autos` (the cleaned DataFrame) and `brands` (the brands to analyse)
# are defined earlier in the notebook.
grande_finale_top_model = {}
grande_finale_top_model_pct = {}

for x in brands:
    b = autos[autos['brand'] == x]  # select the rows for this brand

    # find the most common model; `to_string()` drops the Name/dtype footer:
    top_model = b['model'].value_counts().head(1).to_string()
    top_model_pct = b['model'].value_counts(normalize=True).head(1)  # share from 0 to 1
    grande_finale_top_model[x] = top_model
    # convert the share to a percentage, rounded to 2 decimal places
    grande_finale_top_model_pct[x] = round(top_model_pct.iloc[0] * 100, 2)

# Test:
# for key in grande_finale_top_model_pct:
#     print(key, ':', grande_finale_top_model_pct[key])

top_models_pd = pd.DataFrame({
    'Top_model': pd.Series(grande_finale_top_model),
    # Let's add "%"
    'PCT for Top model': pd.Series(grande_finale_top_model_pct).astype(str) + ' %',
})
top_models_pd
```

(Part of) output:

I hope it helps…
:sunglasses:
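Since the original question also asked whether there's a better method: a more compact alternative to the per-brand loop uses one `groupby` plus `value_counts`. This is a sketch on made-up data; the column names `brand` and `model` come from the thread, while `model_share` and `top_models` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the cleaned `autos` DataFrame (data made up for illustration)
autos = pd.DataFrame({
    "brand": ["volkswagen"] * 5 + ["bmw"] * 4,
    "model": ["golf", "golf", "golf", "polo", "passat",
              "3er", "3er", "5er", "x_reihe"],
})

# Share of each model within its brand; value_counts sorts every brand's
# models from most to least common
model_share = autos.groupby("brand")["model"].value_counts(normalize=True)

# Keep the first (most common) model per brand and turn the share into a percentage
top_models = (
    model_share.groupby(level="brand").head(1)
    .mul(100)
    .round(2)
    .rename("top_model_pct")  # avoids a name clash with the 'model' index level
    .reset_index()            # columns: brand, model, top_model_pct
)
print(top_models)
```

The result is already a DataFrame with one row per brand, so nothing extra is printed. For the `' %'` display string used above, `top_models['top_model_pct'].astype(str) + ' %'` works on this result. Note that `value_counts` drops missing models, so a brand whose `model` values are all missing simply doesn't appear.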

Wow, thank you so much!