I’m currently going back and redoing the guided projects, including the next steps, and would love your feedback on my improved work. I particularly need your opinion on one of the Next analytical steps, namely splitting odometer_km into groups and using aggregation to see if the average price follows any pattern based on mileage. I struggled to iterate over the bins created using value_counts(). I finally found a way to turn them into tuples. If there is an easier way to do this, I’d love to hear about it.
Thanks in advance,
Exploring Ebay Car Sales Data.ipynb (85.7 KB)
For my project I did the following:
- divided the data into 5 bins using the pandas cut() function and assigned the result to a new column (basically, this function marks which bin each entry belongs to)
- used the groupby() method to group the table by the newly created column and calculate the mean price and the number of entries in each bin.
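Here is a minimal sketch of those two steps. The numbers are made up rather than taken from the real dataset, and the new column name odometer_group is just a label chosen for this example:

```python
import pandas as pd

# Made-up sample rows; only the column names mirror the project's data.
autos = pd.DataFrame({
    "odometer_km": [5000, 20000, 50000, 80000, 100000, 125000, 150000, 150000],
    "price": [18000, 15000, 11000, 8000, 6500, 5000, 3500, 4000],
})

# cut() labels each row with the interval its odometer reading falls into.
autos["odometer_group"] = pd.cut(autos["odometer_km"], bins=5)

# A single groupby call gives both the mean price and the row count per bin.
summary = (
    autos.groupby("odometer_group", observed=False)["price"]
    .agg(mean_price="mean", count="count")
)
print(summary)
```

Each index entry of summary is a pandas Interval, so its edges are available as interval.left and interval.right when looping over the bins, without converting anything to tuples first.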
The groupby() method seemed like a really nice solution to me, but I don’t know whether it only saves a few lines of code or whether it also helps with RAM usage.
Anyway, you can check it out in my project here: Exploring Ebay Car Sale Data. Feedback is more than welcome.
Thanks for the reply. I had a feeling that groupby() would come in handy. Now it looks much better!
I went through your project - I really like the way you narrate what’s going on. Also, kudos about finding ways to combine the different analysis topics. It looks very pro!
Thank you! I’m glad that you liked it, especially the narration part. It took me almost 2 weeks to finish the project, mostly because of the narration🙈 Supposedly, the more practice I get, the more smoothly it’s going to go. At least I hope so!
I like your project.
The only thing I’d change is ‘limousine’. It’s translated as ‘sedan’ in English. You can look it up on Wikipedia.
Thanks for pointing that out. I guess I didn’t notice it as the word ‘limousine’ exists in my native tongue. I will fix it right away.
Have a lovely day!
First, congratulations! It’s a nice idea to redo all your projects with your new knowledge. I hope to do the same very soon.
Just keep in mind the following statistics concept when taking notes in the analysis part: we are always dealing with a sample.
In other words, whenever we have a conclusion, it’s important to highlight that it is only valid for our dataset.
To make generalizations like the one in your last line, ‘On average, cars with unrepaired damage are 30% cheaper than undamaged cars.’, you must apply techniques like hypothesis testing or confidence intervals. Both assume the data is i.i.d. (independent and identically distributed).
I guess Dataquest introduces these concepts in step 5.
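To give a rough idea of what a confidence interval looks like in practice, here is a small sketch using a percentile bootstrap, which is just one of several possible techniques. The prices are synthetic numbers invented for the example, not the eBay data, and bootstrap_ci is a hypothetical helper name:

```python
import random
import statistics

random.seed(0)

# Synthetic prices, invented purely for illustration (not the eBay data).
undamaged = [random.gauss(6000, 1500) for _ in range(200)]
damaged = [random.gauss(4200, 1500) for _ in range(200)]

def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = random.choices(a, k=len(a))
        resample_b = random.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the differences.
    lower = diffs[int(n_resamples * alpha / 2)]
    upper = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

low, high = bootstrap_ci(undamaged, damaged)
print(f"95% CI for the mean price difference: {low:.0f} to {high:.0f}")
```

If the whole interval sits above zero, the sample supports the claim that undamaged cars are more expensive on average, under the i.i.d. assumption mentioned above.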
I hope I haven’t been rude, English is my second language.
Thanks for the comment! No, it’s not rude at all. On the contrary, it is spot on: I need to be more careful not to generalize. I guess revisiting some of the theory would be a good thing to do soon, too.