Guided Project: Exploring eBay Car Sales Data [Completed but still questioning]

I really worked to go above and beyond on this project. I'm making a concerted effort to land my first position as a data scientist no later than the end of 2023. In general, I'm looking for feedback on my syntax (ways to write certain code more concisely), on my markdown (whether my descriptions are clear), and on anything else you as the reviewer feel could make me a better data scientist!

More specifically, there were a few things I kept running into. Please let me know if you've seen any of these and/or have suggestions for a workaround:

  1. Lines 32, 34, and 36 of my code raise a rather unattractive "SettingWithCopyWarning". My code still runs and does what it's supposed to, but even when I apply the suggestions in the pink warning box, the warning STILL doesn't go away! AHHHH!!! So frustrating.

  2. I had to perform the same conversion on every column containing dates so that each one ends up as an integer that looks like this: 20160321. Is there a "for loop" I could write that reduces the repeated code for each column to a single loop covering any column with a date in this ####-##-## format?

  3. For line 87, I wanted to print a constructed dataframe with brand as the first column, model as the second, and count as the third. I reasoned that rearranging the key-value pairs in my brand_model dictionary would give me the dataframe I'm looking for, but I couldn't figure out how to successfully rearrange the pairs into a new dictionary. AHHHH!!!

  4. Lastly, I was curious about WHY odometer values generally follow the expected pattern of price decreasing as mileage increases EXCEPT at 5,000 km. I was wondering if there's a way to explore this, perhaps with a dictionary keyed on the most recent dates at which listings with only 5,000 km were posted. I'm essentially looking to create a dataframe made up only of the rows with 5,000 km, then use a "for loop" to iterate over it and check whether there's a trend between date posted and price (a rough sketch of what I mean follows this list). My hypothesis is that the average price deviates from what's expected because these listings are more recent, so the bids haven't yet matured to the cars' actual value.
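
Roughly, the kind of thing I have in mind (assuming the columns are named odometer_km, ad_created, and price, as in the dataset) is:

```python
# Keep only the 5,000 km listings.
odometer_5000 = autos[autos["odometer_km"] == 5000]

# Mean price per posting date for those listings, to eyeball a trend over time.
price_by_date = odometer_5000.groupby("ad_created")["price"].mean().sort_index()
print(price_by_date)
```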

https://app.dataquest.io/jupyter/notebooks/notebook/Basics-Copy1.ipynb


Hi @kylemoorman1, could you please upload your .ipynb file here?

Hi @kylemoorman1

Could you please check the file you've uploaded? It only has two lines of code.

Exploring eBay Car Sales Data (Project 4).ipynb (202.0 KB)


Hi @jithins123, the complete downloaded file should be here. One major question I have is why this line of code: odometer_500 = autos[autos["odometer_km"] == 500] won't give me the unique ad_created dates when I run this code: odomoter_500["ad_created"].unique(). The error message I get keeps telling me odometer_500 is undefined. I replaced 500 with 5000 and reran the entire kernel, but still with no success.

Hi @kylemoorman1
Just had a quick glance at the error. You have typed odomoter_500 instead of odometer_500.
Maybe this will fix the issue.


Hi,

This is my updated project. At this point I'll have to be satisfied and move on to my next project. However, I'm still curious about questions 1-3 from my original post. If anything, I'd just really like to know about the "SettingWithCopyWarning". Let me know what you think about my progress with the 5,000 km values. I did a more thorough analysis compared to when I first submitted. Thanks so much!

https://app.dataquest.io/c/54/m/294/guided-project%3A-exploring-ebay-car-sales-data/9/next-steps
Exploring eBay Car Sales Data (Project 4)

Hi @kylemoorman1

It’d be great if you could upload the notebook. Thanks.

Regarding the SettingWithCopyWarning, please check this post and this one.


Here is an updated project where the "SettingWithCopyWarning" has been eliminated. Simply adding an extra line of code once I established the new dataframe made that unappealing warning disappear. Thanks so much @jithins123

This is the code that ultimately made all three of those warnings go away:

odometer_5000 = autos[autos["odometer_km"] == 5000]
odometer_5000 = odometer_5000.copy()
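
The filter and the copy can also be chained into a single line, which avoids ever holding a reference to the un-copied slice:

```python
# Take an explicit copy of the filtered rows up front, so later assignments
# operate on an independent DataFrame rather than a view of autos.
odometer_5000 = autos[autos["odometer_km"] == 5000].copy()
```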

I’m still curious about the following from my original post:

I had to perform the same conversion on every column containing dates so that each one ends up as an integer that looks like this: 20160321. Is there a "for loop" I could write that reduces the repeated code for each column to a single loop covering any column with a date in this ####-##-## format?
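
Something like the loop below is roughly what I'm imagining; I'm assuming here that the full-date columns are date_crawled, ad_created, and last_seen, and that their values start with a YYYY-MM-DD date string:

```python
# One loop covering every column whose values start with a YYYY-MM-DD date.
date_cols = ["date_crawled", "ad_created", "last_seen"]

for col in date_cols:
    # Keep the first 10 characters ("2016-03-21"), drop the dashes,
    # and cast to int so the value becomes 20160321.
    autos[col] = autos[col].str[:10].str.replace("-", "").astype(int)
```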

For line 87, I wanted to print a constructed dataframe with brand as the first column, model as the second, and count as the third. I reasoned that rearranging the key-value pairs in my brand_model dictionary would give me the dataframe I'm looking for, but I couldn't figure out how to successfully rearrange the pairs into a new dictionary.
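
One way to get that shape without rearranging the dictionary at all might be to build a list of row dictionaries and hand it to pd.DataFrame. A sketch, assuming brand_model maps each brand to its most common model and that "count" means how many listings have that brand/model pair:

```python
import pandas as pd

rows = []
for brand, model in brand_model.items():
    # Count how many listings match this brand/model combination.
    count = ((autos["brand"] == brand) & (autos["model"] == model)).sum()
    rows.append({"brand": brand, "model": model, "count": count})

brand_model_df = pd.DataFrame(rows, columns=["brand", "model", "count"])
print(brand_model_df)
```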

Lastly, I was curious about WHY odometer values generally follow the expected pattern of price decreasing as mileage increases EXCEPT at 5,000 km. Thanks to your help, I did a deep dive on this specific category of 5,000 km. In my project, I couldn't really find any trend explaining why listings with 5,000 km on the odometer have cheaper advertised prices on the whole. I thought maybe there was a correlation with a more recent listing date (so the bids haven't matured to the actual value yet). However, the code I wrote for that correlation showed a weak (essentially no) correlation. Was my code written wrong? If not, I'm stuck for other possible explanations.
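
Roughly, the kind of check I was going for looks like this; it assumes ad_created has already been converted to the integer YYYYMMDD form, and it turns the posting date into a day count so the correlation isn't distorted by the integer jumping unevenly at month boundaries:

```python
import pandas as pd

# Only the 5,000 km listings; .copy() avoids the SettingWithCopyWarning later.
odometer_5000 = autos[autos["odometer_km"] == 5000].copy()

# Parse the YYYYMMDD integers back into dates, convert them to days since the
# earliest listing, then correlate with price.
ad_day = pd.to_datetime(odometer_5000["ad_created"].astype(str), format="%Y%m%d")
days_since_start = (ad_day - ad_day.min()).dt.days
print(days_since_start.corr(odometer_5000["price"]))
```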

I also assessed the overall correlation between odometer values and ad price. As expected, there was a negative relationship. It wasn't strong, but it existed nonetheless. Thus, I still maintain that the 5,000 km values are a confounding factor. What do you think? Any clarity on why we see this? Am I assessing this correctly?
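
One comparison that might sharpen this: compute the mileage-price correlation with and without the 5,000 km rows and see how much that single bucket shifts it, for example:

```python
# Mileage-price correlation for all rows versus all rows except the 5,000 km bucket.
mask = autos["odometer_km"] != 5000
print("all rows:    ", autos["odometer_km"].corr(autos["price"]))
print("without 5000:", autos.loc[mask, "odometer_km"].corr(autos.loc[mask, "price"]))
```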