Your code is very well laid out and concise, which makes it very readable. I also like the use of headings to break the sections up.
I also liked some of your language (e.g., “Let’s take a look at the min and max values of the ‘price’ column to see if there are any anomalies.”). It shows a curious mindset, which leads to rigor in your analysis. That comes in handy for data cleaning.
I also appreciate that you went beyond the Guided Project and tried to tackle more interesting questions, such as characterizing relationships between variables (e.g., price and odometer readings; damaged and undamaged cars, etc.).
A couple of next steps for you to consider:
- You mentioned you encountered a ‘SettingWithCopyWarning’. You can resolve this by using .loc on the original DataFrame or by making an explicit copy with df.copy(). Check out this StackOverflow post for more details on why you keep getting the warning.
- I loved challenge #4, where you looked at the trends in price and mileage. One minor thing in your code:
mileage_50k_150k = autos[(autos['odometer_km'] >= 50000) & (autos['odometer_km'] <= 100000)]['price'].mean()
mileage_above_150k = autos[autos['odometer_km'] >= 100000]['price'].mean()
Be careful here: vehicles with an odometer reading of exactly 100,000 are counted in both groups. I believe you wanted < 100000 in the first line.
- You might want to start experimenting with plotting libraries such as matplotlib and seaborn. Graphs will make your trends stand out to readers and also help you understand the data better. If you haven’t reached that part of Dataquest yet, that’s fine! You can always return to this project later and build on it.
- Lastly, this may be a personal stylistic preference based on my education and background, but I always like to include a section on the limitations of my analysis, along with future analyses that could be done. I feel this sets the proper context for drawing your conclusions and gives readers (or yourself!) ideas to explore later on.
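On the ‘SettingWithCopyWarning’ point: here is a minimal sketch of the two fixes, using a small hypothetical DataFrame in place of your autos data (the is_cheap and price_k columns are made up for illustration). The warning typically shows up when you filter a DataFrame and then assign into the filtered result. Newer pandas releases that enable Copy-on-Write handle this differently, so the exact warnings you see may depend on your version.

```python
import pandas as pd

# Hypothetical stand-in for the project's autos DataFrame
autos = pd.DataFrame({
    "price": [500, 12000, 3500, 800],
    "odometer_km": [150000, 50000, 100000, 125000],
})

# Pattern that can trigger SettingWithCopyWarning: filter, then assign.
# 'cheap' may be a view or a copy, so it is unclear whether 'autos' changes too.
# cheap = autos[autos["price"] < 1000]
# cheap["price_k"] = cheap["price"] / 1000

# Fix 1: when you mean to modify autos itself, select rows and assign
# in a single .loc call on the original DataFrame
autos["is_cheap"] = False
autos.loc[autos["price"] < 1000, "is_cheap"] = True

# Fix 2: when you want an independent subset, make the copy explicit
cheap = autos[autos["price"] < 1000].copy()
cheap["price_k"] = cheap["price"] / 1000  # changes only 'cheap', not 'autos'
```

Rule of thumb: use .loc when you want the change to land in the original data, and .copy() when you deliberately want a separate table to experiment on.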
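On the double-counting point in challenge #4: a minimal sketch of non-overlapping mileage groups, again on made-up rows. The names mean_50k_to_100k and mean_100k_plus are hypothetical; the key change is that the first range uses < 100000, so a reading of exactly 100,000 falls only in the second group. pd.cut with right=False is one alternative that builds the same left-closed bins in one step.

```python
import pandas as pd

# Hypothetical sample rows; in the project these come from autos
autos = pd.DataFrame({
    "price": [9000, 7000, 5000, 3000, 1000],
    "odometer_km": [50000, 90000, 100000, 125000, 150000],
})

# Half-open ranges [50k, 100k) and [100k, inf): exactly 100,000
# now lands in only one group
low_mask = (autos["odometer_km"] >= 50000) & (autos["odometer_km"] < 100000)
high_mask = autos["odometer_km"] >= 100000
mean_50k_to_100k = autos.loc[low_mask, "price"].mean()
mean_100k_plus = autos.loc[high_mask, "price"].mean()

# Sanity check: no row is counted in both groups
assert not (low_mask & high_mask).any()

# Alternative: pd.cut with right=False creates left-closed bins
bins = pd.cut(autos["odometer_km"], bins=[50000, 100000, float("inf")], right=False)
group_means = autos.groupby(bins, observed=True)["price"].mean()
print(group_means)
```

The overlap assert is a cheap guard worth keeping whenever you hand-write adjacent ranges.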
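On the visualization suggestion: a minimal matplotlib sketch of the price-versus-mileage trend, using hypothetical rows rather than your actual data. seaborn builds on matplotlib and offers higher-level functions (for example sns.lineplot) if you prefer it; this example sticks to matplotlib only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; in the project these come from autos
autos = pd.DataFrame({
    "price": [9000, 7000, 5000, 3000, 1000],
    "odometer_km": [50000, 90000, 100000, 125000, 150000],
})

# Average price for each odometer reading
mean_price_by_km = autos.groupby("odometer_km")["price"].mean()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(mean_price_by_km.index, mean_price_by_km.values, marker="o")
ax.set_xlabel("Odometer reading (km)")
ax.set_ylabel("Mean price")
ax.set_title("Mean listing price by odometer reading")
fig.tight_layout()
# In Jupyter the figure renders inline; in a plain script, call plt.show()
```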
Good project, and keep it up!