
Lame Cars Are A Better Deal

Hello everyone. Here is my third guided project. I started it a few weeks ago but did most of the work after taking the Data Visualization Fundamentals course. Here are some things I would like to get some feedback on:

  1. Please take a look at the ratio I calculated and my main conclusion. Is it accurate?
  2. Is adding a ratio column to a data frame considered feature engineering? If so, :cowboy_hat_face:
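On question 2: deriving a new column from existing ones is commonly described as feature engineering. A minimal sketch of the idea, assuming a price-to-mileage ratio; the column names `price` and `odometer_km`, the `price_per_km` feature, and the sample rows are all hypothetical and not necessarily what the notebook computes:

```python
import pandas as pd

# Hypothetical listings; column names and values are made up for illustration.
cars = pd.DataFrame({
    "brand": ["fiat", "bmw", "opel"],
    "price": [1500, 12000, 3000],
    "odometer_km": [150000, 100000, 60000],
})

# Derived feature: asking price per kilometre already driven.
cars["price_per_km"] = cars["price"] / cars["odometer_km"]

# Sort so the cheapest cars per kilometre driven come first.
print(cars.sort_values("price_per_km"))
```

The new column carries no information that was not already in the data, but it puts that information in a form that is easier to compare across listings.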

If I had more time I would really like to make a dashboard or see if there was some interesting graph analysis.

Last Screen From Guide

Lame Cars Are a Better Deal.ipynb (278.2 KB)

Click here to view the jupyter notebook file in a new tab

Thanks for taking the time to look this over. Truly, it’s nice of you to do so.

  1. I like the ratio idea. It actually opens up a few new possibilities, like: how much of that ratio each brand/model loses per year, which ones hold their value regardless of mileage, and definitely how engine size affects that ratio (I’ve extracted engine size in my project).
  2. I would title the conclusions and put them at the end, no coding after conclusions.
  3. Make those plots bigger.
  4. If you’re playing with postcodes, you should try geopandas (I’ve used this car dataset to try and learn the basics of geopandas: My intro to GDP).

When it comes to postcode correlation…

It is true at the north end of the country, but it doesn’t really work as you move towards the center (highest postcode numbers); the south has a higher average price.

Compare that with the postcode map:

I think the area that each postcode covers would have a strong negative correlation with average price.

Thanks! I appreciate you taking a look.

  1. I changed the layout re: my conclusion.
  2. I set plt.rcParams['figure.figsize'] = [10,7] in the first block of each section to increase the size of all plots in that section.
  3. I would definitely like to learn about geopandas. I checked out your repo and thought it was really cool. Also looked at your eBay project and maybe that Fiat Punto had a sweet paint job or something. :slight_smile:
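For anyone reading along, here is a small sketch of how the figure-size setting from item 2 behaves. It assumes matplotlib is installed; the `Agg` backend line and the plotted numbers are only there so the example runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Default size (width, height in inches) for every figure created after this line.
plt.rcParams["figure.figsize"] = [10, 7]

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data
print(fig.get_size_inches())

# A single figure can still override the default with its own figsize.
small_fig, small_ax = plt.subplots(figsize=(4, 3))
print(small_fig.get_size_inches())
```

Setting it once through rcParams avoids repeating a figsize argument on every plot, at the cost of the setting being global for the rest of the session.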

Here’s the edited version.

Lame Cars Are a Better Deal.ipynb (261.8 KB)

Click here to view the jupyter notebook file in a new tab



brucemcminn I think your project looks great. It’s an interesting approach and easy to follow.

It sounds like you are interested in exploring geospatial analysis. Apologies in advance if the following is too detailed.

One thing to keep in mind is Tobler’s First Law of Geography: “everything is related to everything else, but near things are more related than distant things.” This describes the phenomenon of spatial autocorrelation. What this means for your analysis is that running a traditional correlation on postal codes can give misleading results because it ignores the spatial pattern.

Another way to say this is that we would expect neighboring postal codes to be more similar than distant postal codes. Moran’s I is a method to quantify this spatial effect on your data. It provides a global and a local measure. The global measure ranges roughly from -1 to 1. Values near 1 mean that neighbors are very similar, values near -1 mean neighbors are very different, and values near 0 mean that there is no spatial effect (in this case you could use a traditional correlation measure).
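To make the global measure concrete, here is a minimal NumPy sketch of the standard formula, I = (n / S0) * (z' W z) / (z' z), where z holds deviations from the mean, W is a spatial weights matrix and S0 is the sum of its entries. The six "regions" in a row, their values, and the helper names `morans_i` and `chain_weights` are made up for illustration; real postal codes would need weights built from their actual geometry.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for 1-D values and an (n, n) spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    n = x.size
    s0 = w.sum()              # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)

def chain_weights(n):
    """Binary weights for n regions in a row: each region neighbors the next one."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1
    w[idx + 1, idx] = 1
    return w

w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]      # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]    # every neighbor is different

print(morans_i(clustered, w))    # positive: neighbors are alike
print(morans_i(alternating, w))  # negative: neighbors differ
```

The clustered layout gives a clearly positive value and the alternating layout a clearly negative one, even though both contain exactly the same set of numbers. That difference is what a traditional correlation on the postal code number cannot see.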

PySAL is a Python package that allows you to calculate Moran’s I. There is also GeoDa, a free and open-source software package that is fairly easy to use and has good documentation. For either of these, you would need a spatial definition for the postal codes (e.g., shapefile or geojson).

Generally, there are some extra considerations for spatial data analysis. There are a lot of amazing tools to map our data, but without caution, the results can be misleading.

Thanks Emily. It would be interesting to look at a dataset with and without using Moran’s I and see how big an effect it has. One thing I’m interested in is seeing what the limits of different techniques are, or how things are done incorrectly. I’ll take a look at PySAL and GeoDa next week when I have more room in my schedule.