eBay Car Sales Dataset

Hi Everyone,

Please feel free to comment. The file is uploaded (link below). So far, all your comments have been very helpful in improving my learning process.

Project Title: eBay Car Sales Data

—} Part E: Data Analysis

---> 5. Analyse the data to:
    5.1. Calculate the distribution based on the column: 'reg_year'.
    5.2. Calculate the distribution based on the columns: 'date_crawled', 'ad_created' and 'last_seen'.
    5.3. Select the top brands and aggregate the mean price for each.
    5.4. Calculate the mean mileage and mean price for each of the top brands.
    5.5. Find the most common brand/model combinations.
    5.6. Find out if the average price follows any pattern based on the mileage.
    5.7. Find out how much cheaper damaged cars are than their non-damaged counterparts.
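
For anyone skimming without opening the notebook, here is a minimal sketch of how steps 5.1, 5.3/5.4 and 5.7 could be done in pandas. It is not copied from the notebook: the sample rows are made up, and the column names `brand` and `unrepaired_damage` are assumptions (only `reg_year`, `price_$` and `odometer_km` are named in the outline).

```python
import pandas as pd

# Made-up sample rows; "brand" and "unrepaired_damage" are assumed column names.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "audi", "audi", "ford", "ford"],
    "reg_year": [2005, 2010, 2005, 2012, 2005, 2010],
    "price_$": [5000, 9000, 6000, 12000, 1500, 2500],
    "odometer_km": [150000, 100000, 150000, 60000, 150000, 125000],
    "unrepaired_damage": ["no", "no", "yes", "no", "yes", "no"],
})

# 5.1: share of listings per registration year, ordered by year.
reg_year_dist = autos["reg_year"].value_counts(normalize=True).sort_index()

# 5.3 / 5.4: mean price and mean mileage per brand, most expensive first.
brand_stats = (
    autos.groupby("brand")[["price_$", "odometer_km"]]
    .mean()
    .sort_values("price_$", ascending=False)
)

# 5.7: mean price by damage status and the gap between the two groups.
price_by_damage = autos.groupby("unrepaired_damage")["price_$"].mean()
damage_gap = price_by_damage["no"] - price_by_damage["yes"]

print(reg_year_dist)
print(brand_stats)
print(damage_gap)
```

On the real data it would probably make sense to restrict `brand_stats` to the most frequent brands first, so that rare brands with only a few listings do not skew the means.
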

Before Part E, the following tasks were carried out.

—} Part A: Original Dataset

---> 1. How the original dataset is organised
    1.1.  Observation

—} Part B: Reorganising Dataset

---> 2. Rows & Columns
    2.1. Review of unique values returned as NaN.
    2.2. Review of columns with only 2 unique values.
    2.3. Convert the data type of the 'price' and 'odometer' columns from object to integer.
    2.4. Translate non-English words to English.
    2.5. Change the column names from camelCase to snake_case and reorder the columns.
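
As a rough illustration of steps 2.3 and 2.5, the sketch below renames camelCase columns to snake_case and converts text columns to integers. The raw value formats (`$5,000`, `150,000km`), the `dateCrawled` column name and the `camel_to_snake` helper are assumptions for the example, not taken from the notebook.

```python
import re

import pandas as pd


def camel_to_snake(name):
    """Hypothetical helper: 'dateCrawled' -> 'date_crawled'."""
    # Put an underscore between a lowercase letter/digit and the capital
    # that follows it, then lowercase the whole name.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


# Made-up rows; the string formats are assumed.
raw = pd.DataFrame({
    "dateCrawled": ["2016-03-26", "2016-04-04"],
    "price": ["$5,000", "$1,350"],
    "odometer": ["150,000km", "70,000km"],
})

# 2.5: rename every column to snake_case.
raw.columns = [camel_to_snake(col) for col in raw.columns]

# 2.3: strip everything that is not a digit, then cast to integer.
for col in ["price", "odometer"]:
    raw[col] = raw[col].str.replace(r"[^0-9]", "", regex=True).astype(int)

print(raw.dtypes)
```

Stripping all non-digits is a blunt approach; it would silently mangle decimal or negative values, so it only suits columns known to hold whole, non-negative numbers.
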

—} Part C: Organised Dataset

---> 3. Quick Review of the organised dataset.

—} Part D: Cleaning Data Entries

---> 4. Data entries
    4.1. Remove data for antique vehicles from the column: 'reg_year'.
    4.2. Remove inaccurate entries in the column: 'reg_year'.
    4.3. Review data entries for the column: 'reg_month'.
    4.4. Check for outliers in the column: 'odometer_km'.
    4.5. Check for outliers in the column: 'price_$'.
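
The filtering in steps 4.1, 4.2 and 4.5 can be sketched with boolean masks, as below. The cut-off values are hypothetical placeholders chosen for the example (the outline does not state which thresholds the notebook uses), and the rows are made up.

```python
import pandas as pd

# Made-up rows with a few obviously implausible entries.
autos = pd.DataFrame({
    "reg_year": [1000, 1937, 1999, 2008, 2015, 9999],
    "price_$": [0, 25000, 1200, 4500, 99999999, 800],
})

# Hypothetical limits; the notebook may use different ones.
MIN_YEAR, MAX_YEAR = 1950, 2016
MIN_PRICE, MAX_PRICE = 1, 350000

# Series.between is inclusive on both ends by default.
valid_year = autos["reg_year"].between(MIN_YEAR, MAX_YEAR)
valid_price = autos["price_$"].between(MIN_PRICE, MAX_PRICE)

# Keep only the rows that pass both checks.
cleaned = autos[valid_year & valid_price]

print(f"Kept {len(cleaned)} of {len(autos)} rows")
print(cleaned)
```

Looking at `autos["price_$"].describe()` and the top and bottom of `value_counts().sort_index()` before picking limits is a reasonable way to see where the plausible range ends.
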

Thank you.

eBay Car Sales.ipynb (115.0 KB)
