Please feel free to comment. The file is uploaded (link below). So far, all your comments have been very helpful in improving my learning process.
---> 1. How the original dataset is organised
     1.1. Observation
---> 2. Rows & Columns
     2.1. Review of unique values returned as NaN.
     2.2. Review of columns with only 2 unique values.
     2.3. Convert the data type of the 'price' and 'odometer' columns from object to integer.
     2.4. Translate non-English words to English.
     2.5. Change the column names from camelCase to snake_case and reorder the columns.
---> 3. Quick review of the organised dataset.
---> 4. Data entries
     4.1. Remove data for antique vehicles from the column: 'reg_year'.
     4.2. Remove inaccurate entries in the column: 'reg_year'.
     4.3. Review data entries for the column: 'reg_month'.
     4.4. Check for outliers in the column: 'odometer_km'.
     4.5. Check for outliers in the column: 'price_$'.
---> 5. Analyse the data to:
     5.1. Calculate the distribution based on the column: 'reg_year'.
     5.2. Calculate the distribution based on the columns: 'date_crawled', 'ad_created' and 'last_seen'.
     5.3. Select brands and aggregate the mean price.
     5.4. Calculate the mean mileage and mean price for each of the top brands.
     5.5. Find the most common brand/model combinations.
     5.6. Find out whether the average price follows any pattern based on the mileage.
     5.7. Find out how much cheaper damaged cars are than their non-damaged counterparts.
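
For anyone following along, steps 2.3 and 2.5 of the outline could be sketched roughly as below. This is only a minimal illustration using the standard library, not the code from the uploaded file. The helper names `to_snake_case` and `parse_amount` are hypothetical, and the raw value formats ("$5,000", "150,000km") are an assumption about how the 'price' and 'odometer' strings look before cleaning.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase column name to snake_case (hypothetical helper)."""
    # Insert an underscore before every capital letter except at the start,
    # then lowercase the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def parse_amount(text: str) -> int:
    """Turn a string such as '$5,000' or '150,000km' into an integer.

    Assumes the value is a whole number wrapped in currency symbols,
    thousands separators or unit suffixes; every non-digit is dropped.
    """
    return int(re.sub(r"\D", "", text))


if __name__ == "__main__":
    print(to_snake_case("dateCrawled"))
    print(parse_amount("$5,000"))
    print(parse_amount("150,000km"))
```

With pandas, the same idea would typically be applied as `df.columns = [to_snake_case(c) for c in df.columns]`, and the numeric columns cleaned with the string accessor before casting to `int`. Short custom names such as 'reg_year' would still need a manual rename.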