It’s my first time using NumPy and Pandas for data analysis and I really enjoyed the experience that came with this project. It was a bit challenging at first, especially when I reached the aggregates part for prices by brand and mileage by brand, but I managed to figure out my own way of understanding it.
Below is my work; please comment and advise.
Thank you.
car_exploration_analysis.ipynb (135.1 KB)
My Github link: Car Sales Exploration (Github)
Hi @o.abucheri,
Nice job completing the guided project, especially considering the slight challenge you faced when aggregating data. It’s also nice that you enjoyed the experience.
My thoughts:
- You’ve done well briefly describing the data set you used. Also consider adding a link to the data set. I think it’s not on Kaggle anymore but it was moved here.
- Some typos need cleaning up. It’s a minor thing, but every small thing counts when sharing your work with others: people can be a bit nit-picky at times, and typos can adversely (and implicitly) affect people’s perceptions even if the project is good overall.
- You can also combine code cells [1], [2], [3] into one. It makes the narrative a bit less fractured. The steps are fairly simple and can be explained with one paragraph.
- Similar to typos, consider cleaning up some of the unused and commented-out code.
- When you realised that you accidentally added a white space to `registration_year`, you can just modify the code in [8] directly by removing the white space and then rerun the notebook. It’s not necessary to do the fix later; the readers don’t know that you made a mistake and they can only see what you presented to them. (Keep it a secret.)
- "Each column has a count of 50000 records and colums such as `seller` and `offer_type` have almost similar records." → By count, do you mean they all have 50000 rows, or that they have 50000 non-null values? The `describe` table only shows the number of non-null values, so not all columns have a count of 50000.
- "`seller` and `offer_type` have almost similar records." → I think the word “records” here can be a bit ambiguous because I assumed you meant “rows”.
- "The `num_photos` column looks very funny and needs some further looking into." → Maybe expand a bit on what you mean by “very funny”. One reason why you think the column is funny should be good enough, e.g. all NaNs and 0s.
- Some of the text might be better suited as code comments, e.g. "When removing outliers, we can do `df[(df["col"] >= x) & (df["col"] <= y)]`, but it’s more readable to use `df[df["col"].between(x, y)]`."
- It’s quite odd that aggregation is written as `aggregation`. I’m not sure if it’s necessary to use the code style in this case.
- Add a conclusion to briefly summarize all your findings.
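
To make a few of the points above concrete, here is a rough sketch on a made-up DataFrame. The values, the `low`/`high` bounds, and the variable names are all invented for illustration and are not taken from your notebook; only the column names echo the ones mentioned above. It shows fixing the stray white space where the column names are defined, that the `count` row of `describe()` is the number of non-null values rather than the number of rows, and that the boolean mask and `between` select the same rows.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; not the real car sales data set.
df = pd.DataFrame(
    {
        "price": [500, 1200, 99999999, 3000, 0],
        "registration_year ": [2004, 1999, 2010, 2016, 1000],  # trailing space on purpose
        "num_photos": [0, 0, np.nan, 0, 0],
    }
)

# Fix stray white space in column names at the point where they are set,
# rather than patching it in a later cell.
df.columns = df.columns.str.strip()

# "count" in describe() is the number of non-null values per column,
# which can be smaller than the number of rows.
n_rows = len(df)
non_null_counts = df.describe().loc["count"]

# Two equivalent ways to keep rows within a range (hypothetical bounds);
# between() includes both end points by default.
low, high = 1, 350000
with_mask = df[(df["price"] >= low) & (df["price"] <= high)]
with_between = df[df["price"].between(low, high)]
```

Comparing `n_rows` with `non_null_counts` is a quick way to see which columns have missing values, and `with_mask.equals(with_between)` confirms that the two filters agree.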
One clear pattern I see from reading your notebook is that you’re very thoughtful and analytical when explaining each finding, which makes the notebook quite an enjoyable read.
Thank you for sharing your project and keep up the good work. Cheers.
Thank you. I’ve made the changes.