This is my very first complete data cleaning project.
After browsing a few projects from the community, I’ve noticed there is so much I can improve on, especially the remarks and comments. There also seems to be a ‘standard’ format, like having the data dictionary at the beginning and a conclusion at the end?
Also, it took me a lot of time to get the last analysis done:
How much cheaper are cars with damage than their non-damaged counterparts?
It seems I’ve complicated the question quite a lot by comparing apples to apples, but I also learned a lot in the process.
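For readers who have not opened the notebook, here is a minimal sketch of what an apples-to-apples comparison could look like: compare damaged and non-damaged cars only within the same brand and model. The toy data and the column names (`brand`, `model`, `unrepaired_damage`, `price`) are assumptions for illustration, not necessarily what the notebook uses.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "bmw", "bmw", "audi", "audi", "audi"],
    "model": ["3er", "3er", "3er", "3er", "a4", "a4", "a3"],
    "unrepaired_damage": ["yes", "no", "no", "yes", "yes", "no", "no"],
    "price": [4000, 8000, 10000, 5000, 3000, 6000, 7000],
})

# Mean price per (brand, model), with one column per damage status.
mean_prices = autos.pivot_table(
    index=["brand", "model"],
    columns="unrepaired_damage",
    values="price",
    aggfunc="mean",
)

# Keep only brand/model pairs that have both damaged and non-damaged listings,
# so every comparison is like for like.
comparable = mean_prices.dropna(subset=["yes", "no"])

# Relative discount of damaged cars within each brand/model pair.
discount = 1 - comparable["yes"] / comparable["no"]
print(discount)
```

Pairs with listings in only one damage category (here the `a3`) drop out, which is the price of comparing like with like.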
Basics.ipynb (408.5 KB)
Hello @veratsien! Thanks for sharing. Your project is nice but a bit dry.
Here is some feedback:
- It’s not necessary to write out the whole data dictionary, but a brief introduction to the data is a good idea. Also state the aim of the project.
- Write conclusions. Many readers will probably read only the intro and the conclusions (why should I read it, and what do I have to do?).
- Explain more clearly why you dropped all the cars outside the range of 1000-60000.
- Explain your findings in the last part of the project more clearly.
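One way to back up a cutoff like 1000-60000 is to show the distribution first and then report how much data the filter removes. The sketch below assumes the range applies to a `price` column and uses made-up values; both are assumptions, not taken from the notebook.

```python
import pandas as pd

# Hypothetical prices, including obvious outliers at both ends.
autos = pd.DataFrame({"price": [0, 1, 500, 1000, 12000, 60000, 99999999]})

# Inspect the distribution before choosing cutoffs.
print(autos["price"].describe())

low, high = 1000, 60000

# Series.between is inclusive on both ends by default.
in_range = autos["price"].between(low, high)

# Share of rows the filter would remove, worth stating in the markdown.
share_dropped = 1 - in_range.mean()
print(f"Dropping {share_dropped:.1%} of rows outside {low}-{high}")

autos_filtered = autos[in_range]
```

Stating the dropped share next to the filter gives the reader a concrete reason to accept the range.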
Thank you for taking the time to go through my notebook. I just did, and honestly, it was a little painful.
And thank you for your feedback. It is exactly what this notebook could use: a better structure and better markdown.
This was my first project and I knew nothing about what you’ve mentioned in your feedback. All I focused on was answering the questions in the project guide. Two months later, I’m glad to come back and look at this project and know that I’ve come a long way and learned a lot thanks to this awesome community!
Your feedback is exactly what I needed as a true beginner. Hopefully, more fellow learners will benefit from it. Thank you!
Maybe your project could use a bit more markdown explanation and other things, but I have learned a few things from your code. I could have learned even more if there had been some inline code comments.
For example, this:
autos.loc[autos['postal_code'] < 1e4, 'postal_code']
And I’ve learned how to remove all non-digit characters in one go from this:
autos['odometer'] = autos['odometer'].str.replace(r'\D', '', regex=True)
brand_model_combo = autos.groupby(['brand','model'])
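Here is a small commented sketch of those three idioms on made-up data, roughly the kind of inline comments that would have helped. The sample values are hypothetical, and the reading of the `postal_code` filter (finding codes with fewer than five digits) is a guess, not something the notebook confirms.

```python
import pandas as pd

# Hypothetical sample rows; values are made up for illustration.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "audi"],
    "model": ["3er", "3er", "a4"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
    "postal_code": [1067, 80331, 9111],
})

# .loc with a boolean mask and a column label: select postal codes below 10000,
# i.e. values with fewer than five digits (assumed purpose).
short_codes = autos.loc[autos["postal_code"] < 1e4, "postal_code"]

# \D matches any non-digit character, so this strips "," and "km" in one pass.
# regex=True is passed explicitly because pandas 2.0+ treats the pattern
# as a literal string by default.
autos["odometer"] = autos["odometer"].str.replace(r"\D", "", regex=True).astype(int)

# Group by two keys at once; each group is one brand/model combination.
brand_model_combo = autos.groupby(["brand", "model"])
mean_km = brand_model_combo["odometer"].mean()
print(mean_km)
```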
For a beginner, you have done a wonderful job with these, I guess. So to me your code is anything but dry; it was a learning resource. And I’m sure that in 2 months’ time you have learned quite a lot and would do a completely different analysis on this same dataset. Thank you for all the lessons.
I can’t tell you how much I appreciate your reply! It’s the best encouragement I’ve had since learning DS.
I’m so glad I can be of help in your journey of learning. I do plan on updating this notebook when I’m ready to tackle my portfolio, and I will definitely add more code comments.
For now, all I can say is, I opened the Pandas documentation page for this project and I still have it open 2 months later.
Btw, if you still need to learn some regex, here’s a website I found that’s really helpful in quickly getting the hang of it or just a refresher: https://regexone.com/
I can’t imagine what you can achieve if you can go through this messy notebook and learn something.
I’m glad you find it to be an encouraging comment. I hope you will soon be able to create a great portfolio. Looking forward to seeing more projects from you. And thanks for the link. It looks quite handy.
Happy learning and happy coding.