Here is my 3rd project on Dataquest. I found it quite challenging, but also enjoyed it a lot. The most fascinating thing for me in this dataset was that so many cars are offered free of charge on German eBay.
Please take a look at my project and give me your feedback and suggestions. What can be improved in terms of code, reasoning, or overall style?
Congratulations on finishing one of the tougher guided projects, and also on becoming one of this week's community champions.
I have gone through your project, and a few things come to mind.
Initially, you printed the full autos DataFrame, which forces the reader to scroll through the whole dataset. As a reader, I might not be interested in doing that. You could have used df.head() and df.tail() instead.
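As a minimal sketch of that suggestion, assuming a small made-up DataFrame in place of the real autos data (the column names and values below are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the project's `autos` DataFrame.
autos = pd.DataFrame({
    "name": [f"car_{i}" for i in range(100)],
    "price": [i * 100 for i in range(100)],
})

first_rows = autos.head()   # first 5 rows by default
last_rows = autos.tail(3)   # last 3 rows
print(first_rows)
print(last_rows)
autos.info()                # column dtypes and non-null counts
```

A short preview like this usually tells the reader as much as the full printout does.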
I have seen this code before. This can be done differently too.
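def camel_to_snake(str):
    res = [str[0].lower()]  # lowercase the first character
    for c in str[1:]:
        if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
            res.extend(['_', c.lower()])  # underscore before each capital
        else:
            res.append(c)
    return ''.join(res)
# Converting the column names
new_columns = []
for c in autos.columns:
    column_snake = camel_to_snake(c)
    new_columns.append(column_snake)
autos.columns = new_columns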
Could you please tell me where you got this idea? I have seen others use the same code in other guided projects I reviewed. Now I am curious about the source of inspiration for this function.
Since you are only renaming columns here, you could simply type the required snake-case names directly. But I do agree that it is good to know how to do it with a function, especially when there are many names to change.
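Typing the names directly could look roughly like this. It is only a sketch: the camelCase column names below are assumptions chosen for illustration, not the dataset's full list.

```python
import pandas as pd

# Illustrative columns only; the original names are assumed, not the full dataset.
autos = pd.DataFrame({
    "dateCrawled": ["2016-03-26"],
    "yearOfRegistration": [2004],
    "price": [5000],
})

# Map each old name to the snake_case name typed by hand.
autos = autos.rename(columns={
    "dateCrawled": "date_crawled",
    "yearOfRegistration": "registration_year",
})
print(list(autos.columns))
```

Columns that are not in the mapping, such as price here, keep their names.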
About the price outliers, I agree that the highest values are too high and are outliers. But do you think there really are free cars? Maybe on eBay the bidding starts at $1?
Everything else looks great. The conclusion could perhaps have been written as bullet points for easier reading.
There are many things I quite enjoyed and learned from.
I quite like how you arrived at 1927 as the starting year for the registration_year column. That was cool logic.
Also, it is nice that you took the time to do the extra tasks, such as mapping the German values to English, finding the prices of damaged cars, and so on.
Thank you for your positive and thorough feedback! And congratulations to you, too, for being among the champions this week!
About printing the whole dataframe, I completely agree with you; it was absolutely redundant. It was just one of the tasks on the first screen of that project, and I did it without understanding why. However, now that I am doing my 4th guided project, the one about visualizations, I have found that some of the tasks there are misleading (have you perhaps noticed them as well?). So this time I am not following the project instructions so blindly, and am instead being selective about what to do and what to skip.
As for the function converting camel case to snake case, I found it on Geeks for Geeks. While searching, I also saw many other similar functions for that purpose, but this one seemed the most elegant to me. In this project it would be perfectly fine to use dataframe.rename() to rename all the columns, but I was curious about a more universal way to convert these cases, which, as you said, would be useful when there are many columns to rename.
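One of those similar approaches uses a regular expression instead of an explicit loop. This is a minimal sketch, not the Geeks for Geeks version, and the name camel_to_snake_re is made up for illustration:

```python
import re

def camel_to_snake_re(name):
    """Insert '_' before every uppercase letter except the first character,
    then lowercase the whole string."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

for column in ["dateCrawled", "YearOfRegistration", "price"]:
    print(column, "->", camel_to_snake_re(column))
```

Like the loop version, it splits on every capital letter, so a name ending in an acronym such as "powerPS" becomes "power_p_s". Since dataframe.rename() also accepts a function for its columns argument, a converter like this can be passed to it directly.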
Good idea about emphasizing the conclusions with bullet points, thank you!
About the minimum price of $0, I was very curious as well. I even googled German eBay to see whether it is possible, in general, to have free cars on eBay. And to my big surprise I found a couple of forums where people were discussing (without any reference to data analysis at all) exactly this: that Germans offer free cars on their eBay! After that I was more convinced, even though still surprised, especially by the number of free cars, and so I used 0 as the lower limit.
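A rough sketch of that kind of check and filter, using made-up prices and a hypothetical upper limit (neither comes from the project):

```python
import pandas as pd

# Made-up prices standing in for the `price` column of `autos`.
autos = pd.DataFrame({"price": [0, 0, 1, 500, 12000, 350000, 99999999]})

# How many listings sit at each of the lowest price points?
low_counts = autos["price"].value_counts().sort_index().head(3)
print(low_counts)

UPPER_LIMIT = 350000  # hypothetical cap; choose it after inspecting the data
# between() is inclusive on both ends, so the 0-priced listings are kept.
kept = autos[autos["price"].between(0, UPPER_LIMIT)]
print(len(kept), "of", len(autos), "rows kept")
```

Looking at the counts for the lowest prices first makes it easier to decide whether 0 or 1 is the more defensible lower limit.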
Many thanks again for your attention to my project and for your thoughtful comments. Happy coding and keep in touch!
I didn’t know about the free cars on eBay. I approached it with a preconceived idea and didn’t even bother to think about such a possibility. Especially since there were a lot of zero values, I should have looked a bit deeper into it.
Looks like you can share a lot of wisdom from this project. So would you care to have a look at mine and see if you can find any other such wrong assumptions?
Here is the link.
As you mentioned, it is better to go in our own direction, keeping the instructions as guidance. Following them exactly is often limiting. Anyway, I hope to see you soon on the next guided project. I am also working on it and hope to finish it soon. Thanks a lot for your detailed review. I will also check out Geeks for Geeks for that snake case solution.
It’s a pleasure for me to review both this project of yours and the next ones. It’s always a win-win situation to exchange ideas, insights and ways of thinking with other people, especially those who are enthusiastic, curious and genuinely involved in what they are doing. I will look at your project and write my feedback there.
About Geeks for Geeks: yes, I highly recommend it. It has a lot of useful material for data scientists, as does StackOverflow. I use these 2 resources a lot; they are really helpful.
As for the project on data visualizations, I’m a little bit stuck on it. At the beginning it seemed so easy, only 6 screens! But it turned out to be more complicated and challenging, which is only a good thing for our growth.