Hi DQ community,
I spent a few days improving my ML skills by entering Kaggle’s House Price Predictions competition (it may be familiar to you from DQ’s “Machine Learning Introduction”, course 4). Kaggle states that:
“This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set before trying a featured competition.”
I started with a score of 0.66052 (Kaggle uses RMSLE to compute the score, so the lower, the better) and then brought it down to 0.35915. This looks like quite an improvement but, believe me, it is not that much: I’m currently in 4640th position out of 5287 (about 87.8% of the way down the leaderboard).
I thought this would be a good opportunity to create a small team, reach a better position, and improve the required skills.
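For anyone who has not met RMSLE yet, here is a minimal sketch of the usual definition: the root mean squared error computed on log(1 + value) instead of on the raw prices. It assumes the common `log1p` form; Kaggle's exact formula for this competition may differ slightly, and the prices below are made-up numbers, not competition data.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared error between log(1 + y) of the truth and of the predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and have the same length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up sale prices and predictions, only to show the call.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
print(round(rmsle(actual, predicted), 5))
```

Because the error is measured on logarithms, being off by 10% on a cheap house costs about as much as being off by 10% on an expensive one.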
I analyzed the leaderboard scores and found the following:
- 5287 competitors submitted a total of 15145 times
- To be in the top 10%, your score should be lower than 0.143220
- To be in the top 100, your score should be lower than 0.105860
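Cutoffs like the ones above can be read off a sorted list of scores. The sketch below is one way to do it, not necessarily how the numbers in this post were produced; `cutoff_score` and `position_percent` are hypothetical helper names and the `toy_scores` list is invented for illustration.

```python
import math


def cutoff_score(scores, top_fraction):
    """Worst score that still falls inside the given top fraction (lower is better)."""
    ranked = sorted(scores)
    n_inside = max(1, math.floor(len(ranked) * top_fraction))
    return ranked[n_inside - 1]


def position_percent(position, total):
    """Leaderboard position expressed as a percentage of all competitors."""
    return 100.0 * position / total


# Invented leaderboard with ten entries, only to show the calls.
toy_scores = [0.11, 0.12, 0.13, 0.14, 0.15, 0.18, 0.22, 0.36, 0.41, 0.66]
print(cutoff_score(toy_scores, 0.10))
print(cutoff_score(toy_scores, 0.50))
print(round(position_percent(4640, 5287), 1))
```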
This is the score distribution:
The same image as above zoomed in:
There is a huge difference between the top 10% and the top 100 positions! Those in the top 100 positions are quite few compared to the rest. This is the distribution of the scores below 0.8:
I would like to keep improving the model until it reaches the top 10%. If anyone is interested in participating, please let me know!
I haven’t gotten to the ML stuff yet but would love to tag along just to listen to the discussion and process… If that’s ok… Have you received much interest?
Thanks for the analysis
Hey @fedepereira, impressive job analysing the results of an analytics competition! You’re a Data Analyst at heart, huh!?
What are the eligibility criteria to join you on this?
I have completed the Linear Regression part of a course on Udemy and have done the KNN part on DQ. These are the only two ML techniques I know so far. Also, my knowledge may be limited to what I have learned and was able to successfully grasp!
I am currently on the ML step in the Data Science with Python track.
You are the first one to answer.
This competition is not all ML stuff. There is also a lot to do in data analysis, such as feature labeling and engineering. I’ll wait and see how many people are interested in participating and then organize a team.
I don’t have any. It’s only a matter of interest and will. I think your knowledge is enough to enter a competition like this. There is data analysis (for numerical and categorical features) and machine learning model testing to do.
Thank you @nityesh!
I think data analysis flows through my blood, you’re right.
Love the charts! Ironically, I have been working somewhat obsessively on this competition for weeks, putting in my 10 submissions a day. After over 300 submissions my best score is in the top 2%. I would be interested in joining a team to figure out how to progress from here. All out of ideas unfortunately. I will try and post a notebook here in the next couple of days. Want to clean it up first
You are already in the top 2%. Whatever ideas we come up with, you will probably say: been there, done that!
WOW! Thank you for joining us. We’ll make a great team!
We can start this week and go further. I’ll wait for your notebook if you have the time.
Regarding my work, I divided it into 3 notebooks: The 1st is dedicated to numerical features, 2nd to categorical features (ordinal + nominal) and the 3rd is about the machine learning process/validation/submission.
One advantage of doing it this way is that each notebook is independent of the others, because the 1st and 2nd create new CSV files that are the input to the 3rd.
For example, I generate many files for the categorical features and test all the options in the 3rd notebook to see which performs best.
I read that some competitors mentioned that once you have, let’s say, 5 models with relatively good scores, you can “blend” them to get a better score. This is done with, for example, a weighted arithmetic mean. Check this post from Kaggle. I think you are in a similar position.
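To make the idea concrete, here is a rough sketch of how the 3rd notebook could join one numerical file with several candidate categorical files. This is not the actual notebook code: the `Id` key, the column names, the variant names, and the helpers `read_by_id` and `join_features` are all assumptions for illustration, and the CSV text is inline so the example runs without any files.

```python
import csv
import io


def read_by_id(csv_text, key="Id"):
    """Parse CSV text into a dict that maps the key column to its row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def join_features(numerical_csv, categorical_csv, key="Id"):
    """Inner-join the two feature sets on the key column."""
    numerical = read_by_id(numerical_csv, key)
    categorical = read_by_id(categorical_csv, key)
    joined = []
    for row_id, numerical_row in numerical.items():
        if row_id in categorical:
            merged = dict(numerical_row)
            merged.update(categorical[row_id])
            joined.append(merged)
    return joined


# Stand-ins for the files written by the 1st and 2nd notebooks (invented data).
numerical_csv = "Id,LotArea\n1,8450\n2,9600\n"
categorical_variants = {
    "ordinal_encoded": "Id,Quality\n1,3\n2,2\n",
    "one_hot": "Id,Quality_Good,Quality_Fair\n1,1,0\n2,0,1\n",
}

for name, categorical_csv in categorical_variants.items():
    rows = join_features(numerical_csv, categorical_csv)
    print(name, list(rows[0].keys()))
```

Each variant would then go through the same model and validation step, so the only thing that changes between runs is the categorical encoding.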
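A weighted arithmetic mean of several models' predictions can be sketched like this. The function name `blend`, the weights, and the prediction values are made up for illustration; in practice the weights would be chosen from validation scores, and this is only one of several ways people blend.

```python
def blend(predictions, weights):
    """Weighted arithmetic mean of several prediction lists, row by row."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(predictions[0])
    if any(len(model_preds) != n_rows for model_preds in predictions):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(w * model_preds[i] for model_preds, w in zip(predictions, weights)) / total_weight
        for i in range(n_rows)
    ]


# Made-up predictions from three hypothetical models for two houses.
model_a = [200000.0, 150000.0]
model_b = [210000.0, 140000.0]
model_c = [190000.0, 160000.0]
print(blend([model_a, model_b, model_c], weights=[0.5, 0.3, 0.2]))
```

Dividing by the total weight means the weights do not have to sum to 1.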
Ok I think I have posted this correctly to Github and it is public. https://github.com/cablue01/Kaggle-House-Prices-Competition/blob/master/Kaggle_House_Prices_Advanced_Regression_Techniques.ipynb I am not really comfortable with Github yet. Is there another place on this forum I should post this? Any constructive feedback is appreciated. I tried to add some interesting and helpful comments, but mostly just documenting the process.
Hey @cablue01, this looks great!
I suggest that you create a new topic in the Share category, introduce us to your work and upload the notebook file over there. The platform will automatically append a one-click viewable link at the end of the post.
I see that the blending technique is already familiar to you.
I will let you know if we reach your level and have some ideas on how to improve the score.
Thanks a lot! I created a topic in the share category.
Thanks! I’ll keep plugging away and will share anything I can come up with.
Thank you for sharing your work!!
Hey @Rucha and @spader108,
do you have a Kaggle account? I can invite you to a team in Kaggle if you wish.
My work is in this gitlab repo.
Please take a look at the readme file.
The 1st part generates a CSV file with the numerical features while the 2nd part does the same for categorical features. The 3rd part is for modeling and testing. I think not only the 3rd part but also 1st and 2nd need improvements. A first step would be to analyze and propose ideas to have better performance. Do you agree?
Please let me know which part(s) of the project you feel comfortable working on and let’s organize how to proceed. Let’s discuss some ideas and test some models!
There are many sources of inspiration in Kaggle’s public notebooks, and also in the work that cablue01 shared with us.
I have data analysis skills and basic knowledge of ML. I am passionate about and genuinely interested in improving my skills. Let me know if this works; I’d love to be part of this journey.
BTW, I also have a Kaggle account.
Of course! The idea is to combine our efforts and improve our skills.
Awesome!! Let me know how we should proceed!
Sounds good @fedepereira … I’ll look it all over
I have a Kaggle account, but whatever is easier.
Do you want to continue using this forum thread to communicate, or move to Discord or something else?