Predicting House Sale Prices - the right way

Hi all!
I have uploaded a new edition of my Ames house price project.
Best Regards, Vadim Maklakov
Full version with dataset for download: Predicting_House_Sale_Prices_The_right_way.ipynb (1.2 MB)


2 Likes

Hi Vadim!
This is absolutely amazing! I went through the work and found it fantastic! How long did it take you to come up with this?
Super fluent and compact! Many congratulations on doing it.

One thing I noticed: do you think more information should be added in the markdown cells and docstrings, so that someone who isn't too good at reading code can pick up the hard work more easily?
My point might be unnecessary, but I think a lot of readers would really appreciate the 'flow of work', the code, and the art if they were able to follow it fluently.

And again, I congratulate you on this amazing piece of work. And many thanks for notifying me. It is surely a great resource for learning and for applying what I am picking up from the books you recommended.

My best regards,
Ali

2 Likes

Ali, hi!
Thank you very much for your high praise of my humble work. :grinning:
In terms of effort, it took around 30 full hours of pure working time. About 1/3 of that went to writing and debugging functions for preliminary data analysis and cross-validation. About 1/3 went to understanding how pipelines work, and about 1/8 went to trying to put the Y axis on a logarithmic scale, which did not work out, so the Y axis stayed on a standard linear scale.
The rest was the coding itself.

Regarding the comments, it seems to me that what is needed is a brief description in the docstring of how each function works and what its parameters mean. After that, it becomes clear to the reader what comes from where. Commenting on each line of code seems redundant to me and takes a lot of time.
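A minimal sketch of that docstring style, assuming a hypothetical helper named `rmse` (it is an illustration, not a function taken from the notebook). The docstring states what the function does, how it does it, and what each parameter means, while the body carries no line-by-line comments:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the root mean squared error between two sequences.

    How it works: squares each (actual - predicted) difference,
    averages the squares, and takes the square root of that average.

    Parameters
    ----------
    y_true : observed values, e.g. actual sale prices.
    y_pred : predicted values, in the same order as y_true.

    Raises
    ------
    ValueError : if the inputs are empty or differ in length.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Calling `help(rmse)` in a console then shows the whole description without opening the source.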

1 Like

You are most welcome, Vadim!
The accuracy you scored is the highest of anyone on this platform, and that fact alone shows you are doing fantastically well.

I am not sure how code should be documented and described. I have come across two opinions. One is that it should be done extensively, since detailed comments and descriptions help the reader. The other is yours. So I think it is subjective.

Personally, I hate writing descriptions and comments, but I have practiced it (to comply with the art of 'storytelling')…

1 Like

Short comments are needed primarily for yourself, so that six months or a year later they remind you what you wrote here. All software development involves writing documentation to one degree or another.

2 Likes

I see that!
I don't know when I developed this tendency to try to write super readable code or to add lots of descriptions to it.

I am definitely going to work on this, because it eats up time and energy.

In fact, when solving any task, you need to think through the sequence of actions and how to name the variables correctly. After that, the work of documenting and commenting becomes much simpler and smaller.

Hello Vadim. I am new here. I would like to reach a stage where I can do this myself. What route do you recommend? Which courses (and in what sequence) should I take?

Hi, sengupta.j!
It all depends on your level of mathematical education. The program here is designed for the typical New York high school graduate and gives only very primitive skills in statistics and ML/DL, which will make things very difficult for you later. I judge from my experience in Russia, where most task descriptions call not just for simple linear regression but also for NLP and DL. Personally, I would not advise you to buy courses here. Here you can see only an approximate study plan.

Before you start, think carefully, because the beginning will be very, very, very difficult: statistics say that only 7% of students make it to graduation. You will need to learn a lot on your own. If you have an iron will, the time, and the opportunity to study, here is an action plan you can follow yourself:

  1. Learn Python, because it is in greater demand on the market than R. To do this, work through the whole of Python yourself with the book Learning Python, 5th Edition (2013) by Mark Lutz. You will learn data types, functions, classes, and exception objects; this is the foundation you must have. There is no need to write notes: solve all the examples in the Python console or in the interactive console of the Spyder IDE.

  2. Buy or download these books (almost all of them can be found on the Internet) and work from simple to complex. In general, I recommend https://machinelearningmastery.com/, a great site for DA/DS beginners. Once you get through classes and objects, start using Python to learn statistics and linear algebra from Jason Brownlee's books. Free book libraries: https://z-lib.org/ and http://libgen.li/ (Library Genesis). Jason Brownlee provides links to the necessary literature in his articles, and believe me, it's worth it. Here, by contrast, you will be led like a blind puppy through the modules on statistical assumptions or conditional probability, and through the heated, half-century-old disputes about how far-fetched these p-values are.

  3. You must gain a basic working knowledge of the following packages: numpy, pandas, matplotlib, seaborn, and scipy. They are used for manipulating and preparing data: cleaning, evaluating, analyzing, combining, etc.

  4. Be sure to study SQL on sites such as https://www.postgresqltutorial.com/. First, master simple queries. Install PostgreSQL on your local PC; many of its commands are similar to Oracle's. Recommended books: 1) The Applied SQL Data Analytics Workshop, Second Edition (2020) by Upom Malik, Matt Goldwasser, and Benjamin Johnston. 2) Practical SQL (2018) by Anthony DeBarros. For advanced practice: 3) Pro Oracle SQL Development: Best Practices for Writing Advanced Queries (2019) by Jon Heller. You must be able to write fast, complex, optimized queries for extracting data from a database. For DBA work, transactions, inserts, and data modification, a general idea is enough. The main focus is on the SELECT statement and everything related to it, since the same query can differ in execution speed by a factor of 20.

  5. After you have learned everything in points 1-4, you can move on to machine learning. Here, in addition to the books by Jason Brownlee, I recommend Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Second Edition (2019) by Aurélien Géron…
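To illustrate the remark in point 4 that the same SELECT can run at very different speeds, here is a small sketch. It swaps in Python's built-in `sqlite3` module instead of PostgreSQL so that it runs without installing a server. The `sales` table, its columns, and the index name are made up for the example. It asks the database for its query plan before and after an index is added, which is one common reason an identical query becomes faster:

```python
import sqlite3

# In-memory database with a hypothetical "sales" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, city TEXT, price REAL)")
rows = [(f"city_{i % 100}", 100_000.0 + i) for i in range(10_000)]
conn.executemany("INSERT INTO sales (city, price) VALUES (?, ?)", rows)

query = "SELECT AVG(price) FROM sales WHERE city = ?"


def query_plan(connection, sql, params):
    """Return SQLite's plan description for a statement as one string."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in plan_rows)


# Without an index the engine has to scan every row of the table.
plan_without_index = query_plan(conn, query, ("city_7",))

# With an index on the filtered column it can jump to the matching rows.
conn.execute("CREATE INDEX idx_sales_city ON sales (city)")
plan_with_index = query_plan(conn, query, ("city_7",))

average_price = conn.execute(query, ("city_7",)).fetchone()[0]

print("before:", plan_without_index)
print("after: ", plan_with_index)
print("average price:", average_price)
```

The exact plan wording depends on the SQLite version, and the actual speed-up depends on the data. PostgreSQL offers the same kind of inspection through its `EXPLAIN` statement.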

That is basically everything. Be prepared: it will be very difficult. I don't recommend any courses anywhere; you would be throwing your money to the wind. All of the above is my personal opinion. In terms of time, 1-2 years is the bare minimum, and a very tight one, after which you will be only a junior DA/DS who will not find a job right away.

For practice datasets, look on Kaggle and in other public repositories.

3 Likes