Guided Project: Popular Data Science Questions

Hi everyone,

Here is my new project that I would like to share with you. It took me so much time to complete it (one month!) that I still cannot believe I’ve finally finished it :grinning:

Anyway, I really enjoyed exploring this database, analysing and visualizing the results, and discovering a new Python library for visualizing missing values, missingno. I learned a lot while googling information about deep learning and understood why it’s so popular nowadays. I was so enthusiastic about it that I’ve already recommended this field to a friend of mine as a suggestion for her son’s future university choice, even though that is still many years away! :sweat_smile:

As usual, any feedback from you would be very valuable to me. What can be improved or modified in my project? Are there any issues or discrepancies? Are there perhaps shorter ways to write certain pieces of code? I also have some doubts about cleaning up the dictionary in code cell [21]. Since it was not particularly long, I did it practically by hand to have more control over the data and to preserve as much of the relevant data as possible, following the algorithm described in the subsequent markdown cell. I liked the result of this work, but I still have doubts about whether such perfectionism was reasonable in this situation.

Many thanks in advance!

Popular_Data_Science_Questions.ipynb (1.4 MB)


10 Likes

Hi @Elena_Kosourova

Thanks for sharing your project on Exploring Popular Data Science Questions. To be honest, I have always gained a lot from your projects. With this one, I enjoyed following all your steps; the code and the explanations you gave are well detailed, and I hope anyone who has the opportunity to go through your work will find the same. Though I haven’t reached this far yet, it’s inspirational to see what awaits me, and that’s after spending about an hour reading through this project. I don’t have much to add, just congratulations on the good work. Keep it up, buddy!

2 Likes

Hi Brayan,

Thanks a lot for your kind and encouraging words! I’m happy that my work was appreciated and that my project was even useful for you, and, hopefully, it will also be useful for other learners!

1 Like

Dear @Elena_Kosourova,

Very awesome job. I was working on this project just yesterday and was starting to get frustrated by the end of the day. Waking up this morning and finding the link to your project in this week’s DQ Download issue was a pleasant surprise.

I’m only confused about one thing: in the last line graph of the project, how did you decide to set the horizontal line at height 34? It seems like you hardcoded that number and I’m not sure where it comes from. More generally, what exactly do you mean when you say

The percentage of deep learning questions was also constantly growing, up until middle of 2018, when it reached a plateau of 34%, which is still continuing, with a slight trend of growing.

What’s the exact meaning of the word plateau in this context?

Thanks a lot for your great work.

1 Like

Hi @gbpignatti5,

Thanks a lot, I’m very glad to know that my project was helpful! :star_struck:

About the horizontal line in the last graph: I actually just tried a few suitable integers (33, 34, and 35) to fit the last part of the graph, and 34 seemed to be the most appropriate. The last part of the graph (approximately from the middle of 2018 until the end of 2020) looks different from the rest: before that the graph was constantly increasing, and after the middle of 2018 it levels off and runs almost parallel to the x-axis. Of course, there are still some fluctuations even in this “tail” of the plot, but the overall trend is roughly horizontal. This is what I mean by the plot having reached a plateau and become mostly flat.

If we want to check the value of 34% anyway, we can take only the data for that part of the graph, find the average percentage of DL-related questions, and then round the resulting value to an integer (because we don’t want numbers like 33.8176%, or even 33.8%, in our report :grinning:). This value will be exactly 34.
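
That check could be sketched roughly as follows. The monthly percentages below are made-up placeholder values, not the project's data, and the names `plateau_pct` and `plateau_level` are hypothetical:

```python
from statistics import mean

# Hypothetical monthly percentages of deep learning questions for the
# flat "tail" of the plot (placeholder values, NOT the real data).
plateau_pct = [33.1, 34.6, 33.9, 34.2, 33.5, 34.4]

# Average over the plateau only, then round to the nearest integer
# so the report shows a clean number.
plateau_level = round(mean(plateau_pct))
print(plateau_level)
```

The rounded value could then be passed to something like matplotlib's `axhline` instead of typing the number in by hand.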

1 Like

Ok, I see. It makes a lot of sense. Thanks for helping me understand.

1 Like

Honestly, for someone still doing the Python Fundamentals course, guided projects like this look overwhelming at first glance. I can’t help but think: am I really going to get to the point where I can not only understand all that code and those libraries, but use them and write similar code myself? :sweat_smile:

The constant flow between regular language and Python code is so neat. Gotta love Jupyter Notebook.

1 Like

Hi @zico333,

Don’t worry, it’s a normal thought at the beginning of any path. I was also almost scared to look at other people’s guided projects at the beginning, which was half a year ago :scream_cat: :joy: When I look back at those times, I see that I’ve learned a lot since then! And that happens to everyone, I think :blush:

With support from the Dataquest community, here I am posting my first project on Popular Data Science Questions. Thanks @Elena_Kosourova for directing me to the right forum.
Guided Project - Popular Data Science Questions.ipynb (381.3 KB)


1 Like

Hi @ashwin86rajan,

Thanks for sharing your work with the Community! Next time, could you please create a new topic here in the “Share” branch? Then the topic will be dedicated only to your project, which is very convenient :relaxed:

Anyway, here is my feedback:

  • It’s better to avoid copying the instructions directly from DQ into your project. Instead, consider rewriting and rephrasing them in your own words (if they are necessary at all). This will help make your project more personalized and hence more attractive to potential employers in the future, if you decide to include it in your portfolio.
  • Project structure: please include a project title, dataset link, project goal, and conclusion, and remove the empty code cell at the end of the project.
  • It’s better to re-run the finished project so that all the code cells are in order and numbered starting from 1.
  • Code comments. Don’t make them too long, too wordy, obvious (like # Read in the file into a dataframe), or multi-line. In general, try to keep them as concise as possible. Once again, don’t copy instructions there.
  • A good idea is to combine adjacent code cells that have no outputs or markdown explanations between them into one (e.g. [320]+[321]).
  • Visualizations. You should always add a plot title and give it (as well as the axis labels) a sufficient font size for better readability. Also, you might consider despining your plots and removing the legend when it’s redundant.
  • When mentioning column names or variables in markdown, it’s better to surround them with backticks for emphasis.
  • The code cell [325]: the output here is a bit unclear, at least to me.
  • The code cell [318]: here you can use sorting.
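
For the sorting point, here is a minimal sketch, assuming the cell works with a dictionary of tag counts. The `tag_counts` name and its values are hypothetical and not taken from the notebook:

```python
# Hypothetical tag counts (illustrative values only, not the real data).
tag_counts = {"python": 120, "deep-learning": 340, "nlp": 95, "keras": 210}

# Sort the (tag, count) pairs by count, largest first.
top_tags = sorted(tag_counts.items(), key=lambda item: item[1], reverse=True)

for tag, count in top_tags:
    print(f"{tag}: {count}")
```

If the counts live in a pandas Series or DataFrame instead, `sort_values(ascending=False)` would do the same job.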

Hope my suggestions were useful. Good luck with your future projects!

1 Like

In code cell [325], the output is the empty list `all_tags`, which was supposed to be removed.