Solution Notebook - Popular Data Science Questions

Guided Project Link: https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/11/next-steps

Solution Notebook: https://github.com/dataquestio/solutions/blob/master/Mission469Solutions.ipynb

not as sophisticated as the solution but…

Nice project to practice. Moving on to Probability and Statistics :slight_smile:

Hi there,
I was about to share my notebook, but after looking at the solution I don't think it makes much sense to, since the solution is quite complete.

However, I realized that my analysis of the Views vs. Uses comparison ends differently.

The question here is: which of the two should have higher priority when deciding whether a tag makes a question more or less popular?

We would like to plot both columns against each other. The first step is to normalize the values in each column, so that we can focus on percentages rather than raw counts.

Since there are too many tags to show in a single plot, the second step is to apply the following 2 filters:

Only the upper 85% of the list "Used"
Only the upper 85% of the list "Views"

With the values in each column normalized so that they are easy to compare, a scatter plot shows the relevance of each tag with respect to each axis:

[Image: Views_vs_Uses]
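For anyone who wants to try this, the two steps and the scatter plot could be sketched roughly like this. It is only a sketch under assumptions, not the exact code from my notebook: the tag names and counts are made up, the helper names `normalize` and `top_in_both` are hypothetical, and "upper 85%" is read here as "at or above the 0.85 quantile" (the tiny example uses 0.5 so that more than one tag survives).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts for illustration only (not the real numbers from the project).
tags = pd.DataFrame(
    {"Used": [50, 40, 30, 5, 3, 2], "Views": [900, 500, 700, 40, 100, 10]},
    index=["tag-a", "tag-b", "tag-c", "tag-d", "tag-e", "tag-f"],
)

def normalize(df):
    """Step 1: express each column as a percentage of its column total."""
    return df / df.sum() * 100

def top_in_both(df, q=0.85):
    """Step 2: keep tags at or above the q-quantile in BOTH columns.

    Assumption: this is one possible reading of "upper 85%".
    """
    keep = (df["Used"] >= df["Used"].quantile(q)) & (
        df["Views"] >= df["Views"].quantile(q)
    )
    return df[keep]

pct = normalize(tags)
selected = top_in_both(pct, q=0.5)  # q=0.5 only because the toy table is tiny

# Scatter plot: one point per tag, labelled with the tag name.
fig, ax = plt.subplots()
ax.scatter(selected["Used"], selected["Views"])
for tag, row in selected.iterrows():
    ax.annotate(tag, (row["Used"], row["Views"]))
ax.set_xlabel("Used (% of total)")
ax.set_ylabel("Views (% of total)")
fig.savefig("views_vs_uses.png")
```

Because both columns are on the same percent scale after step 1, the distance of a point from each axis can be compared directly.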

In this representation, the farther a tag is from both axes, the more important it is. From this plot, we can see that machine-learning attracts the most interest in the questions if we ignore “python”. If we exclude “python” and the libraries “keras”, “scikit-learn” and “tensorflow”, then the most important tags to follow up on are:

machine-learning
deep-learning
neural-network
classification

This mission took me more time than expected, but it was quite an interesting one :wink:

My notebook is way simpler than the solution notebook. It would be nice to know how to put that line graph over a bar graph! Anyway, this is what I came up with.
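On the line-over-bar question, one common way to do it in matplotlib is to draw the bars on one axes and the line on a twin axes that shares the same x-axis. The sketch below assumes that approach; it is not necessarily how the solution notebook does it, and the tag names and numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Made-up data for illustration only.
labels = ["tag-a", "tag-b", "tag-c"]
used = [50, 40, 30]
views = [900, 500, 700]

fig, ax_bar = plt.subplots()

# Bars on the left y-axis.
ax_bar.bar(labels, used, color="tab:blue")
ax_bar.set_ylabel("Used")

# twinx() creates a second axes that shares the x-axis but has its own
# y-axis on the right, so the line can use a different scale.
ax_line = ax_bar.twinx()
ax_line.plot(labels, views, color="tab:orange", marker="o")
ax_line.set_ylabel("Views")

fig.savefig("line_over_bar.png")
```

The twin axes matters when the two series have very different magnitudes; with a single shared y-axis the smaller series would be flattened near zero.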
