Project - Analysis of Data Science Content on Stack Exchange

This solution is built on a data acquisition strategy with 3 axes: 1. Web scraping, 2. the Stack Exchange API, 3. the Stack Exchange Data Explorer (SEDE) database.

All three axes are used in tandem. The notebook repeatedly shows that, instead of writing lengthy code, we can simply query the API or the database.
A detailed analysis of the data science content, and of the relations between topics, is done using Python `set` operations, which simplify the analysis tremendously.
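
To give an idea of what relating topics with `set` operations can look like, here is a minimal sketch. It is not the code from the notebook: the `questions` data, the tag names, and the helper `questions_with` are made up for illustration.

```python
# Hypothetical sample data: question id -> set of tags on that question.
questions = {
    1: {"machine-learning", "python", "scikit-learn"},
    2: {"deep-learning", "python", "keras"},
    3: {"machine-learning", "deep-learning", "neural-network"},
    4: {"statistics", "r"},
}


def questions_with(tag):
    """Return the set of question ids that carry the given tag."""
    return {qid for qid, tags in questions.items() if tag in tags}


ml = questions_with("machine-learning")
dl = questions_with("deep-learning")

both = ml & dl       # intersection: questions tagged with both topics
either = ml | dl     # union: questions tagged with at least one of them
only_ml = ml - dl    # difference: machine-learning without deep-learning

# Jaccard similarity as one simple measure of how related two tags are.
jaccard = len(both) / len(either)
```

Intersection, union, and difference replace what would otherwise be nested loops with counters, which is the kind of simplification described above.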

All kinds of feedback are welcome and will be appreciated, especially on the 'algorithm' `indiscip` for classifying questions into the various disciplines of data science.

Basics.ipynb (812.6 KB)



hey @saquibmehmood1

Thanks for sharing this project with the DQ community!

The project has a great introduction and a clean start. I would just like to make a few suggestions about the plots.

  • try to maintain consistency within a project. The bar plot with only the edges visible looks very different from the bar plots in the end section

  • you scaled the x-axis, but the tag names on the y-axis are still not visible. Dropping most of the labels and keeping only every 5th or 10th one, or keeping only the top 20 tags and grouping the rest into one group, may give a more readable plot.

  • try to use ylim when the bars are too small for the y-axis; limiting the y-axis will give more readable bar plots. For example, which chart gives a quick and easy understanding of the data?
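
The last two suggestions could be sketched roughly like this with matplotlib. The tag counts, the `shares` values, and the helper `top_n_with_other` are made up for illustration and are not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def top_n_with_other(counts, n=20, other_label="other"):
    """Keep the n largest entries and sum the remainder into one group."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    rest_total = sum(value for _, value in ranked[n:])
    if rest_total:
        top.append((other_label, rest_total))
    return top


# Hypothetical tag counts with a long tail (40 tags in total).
tag_counts = {f"tag-{i}": 1000 // (i + 1) for i in range(40)}

grouped = top_n_with_other(tag_counts, n=20)
labels = [label for label, _ in grouped]
values = [value for _, value in grouped]

# Horizontal bars: 21 labels on the y-axis remain readable.
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(labels, values)
ax.invert_yaxis()  # largest tag at the top
ax.set_xlabel("Number of questions")
fig.tight_layout()

# Small bars: limit the y-axis to just above the tallest bar.
shares = {"A": 2.1, "B": 3.4, "C": 1.2}  # hypothetical small values
fig2, ax2 = plt.subplots()
ax2.bar(list(shares), list(shares.values()))
ax2.set_ylim(0, max(shares.values()) * 1.1)
ax2.set_ylabel("Share of questions (%)")
```

Keeping the lower limit at 0 avoids exaggerating differences between bars while still removing the empty space above them.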