Guided Project No. 05: Visualizing the Gender Gap in College Degrees

Good day, everyone. I hope anyone who happens to read this is doing well. I’m sharing my output for the fifth guided project in the ‘Data Scientist in Python’ path.

I know that there isn’t going to be much feedback for this project since it is just generating one figure. I skipped a more in-depth analysis as my goal was really just to practice improving visualizations. All comments and suggestions are still welcome though.

gp05_visualizing_gender_gap_dq.ipynb (863.1 KB)


Hi @philiplibre,

This isn’t feedback yet; I’m just pointing out a technical issue: the code cells in your project were not run, so all the outputs are empty for now. Please re-run the project and share it again.

By the way (and this is already a little bit of feedback :slightly_smiling_face:), please make the last cell, the one with the conclusion, a markdown cell.

Thank you!


Hi @Elena_Kosourova,

Yes, I noticed that. It happened because I worked on the project on my local machine and used the more recent syntax for removing the ticks:

ax.tick_params(right=False, left=False, bottom=False, top=False)

instead of using this:

ax.tick_params(right='off', left='off', bottom='off', top='off')

I’m working on adjusting the code right now so Dataquest’s server can produce the graphs.

Edit: I just adjusted a few lines of code so they’re consistent with the version used by the DQ server. This notebook should work now:

gp05_visualizing_gender_gap_dq.ipynb (863.1 KB)


Hi @philiplibre,

Now that all the technical issues are resolved, I’m back with my review :blush:

Your project looks very nice, well-organized and with good markdown explanations. The subheadings are descriptive, all the links to the original data are present. Also, I didn’t even encounter any typos, which rarely happens with the projects :slightly_smiling_face: Congratulations on having done a good job!

Below are some suggestions for your consideration:

  • When you place a link, it’s better to keep its text short. For example, for the second link in your introduction, it’s enough to use “raw data” as the link text rather than the whole sentence.
  • A good practice is to use one style of quote marks for strings in the code cells throughout the project: either only single or only double quote marks.
  • For this piece of code, repeated several times in your project:
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)

you can consider using a for-loop.
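
A minimal sketch of that for-loop, assuming `ax` is an ordinary matplotlib Axes. The figure setup here is only for illustration and is not taken from the notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative figure; in the project, ax would be one of the existing subplots.
fig, ax = plt.subplots()

# One loop replaces the four separate set_visible(False) calls.
for side in ("right", "left", "top", "bottom"):
    ax.spines[side].set_visible(False)
```

Since all four spines are hidden, looping over `ax.spines.values()` would work as well.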

  • I would recommend doing all your guided projects on your local computer, using Anaconda with the latest version of Python, rather than on DQ. The DQ platform is in the process of converting all the missions to the newer Python version, but that work is still ongoing. In this case, for example, for the piece of code below:
ax.tick_params(right='off', left='off', bottom='off', top='off')

you can use the new syntax: False instead of 'off'. It’s always better to learn and apply the newest possible things.

  • For very long code lines, it’s always better to divide them into several code lines, to improve their readability. For example, instead of:
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']

you can use:

stem_cats = ['Engineering', 'Computer Science', 
             'Psychology', 'Biology', 'Physical Sciences', 
             'Math and Statistics']

It’s an especially good idea for graphs, where you can put each argument on a new row. Like this:

ax.plot(
    women_degrees['Year'],
    women_degrees[stem_cats[sp]],
    c=cb_dark_blue,
    label='Women',
    linewidth=3
)

  • In code cell [8], you might replace the three for-loops (one for each column) with a single for-loop.
  • In general, you can also consider combining all the code cells from [8] to [12] into a single cell. In practice, this means keeping only the last of them (code cell [12]), which has all the modifications applied, and moving the intermediate technical explanations (setting the x-axis, adding a horizontal line, etc.) into this one cell as comments. I know it’s a project for learning and practicing all these things, and my own version of this project is a disaster :joy: But it’s always a good idea to optimize your code as much as possible (and of course, I’m going to return to my project as well and introduce all these improvements), especially if you’re planning to use this project in your portfolio.
  • In the conclusion section, it’s better to add more specific conclusions about the gender gap in various spheres (or categories of spheres).
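
The suggestion above about merging the three per-column loops into one could look roughly like this. It is only a sketch under assumptions: the category lists are shortened, the second and third lists are hypothetical, and `women_degrees` is replaced by a small dictionary of made-up placeholder values, because the real notebook data is not shown in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# One list of categories per grid column (shortened; the last two are hypothetical).
category_columns = [
    ['Engineering', 'Computer Science'],
    ['Art and Performance', 'English'],
    ['Business', 'Education'],
]

# Placeholder stand-in for the women_degrees DataFrame; the values are made up.
women_degrees = {'Year': np.arange(1970, 1975)}
for offset, cat in enumerate(c for cats in category_columns for c in cats):
    women_degrees[cat] = np.linspace(10 + offset, 20 + offset, 5)

n_rows = max(len(cats) for cats in category_columns)
fig, axes = plt.subplots(n_rows, len(category_columns),
                         figsize=(12, 6), squeeze=False)

# A single nested loop handles every column instead of one loop per column.
for col_idx, cats in enumerate(category_columns):
    for row_idx, cat in enumerate(cats):
        ax = axes[row_idx][col_idx]
        ax.plot(women_degrees['Year'], women_degrees[cat],
                label='Women', linewidth=3)
        ax.plot(women_degrees['Year'], 100 - women_degrees[cat],
                label='Men', linewidth=3)
        ax.set_title(cat)
        for spine in ax.spines.values():
            spine.set_visible(False)
        ax.tick_params(right=False, left=False, bottom=False, top=False)
```

Keeping the per-column category lists in one outer list means that adding a column only requires adding a list, not another loop.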

I hope my feedback was helpful. Good luck with your future projects, and happy learning!
