Review Guided Project: Visualizing Earnings Based On College Majors

What more can be added?
What can be improved?
What more conclusions can be drawn?


Earnings_Based_On_College_Majors.ipynb (1.2 MB)

I had problems saving on the Dataquest platform, so I worked on my local machine and uploaded my final work.



Hello @eashwary! Thanks for sharing :)

Here is what I liked:

  • Your project’s objectives are clear
  • You have sections that logically split your project
  • Some of your plots have a title and axis labels
  • You have interesting observations, and in some cases you zoom in to further investigate the relationship between two variables (like Total vs. Median), though these observations could be described better.

Here is what can be improved:

  • Add more subsections, for example when you investigate a particular relationship.
  • Use a consistent style throughout the project (in some places you have Observation, in others Observations or OBSERVATION).
  • Give all your plots a meaningful title and axis labels. You can also add a dollar symbol to the axis tick labels that show money (usually the y-axis), at least on the top tick.
  • Which currency are the amounts in? You only provide numbers without any currency symbol.
  • Comment your code! What does a particular code cell do? It’s easy to follow when the code is simple, as in this project, but commenting becomes a useful habit once your code gets more complex.
  • In some scatter plots, points are cut off because of the tight axis limits you’ve set.
  • Remove the grids from your histograms: they do not provide any information and only distract readers.
  • Your histograms could have more meaningful titles. For example, what does Median mean? Median salary, number of employees or something else? I can decipher it, but it would have been easier for me and other readers (and for the future you) if the plot had a clearer title.
  • Here is an example of a plot with no labels or title: the plot under “The ShareWomen column contained the percentages , woman as share of total. We can try splitting into 2 bins and observe..”. What does it represent? Where are the percentages? What does the y label mean? It took me around a minute to figure out what this plot shows and how it’s linked to the observations. Most employers or stakeholders wouldn’t spend time trying to work out what you wanted to say :)
  • At the end you have a series of plots with no observations at all!
  • Write a conclusion. Did you answer the questions? What interesting data have you found? Wrap it all up. Most end users are only interested in the conclusions.
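A minimal matplotlib sketch of several of the points above: a title and axis labels on every plot, a dollar sign on the y tick labels, axis limits that do not cut off points, and a histogram without a grid. It assumes the notebook’s DataFrame has the `Total`, `Median` and `ShareWomen` columns mentioned in this review, that `Median` holds salaries in US dollars, and that `ShareWomen` stores fractions between 0 and 1. The function names and the 5% headroom are illustrative choices, not taken from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter, StrMethodFormatter


def plot_total_vs_median(df: pd.DataFrame):
    """Scatter plot of major popularity (Total) against median salary (Median)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(df["Total"], df["Median"], alpha=0.6)

    # A title and axis labels that say what each axis measures and in which unit.
    # Assumption: Median is a salary in US dollars.
    ax.set_title("Median salary vs. number of graduates per major")
    ax.set_xlabel("Total number of graduates (people)")
    ax.set_ylabel("Median salary (USD)")

    # Prefix y tick labels with a dollar sign and thousands separators, e.g. $40,000.
    ax.yaxis.set_major_formatter(StrMethodFormatter("${x:,.0f}"))

    # Start both axes at 0 and leave 5% headroom above the largest value,
    # so no point is cut off by limits that are too tight.
    ax.set_xlim(0, df["Total"].max() * 1.05)
    ax.set_ylim(0, df["Median"].max() * 1.05)
    return fig, ax


def plot_share_women_hist(df: pd.DataFrame, bins: int = 10):
    """Histogram of ShareWomen shown as percentages, without a background grid."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # Assumption: ShareWomen is a fraction between 0 and 1, so multiply by 100.
    ax.hist(df["ShareWomen"].dropna() * 100, bins=bins, range=(0, 100), edgecolor="white")

    ax.set_title("Distribution of majors by share of women graduates")
    ax.set_xlabel("Women as share of total graduates")
    ax.set_ylabel("Number of majors")
    # Show x tick labels as percentages, e.g. 50%.
    ax.xaxis.set_major_formatter(PercentFormatter(xmax=100))

    ax.grid(False)  # gridlines add no information to a histogram
    return fig, ax
```

Each function would be called with the notebook’s DataFrame (for example `plot_total_vs_median(recent_grads)`, where `recent_grads` stands in for whatever the DataFrame is actually named), followed by `plt.show()`. If the histograms are drawn with pandas instead, note that `Series.hist` draws a grid by default and `grid=False` turns it off.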

Hope it helped! Happy coding:)


Hi @eashwary,

I see you posted this project 4 months ago. Hopefully, my comments will still be valuable for you :grinning:

  • As @artur.sannikov96 said, it’s better to add more structure to the project by using subheadings and, of course, by adding a conclusion section.
  • Add more comments and conclusions about the plots, especially from the histogram section onward.
  • I would remove code cell [21], since it doesn’t convey necessary information and takes up too much space.
  • There are probably too many Total vs. Median plots, and many of them have no observations.
  • Code cell [15]: why start from -2000? It would be better to start at least from 0.
  • The markdown cell after code cell [21]: yes, in our dataset popularity = ‘Total’.
  • Some plots (e.g. code cells [15], [18], [26]) are zoomed in too much, and it would also help to explain why we want to look at exactly that part.
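One way to act on the zoom comments is to zoom by filtering rows rather than by hard-coding tight axis limits, and to state the zoom criterion in the title so readers know why that region is shown. The sketch below assumes the same `Total` and `Median` columns; the function name and the 90th-percentile cut-off are hypothetical choices for illustration, not something used in the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import StrMethodFormatter


def plot_total_vs_median_zoomed(df: pd.DataFrame, quantile: float = 0.9):
    """Zoom into less popular majors by filtering rows instead of setting tight limits."""
    # Hypothetical cut-off: keep majors whose Total is at or below the given quantile,
    # so a few very popular majors no longer squeeze the rest into one corner.
    threshold = df["Total"].quantile(quantile)
    subset = df[df["Total"] <= threshold]

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(subset["Total"], subset["Median"], alpha=0.6)

    # State the zoom criterion in the title so readers know why this region is shown.
    ax.set_title(
        f"Median salary vs. popularity for majors with at most {threshold:,.0f} graduates\n"
        f"({len(subset)} of {len(df)} majors shown)"
    )
    ax.set_xlabel("Total number of graduates (people)")
    ax.set_ylabel("Median salary (USD)")  # assumption: Median is in US dollars
    ax.yaxis.set_major_formatter(StrMethodFormatter("${x:,.0f}"))

    # Limits come from the filtered data, start at 0 and keep 5% headroom,
    # so every plotted point stays fully visible.
    ax.set_xlim(0, subset["Total"].max() * 1.05)
    ax.set_ylim(0, subset["Median"].max() * 1.05)
    return fig, ax, subset
```

Because the limits follow the filtered data, changing `quantile` never truncates points, and the title always reports how many majors were left out of the zoomed view.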

Hope my feedback was useful, even though it’s a little late :blush:
