Guided project 4 - College Majors

Hi everyone. I finished my fourth project and tried to turn it into a fully fledged piece of research, which is why I expanded the initial framework of the project a bit. I hope you can share some thoughts on possible ways to improve it further.

Project 4 - What Determines the Choice of a College Major_.ipynb (686.7 KB)



That’s nice. Thanks for sharing.


hey @erzinrost

This is one cool project! :+1:

I especially liked the box-plot (haven’t checked the code part yet though).

The workflow is smooth and engaging, and the conclusion is very well thought out. Even if we wanted to analyze and criticize it, we would need to understand the complete project and the thought process behind it. And that’s a compliment! :slight_smile:

Thank you so much for sharing.


Dear Rucha, many thanks for your comments. I thought a lot about the design of the project, and I am still a bit unsure whether it is overloaded with graphs at the beginning (scatterplots and histograms). Sometimes it is difficult to find the balance between the number of graphs and their relevance.

hi @erzinrost

Interesting point there! Well, I am neither an expert nor a data scientist. I am basing this purely on my current understanding of data science concepts and on former experience.

You used a scatter plot with a trend line, a bar graph with stacked groups, and a box plot to show the distribution (the five-number summary).
This is relevant information being shown using relevant plots. :ok_hand:

Here are a few things you may take into account when in doubt. Again, I am no expert, and this is student to student!

  • Is the plot representing the data in the most readable format? For example, a pie chart is considered good when we have about 4-5 categories. With 200 categories, it is an absolute nope.

  • Does the graph add visualization power to the analysis at hand? Say we want to identify a correlation between two variables and how they move together: a scatter plot works best here.

  • If we have already shown something as a stacked bar plot, then instead of repeating the same chart type, maybe we show similar data in tabular form, thus maintaining variety.
    The real deal here is identifying which data is most effective as a plot for keeping the reader engaged in the project. For example:

    • A stacked bar plot representing age group vs. time spent on FB, Insta, LinkedIn and WhatsApp: as a reader, this would engage me more, as I would like to know which age group has the FB addicts!
    • Babies picking up different shapes and putting them in their mouths :baby: They want to eat the whole world anyway! So we might have a uniform distribution here, and then a bar graph won’t make sense. A table with the data would do fine.
  • Does the graph/plot take attention away from the overall analysis and become a hindrance?
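
To make the rules of thumb above a bit more concrete, here is a minimal Python sketch. The function name `suggest_display` and both thresholds (at most 5 categories for a pie chart, 5% relative spread as "nearly uniform") are my own assumptions for illustration, not established rules, so treat it as a starting point only.

```python
from statistics import mean, pstdev


def suggest_display(values, max_pie_categories=5, min_relative_spread=0.05):
    """Suggest how to display one numeric value per category.

    Hypothetical helper: both thresholds are illustrative assumptions.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one category")

    avg = mean(values)
    # Spread relative to the average (coefficient of variation).
    spread = pstdev(values) / abs(avg) if avg else 0.0

    if spread < min_relative_spread:
        # Nearly uniform values: every bar would look the same.
        return "table"
    if n <= max_pie_categories:
        return "pie chart"
    return "bar chart"


print(suggest_display([40, 25, 20, 15]))           # few, clearly different categories
print(suggest_display([20, 20, 21, 20, 19, 20]))   # nearly uniform values
print(suggest_display(list(range(1, 201))))        # 200 categories
```

The first call suggests a pie chart, the second a table, and the third a bar chart. Whether a plot keeps the reader engaged is still a judgment call that a helper like this cannot make for us.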

I have bored you to snooze mode by now. :scream:

Summarizing: not all data can be represented as plots, and a plot will not necessarily give us meaningful insights. The balance is not about the number of plots and charts embedded in the project; it is about associating the right plot with the right data at the right place in the project.


Dear Rucha, many thanks for your points. I will keep them in mind for future projects.