I’m sharing my first data visualization project for your review.
I’d like feedback on how I have analyzed and interpreted the analysis ideas mentioned here. It’d be really appreciated if the reviewers also commented on the way I have interpreted the observations and shared their thoughts on whether anything could be put in a better way.
Looking forward to your valuable comments and feedback.
earnings-based-on-majors.ipynb (846.8 KB)
Click here to view the jupyter notebook file in a new tab
@april.g A gentle reminder on reviewing my project. Eagerly waiting for your feedback.
Hey Anjali, thanks for the nudge. I miss some of these at times, sorry about that!
Overall the project did a good job of creating the graphs and I agree with most of the conclusions you reached. The exception was from the hexbin plot where you said that almost 60% of the graduates were unemployed. I think the number is actually about 6% (0.06).
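A quick way to guard against misreading a 0-1 proportion as a percentage is to format the value before writing it up. This is only a minimal sketch: `as_percent` is a hypothetical helper name, and the two inputs are the figures discussed above, not values read from the notebook.

```python
def as_percent(proportion: float, decimals: int = 0) -> str:
    """Render a 0-1 proportion as a percentage string (hypothetical helper)."""
    # The "%" format type multiplies by 100 and appends the percent sign.
    return f"{proportion:.{decimals}%}"


print(as_percent(0.06))  # 6%
print(as_percent(0.6))   # 60%
```

Printing the axis value this way makes it harder to confuse 0.06 with 60%.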
I think one of the toughest things about this project in general is interpreting and communicating what we’re seeing in the graphs in an effective way. For example, when we look at the histogram for the distribution of the ShareWomen column, the part emphasized was “29 out of 172 majors consists of about 68-77% Women.” When I see something like that, my brain focuses on the 29 out of 172 and thinks “Wow, not many women dominate many majors.” I have to give the bold part a second look to double back and say “Oh, that many majors have that percentage range of women.” (Isn’t it strange how that happens?) I realized later that it was the tallest bar and that was why it was pointed out for us. But what if we instead looked at all the bars where women are in the majority (>50%)? It paints a different picture!
Anyhow, those are just some thoughts I had, take them with a grain of salt. Thanks again for sharing, Anjali.
Oops! What was I thinking when it is very clear 0.06 is 6%. Apologies about that one.
And about the ShareWomen column in the histogram, do you think my observation is wrong? When I had another look at the chart, I agree with what you mentioned here:
but still, do you think the observation I emphasized is wrong? And from a data analyst’s perspective, should we go with the more obvious observation (like what the tallest bar points to) or with a broader observation like the one you mentioned (>50%)?
Again, thanks for your insightful feedback.
I think it depends on what you want to communicate to the reader about the observation. For me, it seemed like it was an odd statistic to emphasize. I might be curious about how many majors have more or less women, but not necessarily that exact percentage range. Does that make sense? I know that a lot of times the highest/lowest bars are more interesting to look at (like for median income). I think for this particular histogram, just due to the nature of the column itself, it’s not necessarily the tallest bar that is the most noteworthy part of it. I think the shape of the histogram itself is more interesting.
Or, if the tallest bar was what was most interesting to you about the graph, maybe find another way to write about it. Maybe there’s a way to express that differently so that the intention is clearer? Or maybe combine both approaches… you could say how many majors were predominantly women, and then say that, out of those, 29 had that percentage range. Best of both worlds!
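The combined approach could be sketched roughly as below. The list is made-up illustrative data, not the real ShareWomen column, and `summarize` is a hypothetical helper; the 0.68 to 0.77 bounds are taken from the range quoted above.

```python
# Illustrative values only, NOT the real ShareWomen column.
share_women = [0.12, 0.25, 0.41, 0.55, 0.69, 0.72, 0.76, 0.83, 0.93]


def summarize(shares, low=0.68, high=0.77):
    """Return (total majors, majority-women majors, those within [low, high])."""
    majority = [s for s in shares if s > 0.5]
    in_range = [s for s in majority if low <= s <= high]
    return len(shares), len(majority), len(in_range)


total, majority, in_range = summarize(share_women)
print(f"{majority} of {total} majors are majority women; "
      f"{in_range} of those fall in the 68-77% range.")
```

Leading with the majority count gives the reader the overall picture first, and the narrower range then reads as a detail within it.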
@april.g that was really helpful. Just to clarify again, in simple terms, do you reckon it’d be correct to just say that more than 50% of the 172 majors had a majority of women students? I’m still learning how to effectively communicate observations from plots.
I think so. You could verify this by changing the number of bins for the histogram so that it shows just two bars: below 50% and above 50%.
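To show what a two-bin histogram would report, here is a small pure-Python sketch of fixed-width binning over the 0-1 range. The data is made up for illustration and `bin_counts` is a hypothetical helper. In the notebook itself the equivalent is presumably passing `bins=2` and `range=(0, 1)` to the histogram call, but that is an assumption, since it has not been run against the project data.

```python
def bin_counts(values, bins, low=0.0, high=1.0):
    """Count values in `bins` equal-width bins spanning [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore anything outside the plotted range
        # The top edge belongs to the last bin, as in a typical histogram.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Illustrative values only, NOT the real ShareWomen column.
share_women = [0.12, 0.25, 0.41, 0.55, 0.69, 0.72, 0.76, 0.83, 0.93]
below, above = bin_counts(share_women, bins=2)
print(f"below 50%: {below}, 50% and above: {above}")
```

Note that with this binning a value of exactly 0.5 lands in the upper bin, so "above 50%" strictly means "50% and above" here.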
Thanks a lot @april.g. That makes it crystal clear. You are awesome! I’ve made the edits and uploaded the new project file.