If you are not ready to engage in a discussion which helps you learn through the process instead of directly providing you with the answer, then that’s completely fine by me.
Answering those questions would have led you to the answer, or at least helped you develop a thought process, which is what I think matters here. I have seen many students, across different education platforms, struggle with the same issue of figuring out how to think through a solution. Otherwise, we just get into the habit of expecting answers from others, and I try to avoid feeding that habit as much as possible.
Since you want an answer, sure -
The first step of the Guided Project points out what ShareWomen is:
ShareWomen - Women as share of total.
That is essentially what you were trying to calculate, except that the column already holds it for every Major.
If you plot a histogram of that column, you get something like this:
[Image: histogram of ShareWomen]
From the histogram, look at the x-axis and find the bins that lie above 0.50.
The y-axis corresponds to how many values fall within a specific bin. Since there is one ShareWomen value for every Major, the y-axis essentially tells us the number of Majors that fall within a particular range of ShareWomen values. Since ShareWomen is a share between 0 and 1 (x-axis), the bins that lie after the 0.50 (50%) threshold are the ones that are predominantly women.
You can then read off their corresponding y-axis values and roughly calculate the percentage of Majors that are predominantly female.
To calculate the same for Majors that are predominantly male, look at the bins that lie before the 0.50 value.
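To make the bin-reading idea concrete, here is a rough sketch in plain Python of what you are doing by eye. The ShareWomen values are made up for illustration (they are not from the project's dataset), and the choice of 10 equal-width bins between 0 and 1 is my assumption, not something the Guided Project prescribes.

```python
# Hypothetical ShareWomen values, one per Major (invented for this example).
share_women = [0.12, 0.18, 0.25, 0.33, 0.41, 0.47, 0.52, 0.58,
               0.63, 0.71, 0.77, 0.85, 0.91, 0.96, 0.68, 0.74]

# Assumption: 10 equal-width bins covering 0.0 to 1.0, like a histogram's x-axis.
n_bins = 10
counts = [0] * n_bins
for value in share_women:
    # Bin i covers [i/10, (i+1)/10); a value of exactly 1.0 goes in the last bin.
    index = min(int(value * n_bins), n_bins - 1)
    counts[index] += 1

# The bar heights (y-axis) are the counts per bin.
total = sum(counts)
female_majors = sum(counts[n_bins // 2:])  # bins from 0.5 up to 1.0
male_majors = sum(counts[:n_bins // 2])    # bins from 0.0 up to 0.5

pct_female = 100 * female_majors / total
pct_male = 100 * male_majors / total
print(f"Predominantly female: {pct_female}%")
print(f"Predominantly male: {pct_male}%")
```

On a real plot you would estimate the bar heights by eye instead of counting them exactly, and a bin that straddles 0.50 would blur the split, which is why the result from a histogram is only approximate.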
While the histogram itself wouldn’t give you the exact percentages, you can get a reasonable idea from it about the percentages. That’s the idea behind this.
Alternatively, you can use the dataframe itself to calculate these percentages. That’s fine too, and it is the best approach if you want the exact answer. But the idea here is to help you get comfortable with reading a histogram and using it to answer a variety of questions or gather insights from the data. And yes, you were right in saying this won’t get you the exact percentage values. You can provide feedback to DataQuest if this confused you or held up your progress.
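If you do want the exact numbers from the dataframe, a minimal sketch with pandas could look like the following. The dataframe name recent_grads and the four rows in it are hypothetical stand-ins so the example runs on its own; only the Major and ShareWomen column names come from the project.

```python
import pandas as pd

# Hypothetical stand-in for the project's dataframe (values are invented).
recent_grads = pd.DataFrame({
    "Major": ["MAJOR A", "MAJOR B", "MAJOR C", "MAJOR D"],
    "ShareWomen": [0.12, 0.55, 0.80, 0.97],
})

# A boolean Series: True where women are more than half of the Major.
mostly_women = recent_grads["ShareWomen"] > 0.5
mostly_men = recent_grads["ShareWomen"] < 0.5

# The mean of a boolean Series is the fraction of True values.
pct_female = mostly_women.mean() * 100
pct_male = mostly_men.mean() * 100

print(f"Predominantly female: {pct_female:.1f}%")
print(f"Predominantly male: {pct_male:.1f}%")
```

With your own dataframe you would skip the hand-built rows and apply the same two comparisons to the ShareWomen column you already loaded.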
The questions I ask are meant to lead you through all the above steps.
I hope the above helps you. If you have any further questions, feel free to ask a new question, or you can tag someone else to help you out. Good luck and have a good day!