Problem - Visualizing Earnings Based on College Majors

Hi,

I am having trouble with the following project and am not able to make inferences:
https://app.dataquest.io/m/146/guided-project%3A-visualizing-earnings-based-on-college-majors/3/pandas-histograms

Questions
Use the plots to explore the following questions:

  • What percent of majors are predominantly male? Predominantly female?
  • What’s the most common median salary range?

I have got the following plots:





Please help me in understanding the solution to the problems.

Regards,
Jimmy

Think about which column in the dataset tells you about the percentage of men or women for a particular major. If you plot that column as a histogram, what pattern do you see?

When you plot the median salary range, what pattern do you see?

Hello,
I get that the answer to the second one is approximately 30000 (kindly tell me if that's wrong), but I still don't see how to approach the first one (What percent of majors are predominantly male? Predominantly female?).

The questions I posted above are still valid -

Think about which column in the dataset tells you about the percentage of men or women for a particular major. If you plot that column as a histogram, what pattern do you see?

Which column in the dataset tells you about the percentage of men or women corresponding to a particular major?

Take your time. Think about it. And then we can continue the discussion based on that.

Kindly explain this once more; I am confused by your question itself.
What percent of majors are predominantly male? Predominantly female?

Part (a) means: what percent of males have the same major?
Part (b) means: what percent of females have the same major?

Is it so?

No, that’s not it.

Say that Major A has 100 students, and that out of those 100 students, 75 are male. That means 25 students are female.

Now, that means, that Major A has predominantly male students. Because the percentage of male students is 75% which is higher than the percentage of female students, which is 25%.

So the question is asking: given all the majors, what percentage of them have predominantly male students? Break that down -

  • Predominantly male or female means that one share is higher than the other. What percentage would indicate that the share of male students in a major is higher than the share of female students? What would that cut-off percentage be?
  • How many majors, out of all the majors, have predominantly male students?
  • Knowing the above, what percentage of the total number of majors would have predominantly male students?

Try to answer the above questions here.


Hello,
kindly tell me: can this query be solved with a histogram, or does it need some calculation on the data?

I think it is not possible to find the solution using a histogram.
The dataset contains:

  • Total number of people with major
  • Men - Male graduates
  • Women - Female graduates

This is what my finding is:
image

Please correct me. @the_doctor

This is what my finding is regarding the most common median salary range:
image

Please correct me if I am wrong. @the_doctor

@the_doctor waiting for your inputs.
@dqoperations

Thanks for the reminder.

It is possible to find the solution using the Histogram to a certain extent.

What information does the ShareWomen column provide us with? Plot the histogram for that. Based on that histogram try to see what information you can gather from it. What does the x-axis tell you?

That would be correct. But since you are asked for a range, it would be the range corresponding to that bin. Even then, play around with the bin sizes and ranges to see if you can spot any other pattern.
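To show what "the range corresponding to that bin" could look like in code, here is a minimal sketch. It is not the project's solution: the salaries are made up, the column name `Median` is an assumption about the dataset, and the bin width of 10,000 is just one choice you can vary.

```python
import pandas as pd

# Toy stand-in for the project's data; these salaries are made up for illustration.
# "Median" is assumed to be the name of the median-salary column.
recent_grads = pd.DataFrame(
    {"Median": [22000, 28000, 31000, 33000, 34000, 36000, 38000, 45000, 52000, 75000]}
)

# Fixed-width bins of 10,000 (the width is a choice, not a given).
bin_edges = list(range(20000, 90000, 10000))  # 20000, 30000, ..., 80000
salary_bins = pd.cut(recent_grads["Median"], bins=bin_edges, right=False)

# Count majors per bin; the tallest histogram bar is the largest count here.
counts = salary_bins.value_counts().sort_index()
most_common_range = counts.idxmax()

print(counts)
print("Most common range:", most_common_range)
```

Changing `bin_edges` (for example to steps of 5,000) is the code equivalent of playing around with the bin sizes, and the most common range can shift when you do.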

I am not happy with the answer, @the_doctor, because yours is a question in reply to my question. If you had shown an example to clarify the problem, it would have solved the problem and cleared up the doubt as well.

If you are not ready to engage in a discussion which helps you learn through the process instead of directly providing you with the answer, then that’s completely fine by me.

Answering those questions would have led you to the answer or helped you develop a thought process, which in my view is what matters here. I have seen many students, across different education platforms, struggle with figuring out how to think through a solution. Otherwise, we just get into the habit of expecting answers from others. I try to avoid doing that as much as possible.

Since you want an answer, sure -

The First Step of the Guided Project points out what ShareWomen is.

ShareWomen - Women as share of total.

That is essentially what you were trying to calculate, except that column already does it for every Major.
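As a small illustration of that point, the sketch below computes women as a share of the total from the Women and Total columns. The rows are made up (Major A reuses the 100-student example from earlier in this thread), and the names `Major` and `ShareWomen_computed` are only for this example.

```python
import pandas as pd

# Toy rows with made-up counts; the major names are hypothetical.
recent_grads = pd.DataFrame(
    {
        "Major": ["Major A", "Major B", "Major C"],
        "Total": [100, 200, 50],
        "Men": [75, 60, 25],
        "Women": [25, 140, 25],
    }
)

# Women as a share of the total, which is how ShareWomen is described.
recent_grads["ShareWomen_computed"] = recent_grads["Women"] / recent_grads["Total"]

print(recent_grads[["Major", "ShareWomen_computed"]])
```

On the real data, a column computed this way should line up with the existing ShareWomen column, so there is no need to build it yourself.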

If you plotted that histogram then you would get something like -

image

From the Histogram you can look at the x-axis, and see the number of bins that are higher than 0.50.

The y-axis corresponds to how many values fall within a specific bin. Since ShareWomen corresponds to every Major, the y-axis essentially tells us the number of Majors corresponding to a particular range of value for ShareWomen.

Since ShareWomen is a share between 0 and 1 (x-axis), bins which lie after the 0.50 (50%) threshold would be the ones which are predominantly Women.

You can then see their corresponding y-axis values, and roughly calculate the percentage of Majors that are predominantly female.

To calculate the same for Majors that are predominantly male, you can look at the bins that are before the 0.50 value.

While the histogram itself wouldn’t give you the exact percentages, you can get a reasonable idea from it about the percentages. That’s the idea behind this.
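If reading bar heights off the plot feels imprecise, the same bins can be counted in code. This is only a sketch with made-up ShareWomen values for ten hypothetical majors; it uses `numpy.histogram` with ten equal-width bins over 0 to 1, so that 0.50 falls on a bin edge.

```python
import numpy as np
import pandas as pd

# Made-up ShareWomen values for ten hypothetical majors.
share_women = pd.Series([0.12, 0.22, 0.31, 0.38, 0.44, 0.47, 0.56, 0.63, 0.77, 0.91])

# Ten equal-width bins over [0, 1]: bins 0-4 lie below 0.50, bins 5-9 lie above it.
counts, edges = np.histogram(share_women, bins=10, range=(0.0, 1.0))

# Add up the bar heights on each side of the 0.50 threshold.
pct_male = 100 * counts[:5].sum() / counts.sum()
pct_female = 100 * counts[5:].sum() / counts.sum()

print(counts)
print(f"Predominantly male: {pct_male:.0f}%, predominantly female: {pct_female:.0f}%")
```

With other bin settings, 0.50 can land inside a bin rather than on an edge, and then the bars only give a rough estimate.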

You can, alternatively, use the dataframe itself to calculate these percentages too. That’s also fine and the best approach to get the exact answer as well. But the idea here is to help you get comfortable with analyzing a particular histogram, or with using a histogram to answer a variety of questions, analyze the data in some way, and gather insights. But yes, you were right in saying this won’t get you the exact solution in terms of percentage values. You can provide feedback to DataQuest if this confused you or caused problems with you progressing further.
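For the dataframe route, a minimal sketch could look like the following. The values are made up, and treating a share above 0.50 as predominantly female and below 0.50 as predominantly male is my reading of the question, not something the project spells out.

```python
import pandas as pd

# Made-up ShareWomen values for ten hypothetical majors.
recent_grads = pd.DataFrame(
    {"ShareWomen": [0.12, 0.22, 0.31, 0.38, 0.44, 0.47, 0.56, 0.63, 0.77, 0.91]}
)

# Drop rows without a value so they do not count toward either group.
share = recent_grads["ShareWomen"].dropna()

# Share of majors on each side of the 0.50 cut-off.
pct_female = 100 * (share > 0.5).sum() / len(share)
pct_male = 100 * (share < 0.5).sum() / len(share)

print(f"Predominantly female: {pct_female:.1f}%")
print(f"Predominantly male: {pct_male:.1f}%")
```

A major sitting at exactly 0.50 would fall in neither group here, so the two percentages need not add up to 100.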

The questions I ask are meant to lead you through all the above steps.

I hope the above helps you. If you have any further questions, feel free to ask a new question, or you can tag someone else to help you out. Good luck and have a good day!


Sir @the_doctor, you can see how old the post is, and I am still struggling to understand it. I think delaying learning leads either to digging around in vague ideas or to giving up on learning. I am sorry if you took it the wrong way.