# Can someone answer these questions in the 2nd chapter of Guided Project: Visualizing Earnings Based On College Majors

Hi,

I am confused about how to answer the questions below from 146-2: Guided Project: Visualizing Earnings Based On College Majors.

Use the plots to explore the following questions:

• Do students in more popular majors make more money?
• Do students that majored in subjects that were majority female make more money?
• Is there any link between the number of full-time employees and median salary?

thank you,

Handy

Hey, Handy.

In the spirit of this project, I suggest answering these questions by looking at appropriate scatter plots. If you need help recalling how to read and generally work with scatter plots, please review this mission.

1. Do students in more popular majors make more money?

We don’t have a “popularity” column. As an approximation, I suggest you use the column `Total`. You can plot this against `Median` (the median salary of full-time, year-round workers).

```
recent_grads.plot(x='Total', y='Median', kind='scatter')
```

Look at the resulting graph. Is there a concentration of points in the top-right quadrant (where the popular and well-paying majors are located)? Is there a concentration anywhere, or are the points relatively scattered?

2. Do students that majored in subjects that were majority female make more money?

Similar to the above, plot `ShareWomen` against `Median`.

3. Is there any link between the number of full-time employees and median salary?

Similar to the above, plot `Full_time` against `Median`.
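To make the three suggestions above concrete, here is a rough sketch that draws one scatter plot per question in a loop. The `toy_grads` DataFrame and its values are made up purely for illustration; in the project you would call `.plot` on your own `recent_grads` instead. The `matplotlib.use("Agg")` line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up stand-in rows (NOT the real data set); the column names match the project.
toy_grads = pd.DataFrame({
    "Total": [1000, 5000, 20000, 80000, 200000],
    "ShareWomen": [0.10, 0.35, 0.50, 0.70, 0.85],
    "Full_time": [800, 4000, 15000, 60000, 150000],
    "Median": [70000, 45000, 40000, 35000, 38000],
})

# One scatter plot per question, always with Median on the y-axis.
axes = {}
for column in ["Total", "ShareWomen", "Full_time"]:
    axes[column] = toy_grads.plot(
        x=column, y="Median", kind="scatter", title=f"{column} vs. Median"
    )
```

Keeping `Median` on the y-axis in every plot makes the three graphs easier to compare side by side.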


Perhaps change it to `recent_grads.plot(x='Total', y='Median', kind='scatter')`;
otherwise the answers and explanation are bang-on. Thanks.

Thank you very much for spotting and reporting that typo! I fixed it.

Hello sir, and thanks for the helpful pointers!

Regarding the third question, I have plotted `Full_time` against `Median` and noticed that most data points are gathered around `Full_time` (0, 25000) and `Median` (20000, 40000).

However, I cannot understand what kind of information those results provide.

The only thing I can point out is that there is great variability in median incomes, spanning from 20000 to 80000, with one outlier of 1200000 as `Full_time` approaches 0.

I enclose the scatter plot so you can have an overview as well.


Judging by the graph, I would say the results indicate that there’s no significant relation between the number of full-time employees and the median salary.

Could you elaborate a bit more on how you came to that conclusion?
What indicates that there is no relation between the two variables?

It’s simply the fact that nothing stands out: there is no linear variation between the variables. When `Median` grows, nothing really happens to the number of full-time employees, for instance.

You can check this (and maybe find that my visual assessment is wrong) with the Pearson correlation coefficient, if you’ve learned about it already.
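For anyone who wants to try the Pearson check, here is a minimal sketch of the coefficient written out by hand. The helper name `pearson_r` and the sample numbers are made up for illustration and are not taken from the data set. With the project's DataFrame, the same statistic should come from `recent_grads['Full_time'].corr(recent_grads['Median'])`, since pandas' `Series.corr` uses Pearson by default.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values standing in for the Full_time and Median columns.
full_time = [1000, 5000, 20000, 60000, 150000]
median = [40000, 36000, 45000, 33000, 38000]

r = pearson_r(full_time, median)
print(f"Pearson r = {r:.3f}")
```

A value near 0 would back up the "nothing stands out" reading, while a value near 1 or -1 would point to a strong linear relationship. Keep in mind that Pearson only measures linear association, so it is a complement to looking at the scatter plot, not a replacement.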


Thank you very much for the insight Bruno!


@Bruno I would recommend modifying the Instructions for this part of the Guided Project.

The instructions ask us to explore certain relationships using scatter plots, and then pose questions, stating:

`Use **the plots** to explore the following questions`

That `the plots` makes it seem that we are supposed to look at the plots we already created.

But two of the questions don’t correspond to the variables/features being plotted, so it becomes slightly confusing whether we (students) can answer those questions with just those plots. Creating new plots to answer those questions might seem obvious to some students, but not to all.

My general recommendations to help everyone learn through this better would be:

1. Either add those features in the instructions for creating the scatter plots so that the questions can be answered.

2. Or, create two separate sets of questions. One set is relevant for the already created scatter plots. The other set asks students to create plots to answer questions by identifying the necessary features themselves.

3. Or, instead of first asking students to create specific plots and then answer questions, just post the questions, mention which features can help answer them, and ask students to use scatter plots to answer them.

The 2nd one is the best option in my opinion.

There are other improvements I could maybe suggest, but I don’t wish to trouble you.


That sure was confusing. From the step-by-step instructions, it did look like we have to answer these questions from the plots already created.

Thank you for this.

Feedback is welcomed!! If you have more to give, please do!

One way to do it is by writing to [email protected].

Thanks again.

I agree with the_doctor: these instructions make it seem like the second bullet point is asking you to explore the questions using the scatter plots generated from the previous bullet point. The wording is confusing.

The wording is not tricky; they straight up ask us to answer questions using the plots produced, which do not appear all that relevant.

Could someone from Dataquest please acknowledge our observations and give us some clarification or a revision?

If we ARE indeed supposed to be able to answer these questions using the scatter plots generated from the first bullet point, could we please receive more instructions/hints on how to infer these answers from them? Otherwise, please reword the instructions within the exercise.

Thanks!

The question is: Is there any link between the number of full-time employees and median salary?

Since each data point is also a unique major, couldn’t you also infer that majors with more full-time employees, which also happen to be the more popular majors (more total students), don’t necessarily have a higher salary than majors with fewer full-time employees?

I kinda thought those questions were asked intentionally: Dataquest instructed us to make the listed scatter plots and then asked unrelated questions to see whether we would notice that they can’t be answered by the plots we just made.

It was confusing for me at first to look at those plots, as I don’t see any relation, trend, or dependence. The only thing I can say is that the data is concentrated toward the left, where the quantity on the x-axis is small. It is just like when we conduct a survey on a small sample: the collected answers won’t tell the true story. As the sample becomes bigger, the data becomes more reliable.

Hi Bruno,
I have trouble looking at the graph and seeing how you reached this conclusion: ‘in the top-right quadrant (where the popular and well-paying majors are located)’.

I see that `Total`, the number of people with the major, increases along the x-axis, so I read that as rising popularity of the major. But I do not see any increase along the y-axis as x increases, so I do not see where ‘well-paying majors are located’ comes from. Please advise.

Would it be correct to say that there appears to be a weak negative correlation between major popularity and earnings? I ask because the majority of the medians seem clustered in the 30-50K range, with the spread of medians narrowing and staying within this range as values increase along the x-axis.

Hi Pristakos,

Shouldn’t the median be on the y-axis, as it is the dependent variable and depends on `Full_time`, the independent variable?