In the spirit of this project, I suggest answering these questions by looking at appropriate scatter plots. If you need help recalling how to read and generally work with scatter plots, please review this mission.

Do students in more popular majors make more money?

We don't have a "popularity" column. I suggest that, as an approximation, you use the column Total. You can plot this against Median (the median salary of full-time, year-round workers).

Look at the resulting graph. Is there a concentration of points in the top-right quadrant (where the popular and well-paying majors are located) or not? Is there any concentration anywhere, or are the points relatively scattered?
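The quadrant check can also be done by counting points rather than judging the plot by eye. The snippet below is only a minimal sketch: the (Total, Median) pairs are made up for illustration and are not taken from the dataset, the helper `quadrant_counts` is a hypothetical name, and splitting each axis at its median value is an assumption (the visual quadrants of a plot are split at the middle of each axis range instead).

```python
from statistics import median

# Made-up (Total, Median) pairs for illustration only; NOT the real dataset.
majors = [
    (2300, 110000),
    (750, 75000),
    (18000, 60000),
    (120000, 36000),
    (390000, 31500),
    (3000, 28000),
    (45000, 40000),
    (210000, 46000),
]


def quadrant_counts(points):
    """Count points per quadrant, splitting each axis at its median value."""
    x_split = median(x for x, _ in points)
    y_split = median(y for _, y in points)
    counts = {"top-right": 0, "top-left": 0, "bottom-right": 0, "bottom-left": 0}
    for x, y in points:
        vertical = "top" if y > y_split else "bottom"
        horizontal = "right" if x > x_split else "left"
        counts[f"{vertical}-{horizontal}"] += 1
    return counts


counts = quadrant_counts(majors)
print(counts)
```

If popular majors really paid more, most points would land in the top-right and bottom-left quadrants. Roughly equal counts everywhere, or a surplus in the other two quadrants, would point the other way.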

Do students that majored in subjects that were majority female make more money?

Similar to the above, plot ShareWomen against Median.

Is there any link between the number of full-time employees and median salary?

Similar to the above, plot Full_time against Median.

Regarding the third question, I have plotted Full_time against Median, and I have noticed that most data points are clustered where Full_time is between 0 and 25,000 and Median is between 20,000 and 40,000.

However, I cannot understand what kind of information those results provide.

The only thing I can point out is that the median incomes vary greatly, ranging from 20,000 to 80,000, with one outlier of 1,200,000 as Full_time approaches 0.

I have attached the scatter plot as well so you can take a look.

Judging by the graph, I would say the results indicate that there's no significant relation between the number of full-time employees and the median salary.

Thank you very much for the immediate reply and the answer!
Could you elaborate a bit more on how you came to that conclusion?
What indicates that there is no relation between the two variables?

It's simply the fact that nothing stands out: there is no linear variation between the variables. When Median grows, nothing really happens to the number of full-time employees, for instance.

This can be verified (or it may show that my visual assessment is wrong) with the Pearson correlation coefficient (if you've learned about it already).
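For anyone who wants to try that check, here is a minimal sketch of the Pearson coefficient written with the standard library only. The numbers are made up for illustration and are not taken from the dataset, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; NOT the real dataset.
full_time = [1200, 3500, 9000, 25000, 60000, 150000]
median_salary = [65000, 30000, 48000, 36000, 41000, 38000]

r = pearson_r(full_time, median_salary)
print(round(r, 2))
```

A value near 0 supports the "no linear relation" reading, while values near 1 or -1 would contradict it. If the data is already in a pandas DataFrame (called `recent_grads` here as an assumption), `recent_grads["Full_time"].corr(recent_grads["Median"])` should give the same coefficient, since Pearson is the default method of `Series.corr`.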

@Bruno I would recommend modifying the Instructions for this part of the Guided Project.

The instructions ask us to explore certain relationships using scatter plots, and then pose questions, stating:

Use **the plots** to explore the following questions

The phrase "the plots" makes it seem that we are supposed to look at the plots we already created.

But two of the questions don't correspond to the variables/features being plotted, and it becomes slightly confusing whether we (students) can answer those questions with just those plots. Creating new plots to answer those questions might seem obvious to some students, but not all.

My general recommendations to help everyone learn from this better would be:

Either add those features to the instructions for creating the scatter plots so that the questions can be answered.

Or create two separate sets of questions: one set relevant to the already created scatter plots, and one set that asks students to create plots to answer questions by identifying the necessary features themselves.

Or, instead of first asking students to create specific plots and then answer questions, just post the questions, mention which features can help answer them, and ask students to use scatter plots to answer them.

In my opinion, the second option is the best.

There are other improvements I could maybe suggest, but I don't wish to trouble you.

I agree with the_doctor: these instructions make it seem like the second bullet point is asking you to explore the questions using the scatter plots generated from the previous bullet point. The wording is confusing.

The wording is not tricky; they straight up ask us to answer questions using the plots produced, which do not appear all that relevant.

Could someone from Dataquest please acknowledge our observations and give us some clarification or a revision?

If we ARE indeed supposed to be able to answer these questions using the scatter plots generated from the first bullet point, could we please receive more instructions/hints on how to infer these answers from them? Otherwise, please reword the instructions within the exercise.

The question is: Is there any link between the number of full-time employees and median salary?

Since each data point is also a unique major, couldn't you also infer that majors with more full-time employees, which also happen to be the more popular majors (more total students), don't necessarily have a higher salary than majors with fewer full-time employees?

I kind of thought those questions were asked intentionally: Dataquest instructed us to make the listed scatter plots and then asked unrelated questions to see whether we would realize that those questions can't be answered with the plots we had just made.
Perhaps they should add another instruction: if we can't answer those questions, create additional plots to find out.

It was confusing for me at first to look at those plots, as I don't see any relation, trend, or dependence. The only thing I can say is that the data is skewed to the left when the quantity on the x-axis is small. It is just like conducting a survey on a small sample: the collected answers won't tell the true story. As the sample becomes bigger, the data becomes more reliable.
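The sample-size intuition here suggests that medians for small majors should vary more than medians for large ones. A rough way to check that is to split the majors at the median of Total and compare the spread of Median in each half. This is only a sketch: the (Total, Median) pairs are made up for illustration, `spread_by_size` is a hypothetical helper, and using the population standard deviation as the measure of spread is an assumption.

```python
from statistics import median, pstdev

# Made-up (Total, Median) pairs for illustration only; NOT the real dataset.
majors = [
    (2300, 110000),
    (750, 75000),
    (18000, 60000),
    (120000, 36000),
    (390000, 31500),
    (3000, 28000),
    (45000, 40000),
    (210000, 46000),
]


def spread_by_size(points):
    """Split points at the median x value; return the std. dev. of y in each half."""
    x_split = median(x for x, _ in points)
    small = [y for x, y in points if x <= x_split]
    large = [y for x, y in points if x > x_split]
    return pstdev(small), pstdev(large)


small_sd, large_sd = spread_by_size(majors)
print(f"spread among smaller majors: {small_sd:.0f}")
print(f"spread among larger majors:  {large_sd:.0f}")
```

On the real data, a much larger spread among the smaller majors would be consistent with the idea that their medians are noisier, though it would not prove it.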

Hi Bruno,
I have trouble looking at the graph and seeing how you reach this conclusion: "in the top-right quadrant (where the popular and well-paying majors are located)".

I see that Total, the number of people with the major, increases along the x-axis, so I read that as rising popularity of the major, but I do not see any increase along the y-axis as x increases, so I do not see where "well-paying majors are located" comes from. Please advise?

Would it be correct to say that there appears to be a weak negative correlation between major popularity and earnings? The majority of the medians seem clustered in the 30-50K range, with the median range decreasing and staying within this narrow range as values increase along the x-axis.

Very much agree here. I was looking at the scatter plots that I created as per the instructions and concluded that I am not able to answer the questions. Fortunately I am not the only one :-), but I agree that it would be good to update the instructions.

Yeah, it happened to me as well. In the end, I first made all the plots suggested by the instructions plus a couple of additional ones, and then changed the design of the analysis according to the logic that seemed more appropriate to me. I still have to finish some details in the introduction and the conclusion, and hopefully tomorrow it'll be ready to share with the community.