Explanation for college majors dataset

I am not able to understand what these columns in the dataset mean:

1. Sample_size
2. ShareWomen
3. Median

By "mean" I am asking what data these columns actually contain, since there is no point in going further until I understand the dataset fully.

Hello @joshi.ananya.joshi1,

One can find the description of each column on the left side:

• `Sample_size` - Sample size (unweighted) of full-time.
• `ShareWomen` - Women as share of total.
• `Median` - Median salary of full-time, year-round workers.

Is this it, or did you mean more detailed information, e.g. which population the `Median`, `Sample_size`, and `ShareWomen` values describe?


This is what is shown on the screen, but I want to understand what it actually means. The description given there is not clear to me either.

I think I get it now.

So, the dataset is filled with survey results of “job outcomes of students who graduated from college between 2010 and 2012.”

The data is broken down by `Major`, so:

• `Sample_size` is not entirely clear from the description. My educated guess is that it is the number of people for whom the earnings were calculated. The GitHub repository from which `recent_grads.csv` is taken is more specific on `Sample_size`: "Sample size (unweighted) of full-time, year-round ONLY (used for earnings)" (bolded by kakoori).
• `ShareWomen` is the fraction of women relative to the total for that major, e.g. for the “Molecular Biology” major there are 10874 women out of a total of 18300, so `ShareWomen` is 10874/18300 = 0.59420765, which rounded to 6 decimals gives 0.594208.
• `Median` is the salary that splits the sample group in half with respect to earnings, e.g. for the “Molecular Biology” major the median is \$40,000: about half of the full-time, year-round workers earn more than this and about half earn less.
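
To make the two calculations above concrete, here is a minimal Python sketch. The `women` and `total` figures are the “Molecular Biology” numbers quoted above; the `salaries` list is made up purely for illustration and does not come from the dataset.

```python
from statistics import median

# Figures quoted above for the "Molecular Biology" major.
women = 10874
total = 18300

# ShareWomen: women divided by the total, rounded to 6 decimals.
share_women = round(women / total, 6)
print(share_women)

# Hypothetical salaries (NOT from the dataset), only to show what a median is.
salaries = [28000, 35000, 40000, 52000, 61000]
median_salary = median(salaries)
print(median_salary)

# The median splits the group: as many people earn less as earn more.
below = sum(s < median_salary for s in salaries)
above = sum(s > median_salary for s in salaries)
print(below, above)
```

With an odd number of salaries the median is simply the middle value once they are sorted, so the same number of people sit on either side of it.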

Is this what you are looking for?
