Explanation for college majors dataset

Screen Link:
https://app.dataquest.io/m/146/guided-project%3A-visualizing-earnings-based-on-college-majors/1/introduction

I am not able to understand what these columns in the dataset mean:

  1. Sample_size
  2. ShareWomen
  3. Median

By meaning, I want to know what data is present in these columns, since it’s useless to go further until I understand the dataset fully.

Hello @joshi.ananya.joshi1,

The description of each column can be found on the left side of the screen:

  • Sample_size - Sample size (unweighted) of full-time.
  • ShareWomen - Women as share of total.
  • Median - Median salary of full-time, year-round workers.

Is this what you were looking for, or did you mean more detailed information, e.g. which population the Median, Sample_size, and ShareWomen values are calculated for?


This is what is mentioned on the screen. I want to understand what it actually means, because the meaning given there is not clear to me.

I think I get it now.

So, the dataset is filled with survey results of “job outcomes of students who graduated from college between 2010 and 2012.”

The data is broken down by major, so:

  • Sample_size: the description on the screen is not entirely clear. My educated guess is that this is the number of people for whom the earnings were calculated. The GitHub repository from which recent_grads.csv is taken is more specific about Sample_size: "Sample size (unweighted) of full-time, year-round ONLY (used for earnings)" (bolded by kakoori).
  • ShareWomen is the fraction of women among all graduates counted for a major. For example, the “Molecular Biology” major has 10874 women out of a total of 18300, so ShareWomen is 10874/18300 = 0.59420765, which is rounded to 6 decimals, giving 0.594208.
  • Median is the salary that splits the sample group in half with respect to earnings. For example, for the “Molecular Biology” major the median is $40,000: 50% of the full-time, year-round workers in the sample earn more than this and 50% earn less.
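
If it helps, here is a minimal Python sketch of the two calculations described above. The women and total figures are the “Molecular Biology” numbers quoted in this post. The salaries list is made up purely for illustration (it is not taken from recent_grads.csv), and the variable names are my own, not column names from the dataset.

```python
from statistics import median

# Figures quoted above for the "Molecular Biology" major.
women = 10874
total = 18300

# ShareWomen: women as a fraction of the total.
share_women = women / total
print(round(share_women, 6))  # 0.594208

# Hypothetical salaries, for illustration only (NOT from recent_grads.csv).
salaries = [28000, 35000, 40000, 47000, 60000]

# Median: the middle value once the salaries are sorted.
median_salary = median(salaries)
print(median_salary)  # 40000

# Count how many people earn less and how many earn more than the median.
below = sum(s < median_salary for s in salaries)
above = sum(s > median_salary for s in salaries)
print(below, above)  # 2 2
```

With an odd number of salaries the median is the middle value itself, so the same number of people fall on each side of it. The dataset only stores the resulting median per major, not the individual salaries.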

Is this what you are looking for?
