Problem: You collect the names of majors into a list based on whether they are STEM majors or not. Now you want this list sorted in decreasing order of the percentage of women. How do you do this automatically?
E.g., starting with:
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
sorted_cols( stem_cats, women_degrees, first_not_last=False , reverse=True)
'Math and Statistics',
def sorted_cols(inlist, df, first_not_last=True, reverse=False):
    """list of strings, df, bool, bool --> list (sorted)"""
    # Use either the first or the last element in each column of df
    # (the columns named in inlist) as the sort key.
    return sorted(inlist, reverse=reverse,
                  key=lambda col: df.iloc[0][col] if first_not_last else df.iloc[-1][col])
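To check the idea quickly, here is a minimal self-contained sketch of the same function run on a tiny stand-in table. The numbers in this toy `women_degrees` are invented purely for illustration and are not taken from the real dataset; only the column names follow the post.

```python
import pandas as pd

# Toy stand-in for women_degrees: values are made up for illustration only.
women_degrees = pd.DataFrame({
    "Year": [1970, 1971, 1972],
    "Engineering": [1.0, 2.0, 3.0],
    "Psychology": [40.0, 50.0, 60.0],
    "Biology": [30.0, 45.0, 50.0],
})

def sorted_cols(inlist, df, first_not_last=True, reverse=False):
    """Sort column names by the first (or last) value in each column of df."""
    # Pick the reference row once: first row or last row.
    row = df.iloc[0] if first_not_last else df.iloc[-1]
    return sorted(inlist, key=lambda col: row[col], reverse=reverse)

cats = ["Engineering", "Psychology", "Biology"]

# Sort by the most recent year's percentage, highest first.
by_last_desc = sorted_cols(cats, women_degrees, first_not_last=False, reverse=True)
print(by_last_desc)  # ['Psychology', 'Biology', 'Engineering']
```

Selecting the row once, rather than inside the lambda, avoids repeating the `iloc` lookup for every column name.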
I’m not sure exactly what you are asking, so do reply if this response doesn’t help.
I assume you are asking how they arrived at the three lists, each ordered by descending percentage of degrees awarded to women, from the dataset.
If that’s the case, this may not be exactly what they did, but they may have done something similar.
Let’s compare a few metrics for each of the degrees, sorted by mean in descending order:
women_degrees.agg([min, max, np.mean, np.median]).T.sort_values(by = "mean", ascending = False)[1:]
It gives us the following result.
The dataset’s creator may have used similar metrics to order the degrees and then arranged them into three lists based on the category of the degree (STEM, Art or Other).
If you can work out the code on your own, great! If you need help with that, do let the community know.
Please note that the two steps can be done in either order; that is, we can just as well flag the degrees into classes first and then calculate the comparative metrics.
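As a rough sketch of that second ordering (classes first, metrics afterwards), something like the following could work. The `categories` mapping and all the numbers are hypothetical and chosen only to make the example runnable; this is not necessarily how the dataset's author did it.

```python
import pandas as pd

# Toy data: invented numbers, not the real dataset.
women_degrees = pd.DataFrame({
    "Year": [1970, 1971],
    "Engineering": [1.0, 3.0],
    "Biology": [30.0, 40.0],
    "English": [65.0, 67.0],
    "Art and Performance": [60.0, 60.0],
})

# Hypothetical class assignment, for illustration only.
categories = {
    "STEM": ["Engineering", "Biology"],
    "Art": ["English", "Art and Performance"],
}

# Comparative metrics per degree (one row per degree after transposing).
stats = (
    women_degrees.drop(columns="Year")
    .agg(["min", "max", "mean", "median"])
    .T
)

# Within each class, order the degrees by mean percentage, highest first.
ordered = {
    name: stats.loc[cols].sort_values(by="mean", ascending=False).index.tolist()
    for name, cols in categories.items()
}
print(ordered)
```

Swapping `by="mean"` for `"median"` or `"max"` gives the other orderings to compare against.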