Hi! I'm trying to compare the race distribution in good schools, those with an SAT score over 1800, with the race distribution in regular schools, those with an SAT score below 1800. My code is below.
My questions are:
1. I want one figure with two plots so I can see the charts side by side. I tried fig, ax = plt.subplots(1, 2, figsize=(10, 6)), but it did not work.
2. The bar charts created by the code below have a black vertical line on each bar. Why, and how do I remove it?
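import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# combined is the project's existing DataFrame (defined earlier)
racial_dbn_col=["white_per", "asian_per", "black_per", "hispanic_per",'DBN']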
elite_school=combined.loc[combined['sat_score']>1800,racial_dbn_col]
all_school=combined.loc[combined['sat_score']<1800,racial_dbn_col]
#Select racial columns
race=[c for c in elite_school if c.endswith('per')]
# Reshape the racial columns into a single column (a new long-format dataset for plotting)
elite_school_race=pd.melt(elite_school,id_vars='DBN',value_vars=race, value_name='percentage')
elite_school_race.rename({'variable':'Race'},axis=1,inplace=True)
# Create the bar graph for elite schools
ax1=sns.barplot(x='Race', y='percentage', data=elite_school_race)
ax1.set_title('Race Distribution in Elite Schools',fontsize='medium')
# Do the same for the all_school dataset
race=[c for c in all_school if c.endswith('per')]
all_school_race=pd.melt(all_school,id_vars='DBN',value_vars=race,value_name='percentage')
all_school_race.rename({'variable':'Race'},axis=1,inplace=True)
sns.barplot(x='Race',y='percentage',data=all_school_race)
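For readers unfamiliar with pd.melt, here is a minimal sketch of the wide-to-long reshape the code above relies on. The two schools, their DBN codes, and the percentages are made up for illustration; only the column names match the project.

```python
import pandas as pd

# Hypothetical wide-format data: one row per school, one column per race
wide = pd.DataFrame({
    "DBN": ["01M001", "02M002"],
    "white_per": [10.0, 40.0],
    "asian_per": [20.0, 30.0],
    "black_per": [30.0, 10.0],
    "hispanic_per": [40.0, 20.0],
})

# Same column selection as above: every column whose name ends in "per"
race = [c for c in wide if c.endswith("per")]

# Long format: one row per (school, race) pair. The old column names land in
# the "variable" column, which is then renamed to "Race".
long = pd.melt(wide, id_vars="DBN", value_vars=race, value_name="percentage")
long = long.rename(columns={"variable": "Race"})
print(long)
```

Each school now contributes four rows, one per race, so seaborn can put Race on the x axis and draw one bar per race whose height is the mean percentage across schools.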
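For the side-by-side charts, create the figure and add one subplot per chart:
fig = plt.figure(figsize=(10, 6))
ax1 = fig.add_subplot(1, 2, 1)  # first plot
# first plot code, e.g. sns.barplot(..., ax=ax1)
ax2 = fig.add_subplot(1, 2, 2)  # second plot
# second plot code, e.g. sns.barplot(..., ax=ax2)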
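A runnable sketch of this pattern, assuming small made-up long-format data in place of elite_school_race and all_school_race. Note that add_subplot is a method of the Figure returned by plt.figure(). The key step is passing each Axes to seaborn through the ax= argument: without it, both barplot calls draw on the same current Axes, which is why the charts end up on top of each other. Sharing the y axis (sharey=ax1) is an optional choice that makes the two panels easier to compare.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-format data standing in for elite_school_race / all_school_race
elite = pd.DataFrame({
    "Race": ["white_per", "asian_per", "white_per", "asian_per"],
    "percentage": [40.0, 35.0, 50.0, 30.0],
})
regular = pd.DataFrame({
    "Race": ["white_per", "asian_per", "white_per", "asian_per"],
    "percentage": [10.0, 12.0, 8.0, 9.0],
})

fig = plt.figure(figsize=(10, 6))

ax1 = fig.add_subplot(1, 2, 1)               # left panel
sns.barplot(x="Race", y="percentage", data=elite, ax=ax1)
ax1.set_title("Race Distribution in Elite Schools")

ax2 = fig.add_subplot(1, 2, 2, sharey=ax1)   # right panel, same y scale as ax1
sns.barplot(x="Race", y="percentage", data=regular, ax=ax2)
ax2.set_title("Race Distribution in Other Schools")

fig.tight_layout()
```

When running this as a script rather than in a notebook, call plt.show() at the end to open the window.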
From the seaborn barplot documentation:
ci: float or "sd" or None, optional
Size of confidence intervals to draw around estimated values. If "sd", skip bootstrapping and draw the standard deviation of the observations. If None, no bootstrapping will be performed, and error bars will not be drawn.
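Following the documentation quoted above, here is a minimal sketch of turning the error bars off. seaborn 0.12 deprecated ci in favour of a newer errorbar parameter, so the sketch picks whichever keyword the installed version expects; the version check and the sample data are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: three schools per race
data = pd.DataFrame({
    "Race": ["white_per"] * 3 + ["black_per"] * 3,
    "percentage": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# seaborn >= 0.12 uses errorbar=None; older releases use ci=None
major, minor = (int(p) for p in sns.__version__.split(".")[:2])
no_error_bars = {"errorbar": None} if (major, minor) >= (0, 12) else {"ci": None}

fig, ax = plt.subplots()
sns.barplot(x="Race", y="percentage", data=data, ax=ax, **no_error_bars)
```

The bar heights (the mean percentage for each race) are unchanged; only the black error bars disappear.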
Does the black line on a bar mean that the bar is incorrect? I don't understand the explanation. I understand every word, but I don't know what it means. If I want to remove it, what should I do?
About the multiple plots: I tried the code you provided, but it does not create any chart. I get the error "can't convert string to float: hispanic_per".
No. The black line represents the uncertainty or variation in the data; sometimes it's measured in standard deviation units. These resources might help you understand it better.
Thank you for sharing the articles. They are helpful. I guess I can just mention in the project that I need to do further statistical tests because of the error bars.
The error came from the code you provided: all_school_race.plot(x='Race', y='percentage', kind='bar')
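For comparison, a pandas-only sketch of the chart being asked for. Calling .plot(kind='bar') directly on the long-format table draws one bar per row (one per school and race), so here the data is first aggregated with groupby. The sample rows are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data, shaped like all_school_race
all_school_race = pd.DataFrame({
    "DBN": ["S1", "S2", "S1", "S2"],
    "Race": ["white_per", "white_per", "hispanic_per", "hispanic_per"],
    "percentage": [10.0, 30.0, 60.0, 40.0],
})

# Average percentage per race: the same quantity seaborn's barplot uses as bar height
mean_by_race = all_school_race.groupby("Race")["percentage"].mean()

fig, ax = plt.subplots()
mean_by_race.plot(kind="bar", ax=ax)   # kind is the string "bar"
ax.set_ylabel("mean percentage")
```

Unlike seaborn, pandas adds no error bars here unless you pass them explicitly, so no confidence interval is shown.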
OK, I made a mistake: I didn't understand the plot you wanted to make. Now I do. I took the code from your very first post and added subplots so the two charts appear side by side.
The code looks like this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

racial_dbn_col = ["white_per", "asian_per", "black_per", "hispanic_per", "DBN"]
elite_school = combined.loc[combined['sat_score'] > 1800, racial_dbn_col]
all_school = combined.loc[combined['sat_score'] < 1800, racial_dbn_col]
# Select the racial columns
race = [c for c in elite_school if c.endswith('per')]
# Reshape the racial columns into a single column (a new long-format dataset for plotting)
elite_school_race = pd.melt(elite_school, id_vars='DBN', value_vars=race, value_name='percentage')
elite_school_race.rename({'variable': 'Race'}, axis=1, inplace=True)
# One figure with two Axes side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
# Bar graph for elite schools; ci=None hides the error bars
# (seaborn 0.12 and later prefer errorbar=None)
sns.barplot(x='Race', y='percentage', data=elite_school_race, ci=None, ax=ax1)
ax1.set_title('Race Distribution in Elite Schools', fontsize='medium')
ax1.tick_params(axis='x', labelrotation=90)
# Do the same for the all_school dataset
race = [c for c in all_school if c.endswith('per')]
all_school_race = pd.melt(all_school, id_vars='DBN', value_vars=race, value_name='percentage')
all_school_race.rename({'variable': 'Race'}, axis=1, inplace=True)
sns.barplot(x='Race', y='percentage', data=all_school_race, ci=None, ax=ax2)
ax2.set_title('Race Distribution in all Schools', fontsize='medium')
ax2.tick_params(axis='x', labelrotation=90)
fig.tight_layout()
About the error bars: it's not that you need more statistical testing because the error bars are there. Error bars show the standard deviation, standard error, or confidence interval of each bar, which lets you see and interpret statistical significance, as explained in the Biology for Life article. In this case seaborn is plotting confidence intervals (by default, a 95% bootstrapped confidence interval around each bar's mean), which are explained here. If you try to measure the standard deviation, you get an error because there are strings and integers in this data.
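To make "confidence interval" concrete, here is a rough numpy sketch of a percentile bootstrap for the mean, the kind of interval seaborn's barplot draws by default at 95%. It is a simplified illustration, not seaborn's own code; the function name bootstrap_ci, the sample values, the number of resamples, and the seed are all assumptions.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record each resample's mean
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The interval spans the middle `ci` percent of the bootstrapped means
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


# Hypothetical hispanic_per values for a handful of schools
mean, low, high = bootstrap_ci([5.0, 12.0, 8.0, 30.0, 15.0, 9.0])
print(f"mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

A long black line means the interval is wide: the schools in that group vary a lot, or there are few of them, so the bar height is a less precise estimate of the group's mean.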