Trying to make multiple plots for racial columns in good schools and bad schools

Hi! I am trying to compare the race distribution in good schools, which have SAT scores over 1800, with the race distribution in regular schools, which have SAT scores below 1800. My code is below.

My questions are:

1. I am trying to make one figure with multiple plots so that I can see the 2 charts side by side. I tried fig, ax = plt.subplots(1, 2, figsize=(10, 6)) but it did not work.
2. The bar charts created by the code below have a black straight line on each bar. Why is it there, and how do I remove it?

Thank you!!!

Screen Link: https://app.dataquest.io/m/217/guided-project%3A-analyzing-nyc-high-school-data/2/exploring-safety-and-sat-scores

My Code:

racial_dbn_col=["white_per", "asian_per", "black_per", "hispanic_per",'DBN']
elite_school=combined.loc[combined['sat_score']>1800,racial_dbn_col]
all_school=combined.loc[combined['sat_score']<1800,racial_dbn_col]

#Select racial columns
race=[c for c in elite_school if c.endswith('per')]
#Change multiple racial columns into one column (creates a new dataset for plotting)
elite_school_race=pd.melt(elite_school,id_vars='DBN',value_vars=race, value_name='percentage')
elite_school_race.rename({'variable':'Race'},axis=1,inplace=True)
#create bar graphs for elite schools
ax1=sns.barplot(x='Race', y='percentage', data=elite_school_race)
ax1.set_title('Race Distribution in Elite Schools',fontsize='medium')

#Do the same work on all_school dataset
race=[c for c in all_school if c.endswith('per')]
all_school_race=pd.melt(all_school,id_vars='DBN',value_vars=race,value_name='percentage')
all_school_race.rename({'variable':'Race'},axis=1,inplace=True)
sns.barplot(x='Race',y='percentage',data=all_school_race)

What actually happened:

![image|383x283](upload://tlGxH2S3fDdaa1fA75QakCV0hxa.png) ![image|383x283](upload://tlGxH2S3fDdaa1fA75QakCV0hxa.png) 

Hi @candiceliu93

The code to plot multiple plots should be:

fig = plt.figure(figsize=(10,6))
ax1 = fig.add_subplot(1,2,1)  # this one is the first plot
# first plot code
ax2 = fig.add_subplot(1,2,2)  # this one is the second plot
# second plot code
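
For reference, here is a minimal sketch of the same idea using the `plt.subplots(1, 2, ...)` call from the question. A likely reason that call seemed not to work is that `sns.barplot` draws on the current axes unless you pass `ax=`, so both charts can end up on the same axes. The data frames `elite_demo` and `other_demo` are invented toy data, not the real NYC numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

races = ["white_per", "asian_per", "black_per", "hispanic_per"]

# Toy long-format data (invented values, two hypothetical schools per group)
elite_demo = pd.DataFrame({
    "Race": races * 2,
    "percentage": [40.0, 35.0, 10.0, 15.0, 30.0, 45.0, 12.0, 13.0],
})
other_demo = pd.DataFrame({
    "Race": races * 2,
    "percentage": [8.0, 10.0, 37.0, 45.0, 12.0, 6.0, 40.0, 42.0],
})

# One figure, one row, two columns: returns the figure and an array of two axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))

# Pass ax= so each bar plot goes to its own subplot
sns.barplot(x="Race", y="percentage", data=elite_demo, ax=ax1)
ax1.set_title("Race Distribution in Elite Schools", fontsize="medium")
ax1.tick_params(axis="x", labelrotation=90)

sns.barplot(x="Race", y="percentage", data=other_demo, ax=ax2)
ax2.set_title("Race Distribution in Other Schools", fontsize="medium")
ax2.tick_params(axis="x", labelrotation=90)

fig.tight_layout()
```

In a notebook you would drop the `matplotlib.use("Agg")` line and let the figure display inline.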

The black line is the error bar. The seaborn documentation says:

ci: float or “sd” or None, optional
Size of confidence intervals to draw around estimated values. If “sd”, skip bootstrapping and draw the standard deviation of the observations. If None , no bootstrapping will be performed, and error bars will not be drawn.
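
So passing `ci=None` removes the black lines. A small sketch, assuming toy data (`demo` is invented): seaborn 0.12 introduced the `errorbar` parameter and deprecated `ci`, so the snippet picks whichever keyword fits the installed version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy long-format data (invented values)
demo = pd.DataFrame({
    "Race": ["white_per", "asian_per", "black_per", "hispanic_per"] * 2,
    "percentage": [40.0, 35.0, 10.0, 15.0, 30.0, 45.0, 12.0, 13.0],
})

# Older seaborn uses ci=None; seaborn >= 0.12 prefers errorbar=None
major, minor = (int(part) for part in sns.__version__.split(".")[:2])
no_error_bars = {"errorbar": None} if (major, minor) >= (0, 12) else {"ci": None}

fig, ax = plt.subplots(figsize=(5, 4))
sns.barplot(x="Race", y="percentage", data=demo, ax=ax, **no_error_bars)
```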

Good luck!

Hi @alegiraldo666

Does the black line on the bar mean that the bar is incorrect? I don't understand the explanation. I understand every word there, but I don't know what it means. If I want to remove it, what should I do?

About the multiple plots, I tried the code you provided, but it does not create any chart. I get an error saying "can't convert string to float: hispanic_per".

ax1 = plt.add_subplot(1,2,1) 
ax1.plot(kind='bar', all_school_race['Race'],all_school_['percentage'])
ax2 = plt.add_subplot(1,2,2) 
ax2.plot(......)

No, the black line represents the uncertainty or variation in the data. Sometimes it is measured in standard deviation units. This resource might help you understand more:


https://datavizcatalogue.com/methods/error_bars.html
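
To make that concrete, here is a rough sketch of how a bootstrap confidence interval for a bar's height (the mean) can be computed. By default seaborn's barplot shows the mean with a 95% bootstrap confidence interval; this snippet only illustrates the idea and is not seaborn's actual implementation. The values in `values` are invented.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Invented percentages for one race column across a handful of schools
values = [12.0, 35.5, 8.2, 50.1, 22.4, 17.9, 41.3, 5.6]

# The bar height is the mean of the observations
bar_height = statistics.mean(values)

# Bootstrap: resample with replacement many times and record each resample's mean
boot_means = []
for _ in range(1000):
    resample = random.choices(values, k=len(values))
    boot_means.append(statistics.mean(resample))
boot_means.sort()

# The error bar spans the middle 95% of the bootstrap means
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]

print(f"bar height: {bar_height:.1f}, error bar from {lower:.1f} to {upper:.1f}")
```

A long error bar means the schools in that group vary a lot (or there are few of them), so the mean is less certain. It does not mean the bar is wrong.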

About the error, what if you try:
all_school_race.plot(x='Race', y='percentage', kind='bar')

Thank you for sharing the articles. They are helpful. I guess I can just mention in the project that I need to do further statistical tests because of the error bars.

About the code you provided, all_school_race.plot(x='Race', y='percentage', kind='bar'):

The chart is kind of weird.

(image: bar chart)
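
A likely reason the chart looks odd: `DataFrame.plot` with a bar kind draws one bar per row, and the melted data has one row per school per race. A sketch of one way around that, assuming toy data (`demo` and its DBN values are invented), is to average per race first and then plot the result:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy long-format data: two hypothetical schools, four race columns each
demo = pd.DataFrame({
    "DBN": ["A"] * 4 + ["B"] * 4,
    "Race": ["white_per", "asian_per", "black_per", "hispanic_per"] * 2,
    "percentage": [40.0, 35.0, 10.0, 15.0, 30.0, 45.0, 12.0, 13.0],
})

# Collapse to one value per race before plotting
mean_by_race = demo.groupby("Race")["percentage"].mean()

fig, ax = plt.subplots(figsize=(5, 4))
mean_by_race.plot(kind="bar", ax=ax)  # now one bar per race, not one per row
ax.set_ylabel("percentage")
```

This is roughly the aggregation that `sns.barplot` does for you, which is why the seaborn version needs no `groupby`.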

OK, I made a mistake: I did not understand the plot you wanted to make. Now I do. I took the code from your very first post and added the subplots to make this:

(image)

The code looks like this:

racial_dbn_col = ["white_per", "asian_per", "black_per", "hispanic_per",'DBN']
elite_school=combined.loc[combined['sat_score']>1800,racial_dbn_col]
all_school=combined.loc[combined['sat_score']<1800,racial_dbn_col]

#Select racial columns
race=[c for c in elite_school if c.endswith('per')]
#Change multiple racial columns into one column (creates a new dataset for plotting)
elite_school_race=pd.melt(elite_school,id_vars='DBN',value_vars=race, value_name='percentage')
elite_school_race.rename({'variable':'Race'},axis=1,inplace=True)

fig = plt.figure(figsize=(10, 6))
#create bar graphs for elite schools
ax1 = plt.subplot(1, 2, 1)
ax1 = sns.barplot(x='Race', y='percentage', data=elite_school_race, ci=None)
ax1.set_title('Race Distribution in Elite Schools',fontsize='medium')
plt.xticks(rotation=90)

#Do the same work on all_school dataset
race=[c for c in all_school if c.endswith('per')]
all_school_race=pd.melt(all_school,id_vars='DBN',value_vars=race,value_name='percentage')
all_school_race.rename({'variable':'Race'},axis=1,inplace=True)
ax2 = plt.subplot(1, 2, 2)
ax2 = sns.barplot(x='Race',y='percentage',data=all_school_race, ci=None)
ax2.set_title('Race Distribution in all Schools', fontsize='medium')
plt.xticks(rotation=90)

About the error bars: their presence does not mean you need more statistical testing. Error bars show the standard deviation, standard error, or confidence interval for each bar, which lets you see and interpret statistical significance, as explained in the Biology for Life article. In this case seaborn is plotting the confidence intervals, which are explained here. If you try to plot the standard deviation instead, you get an error because there are strings and integers in this data.

Good luck!


Thank you @alegiraldo666

It is exactly the chart I am looking for!!!

And your reference explains the error bars so clearly! Thank you!!