Going Beyond Simple Bar Plots

Bar plots seem to be the best choice when it comes to displaying proportions of different elements of the whole. Are there other charts with similar functionality? Let’s explore this question using a bar-related dataset from Kaggle: Alcohol Consumption around the World :star_struck::wine_glass: In particular, we’ll focus on the TOP10 countries by strong drink consumption. We’ll also take a quick look at what other types of bar plots exist and when they can be used.

Let’s read the data into pandas:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('drinks.csv')
print(df.head(3))

Output:

   country  beer_servings  spirit_servings  wine_servings  \
0  Albania             89              132             54   
1  Algeria             25                0             14   
2  Andorra            245              138            312   

   total_litres_of_pure_alcohol  
0                           4.9  
1                           0.7  
2                          12.4   

and extract the necessary information:

top10_spirit = df.sort_values('spirit_servings', 
                              ascending=False)[:10]\
                 .reset_index(drop=True)\
                 [['country', 'spirit_servings']]
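
As a side note, the same kind of selection can be sketched more compactly with pandas' nlargest(). The snippet below uses a tiny made-up DataFrame (not the Kaggle data) and takes the top 3 instead of the top 10, so treat it as an illustration of the idea rather than a drop-in replacement: rows with tied values may come out in a different order than with sort_values().

```python
import pandas as pd

# Made-up numbers, for illustration only (not the Kaggle data)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'spirit_servings': [120, 300, 45, 210, 0],
    'beer_servings': [80, 150, 20, 60, 10],
})

# nlargest() returns the n rows with the largest values in the given
# column, already sorted in descending order
top3 = (toy.nlargest(3, 'spirit_servings')
           .reset_index(drop=True)
           [['country', 'spirit_servings']])
print(top3)
```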

Pie Chart

Pie charts and donut plots (which are practically pie charts with the central area cut out) represent the most common alternative to bar plots. However, when choosing between them and bar plots, the latter seem a safer choice. In fact, pie charts, being based on angles rather than lengths, are usually more difficult to read and extract insights from. If a pie chart has a lot of categories and/or categories with almost equal proportions, it becomes quite complicated even to find the biggest or smallest category, not to mention ordering them by value:

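# Creating an empty figure: the panels are added below with
# plt.subplot(), and a full-size Axes created here by
# plt.subplots() would stay behind them in recent matplotlib versions
fig = plt.figure(figsize=(16,8))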

# Creating a case-specific function to avoid code repetition
# (will be used for comparing with 2 more plot types)
def plot_hor_bar_2():
    ax = sns.barplot(x='spirit_servings', y='country',
                     data=top10_spirit, color='slateblue')
    plt.title('TOP10 countries by spirit consumption', fontsize=28)
    plt.xlabel('Servings per person', fontsize=23)
    plt.xticks(fontsize=20)
    plt.ylabel(None)
    plt.yticks(fontsize=20)
    sns.despine(left=True)
    ax.grid(False)
    ax.tick_params(bottom=True, left=False)
    return None

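plt.subplot(1,2,1)
plot_hor_bar_2()

plt.subplot(1,2,2)
# Setting the label font size directly in the call: changing
# rcParams after the pie is drawn wouldn't affect its labels
plt.pie(top10_spirit['spirit_servings'],
        labels=top10_spirit['country'],
        startangle=90, autopct='%1.1f%%',
        textprops={'fontsize': 18})
plt.title('TOP10 countries by spirit consumption', fontsize=28)
plt.tight_layout(pad=4)
plt.show()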


Even though we have only 10 countries displayed on both graphs (the TOP10 countries by spirit consumption in 2010), we can’t get much from the pie chart. Indeed, the sectors look rather similar to one another, and even setting the startangle parameter to 90 and adding labels in % didn’t really improve the readability of the second chart.
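
For reference, here is a minimal sketch of the donut variant mentioned above. It assumes that passing a width through wedgeprops is an acceptable way to cut out the center, and it uses made-up shares rather than the Kaggle data. The readability issues are the same as for the pie chart.

```python
import matplotlib.pyplot as plt

# Made-up shares, for illustration only (not the Kaggle data)
labels = ['A', 'B', 'C', 'D']
values = [40, 30, 20, 10]

fig, ax = plt.subplots(figsize=(6, 6))

# width=0.4 keeps only the outer 40% of each wedge's radius, which
# turns the pie into a donut; pctdistance moves the percentage
# labels into the ring
ax.pie(values, labels=labels, startangle=90, autopct='%1.1f%%',
       pctdistance=0.8,
       wedgeprops={'width': 0.4, 'edgecolor': 'white'})
ax.set_title('Donut plot (toy data)')
```

Call plt.show() afterwards to display the figure.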

Another drawback of these graphs is that we have to use a different color for each category, while, as we discussed earlier, on bar plots it’s enough to use the same color for all bars (if we don’t want to emphasize anything in particular). It means that on a pie chart each category is characterized by 2 features: color and angle, which creates redundant visual information. And if we have a lot of categories and, hence, a lot of colors, then our pie chart becomes overwhelming. Again, the graph above, with only 10 elements, already looks heavily overloaded.

Finally, in the case of several pie charts, each representing a category subdivided into the same elements in different proportions (an analogue of a grouped bar plot), it would be almost impossible to trace the trends of all elements or to figure out any meaningful pattern in the data.

Based on all the above, bar plots should be preferred to pie charts.

Treemap

Like bar and pie plots, a treemap shows what the whole data consists of. It displays hierarchical data as a set of nested rectangles, with the area of each rectangle being proportional to the value of the corresponding data. To customize a treemap, we can assign a list of colors for the rectangles, color and font size for the labels, and some other parameters.

Let’s duplicate the bar plot above, this time comparing it with a corresponding treemap:

import squarify

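# Renaming Russian Federation into Russia to avoid too long labels
# (matching by name rather than by a hard-coded row position)
top10_spirit['country'] = top10_spirit['country'].replace(
    'Russian Federation', 'Russia')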

# Creating an empty figure: the panels are added below with plt.subplot()
fig = plt.figure(figsize=(16,8))

plt.subplot(1,2,1)
plot_hor_bar_2()

plt.subplot(1,2,2)
# plt.cm is used so that no separate `import matplotlib` is needed
cmap = plt.cm.tab20

# One distinct color per country
colors = [cmap(i) for i in range(len(top10_spirit.index))]

squarify.plot(sizes=top10_spirit['spirit_servings'],
              label=top10_spirit['country'],
              color=colors,
              text_kwargs={'fontsize': 16})
plt.title('TOP10 countries by spirit consumption', fontsize=28)
plt.axis('off')
plt.tight_layout(pad=4)
plt.show()

The treemap looks much more insightful than the pie chart created earlier: perceiving areas is generally easier than perceiving angles. We didn’t add the values or percentages on our treemap; however, it’s still possible to estimate on a qualitative level the most and the least spirit-drinking countries from the TOP10 (mine is in 3rd place, together with Haiti :pensive:). The issue with using a lot of colors for one visualization also remains for a treemap, though.
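
If values or percentages are wanted on the treemap after all, one possible approach is to bake them into the label strings before plotting. The sketch below uses made-up numbers, and the helper name plot_labeled_treemap is hypothetical (introduced here only for illustration).

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only (not the Kaggle data)
countries = ['A', 'B', 'C', 'D']
servings = [400, 300, 200, 100]

# Building labels of the form "name, value (share of the total)"
total = sum(servings)
labels = [f'{c}\n{v} ({v / total:.0%})'
          for c, v in zip(countries, servings)]


def plot_labeled_treemap():
    """Hypothetical helper: draw the treemap with the labels above."""
    # Imported here so that the label logic can be reused
    # even when squarify isn't installed
    import squarify

    fig, ax = plt.subplots(figsize=(8, 6))
    colors = [plt.cm.tab20(i) for i in range(len(servings))]
    squarify.plot(sizes=servings, label=labels, color=colors,
                  ax=ax, text_kwargs={'fontsize': 14})
    ax.axis('off')
    return ax
```

Calling plot_labeled_treemap() followed by plt.show() displays the result.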

Stem Plot

A stem plot represents a good alternative to a bar chart with many bars, or with bars of similar lengths. It increases the data-ink ratio of a chart and makes it more readable. To create a vertical stem plot, we can use either the stem() function or vlines() in combination with plot(); to create a horizontal one, as below, we can use hlines() in combination with plot() (recent matplotlib versions also accept orientation='horizontal' in stem()). These functions have a lot of parameters to tune for improving the appearance of the resulting plot. In this article, you can find more information on how to customize both stem plots and treemaps.

# Creating an empty figure: the panels are added below with plt.subplot()
fig = plt.figure(figsize=(16,8))

plt.subplot(1,2,1)
plot_hor_bar_2()

# Sorting in ascending order so that the largest value ends up on top
top_sorted = top10_spirit.sort_values('spirit_servings',
                                      ascending=True)\
                         .set_index('country')

# Keeping a reference to the right-hand Axes for the styling below
ax = plt.subplot(1,2,2)
plt.hlines(y=top_sorted.index, xmin=0,
           xmax=top_sorted['spirit_servings'],
           color='slateblue')
plt.plot(top_sorted['spirit_servings'], top_sorted.index,
         'o', color='slateblue')
plt.title('TOP10 countries by spirit consumption', fontsize=28)
plt.xlabel('Servings per person', fontsize=23)
plt.xticks(fontsize=20)
plt.xlim(0, None)
plt.ylabel(None)
plt.yticks(fontsize=20)
sns.despine(left=True)
ax.grid(False)
ax.tick_params(bottom=True, left=False)
plt.tight_layout(pad=4)
plt.show()

The second plot looks less cluttered and more elegant, and this effect becomes more evident for the cases with many categories. Also, we can notice that Russia and Haiti have swapped places, having the same values.
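
For comparison, here is a minimal sketch of the same kind of horizontal stem plot built with stem() itself. It assumes matplotlib 3.4 or newer (where the orientation parameter is available) and uses made-up values with numeric positions for the categories.

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only (not the Kaggle data)
countries = ['A', 'B', 'C', 'D']
servings = [100, 200, 300, 400]
positions = list(range(len(countries)))

fig, ax = plt.subplots(figsize=(8, 5))

# orientation='horizontal' requires matplotlib 3.4 or newer;
# basefmt=' ' hides the baseline
container = ax.stem(positions, servings,
                    orientation='horizontal', basefmt=' ')
container.stemlines.set_color('slateblue')
container.markerline.set_color('slateblue')

# Replacing the numeric positions with the category names
ax.set_yticks(positions)
ax.set_yticklabels(countries)
ax.set_xlim(0, None)
ax.set_xlabel('Servings per person')
```

Call plt.show() afterwards to display the figure.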

Specific Types of Bar Plots

There are some particular types of bar charts, which can be of use in certain, rather limited conditions:

  • Radial bar plots look like coiled bar charts, plotted not in Cartesian but in polar coordinates. Each bar starts at a different radial point and has a circular shape instead of a straight line. Even though these visualizations look rather striking, they should almost always be avoided, since they strongly distort the perception of the data behind them and are difficult to analyze: the bars have different perimeters (instead of lengths, as in common bar plots), so the inner ones, which are supposed to be the smallest, seem even smaller, and the outer ones seem larger, even though some of them can be of the same length. The absence of the y-axis on such graphs creates additional confusion. Sometimes radial bar charts are mistakenly called circular, which is actually another type of graph that we’ll see soon. You can check this StackOverflow post on how to create radial bar plots in Python.

  • Circular and polar bar charts have bars in the form of segments of different lengths, starting from a circle (circular version) or a point (polar version) instead of a line as in conventional bar plots. These kinds of graphs work best for a large number of categories (bars) with an evident cyclic pattern. In all other cases, though, circular and polar bar plots are not a good choice because of several issues: the absence of the y-axis, the difficulty of visually comparing segment lengths, and the optical illusion of the smaller bars (those close to the base circle or point) looking even smaller. Here is a demo of how to create a polar bar chart in matplotlib.

  • Gantt chart. Illustrates a project schedule, the relationships between different activities, and the current schedule status.

  • Waterfall chart. Shows how a starting value has been modified (increased or decreased) up to a final value after sequentially applied positive or negative changes in a data series.
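
To make the difference between the polar and circular versions more concrete, here is a minimal sketch in matplotlib. The monthly values are made up to mimic a cyclic pattern, and the base-circle radius of 5 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up monthly values with a cyclic pattern, for illustration only
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
values = [3, 4, 6, 8, 11, 13, 14, 13, 10, 7, 5, 3]

# One angular slot per category, with a small gap between bars
theta = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
width = 2 * np.pi / len(values) * 0.9

fig, (ax_polar, ax_circ) = plt.subplots(
    1, 2, figsize=(12, 6), subplot_kw={'projection': 'polar'})

# Polar version: the bars start from the central point
ax_polar.bar(theta, values, width=width, color='slateblue')
ax_polar.set_title('Polar bar chart')

# Circular version: the bars start from a base circle of radius 5
ax_circ.bar(theta, values, width=width, bottom=5, color='slateblue')
ax_circ.set_ylim(0, 5 + max(values))
ax_circ.set_title('Circular bar chart')

for ax in (ax_polar, ax_circ):
    ax.set_xticks(theta)
    ax.set_xticklabels(months)
```

Call plt.show() afterwards to display the figure.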

Conclusion

In a nutshell, even though a bar plot remains the most popular visualization type to illustrate what the whole consists of, for the sake of more efficient storytelling we can always consider some case-specific alternatives or choose between particular types of bar plots. For further reading, you may find this article useful: it discusses best practices and common issues in creating a meaningful bar plot.

Thanks for reading! Cheers! :beers:


Great article Elena! Very helpful and entertaining.
However, the drinks look so much more inviting than plotting charts!!
Bruce


Thanks a lot, Bruce! Indeed, drinks can serve well for better understanding of alternative plots! :grinning:
