
Plotting FiveThirtyEight graphs with Bokeh

Hey everyone! In this article we are going to look at Bokeh, a data visualization library for Python, with the goal of plotting line graphs in the FiveThirtyEight style.

Bokeh is a Python library that aims to simplify the creation of common plots while still handling custom and specialized use cases, all in an open-source, interactive environment.
It works similarly to matplotlib. The main difference is that it creates an HTML object, which lets you add interactive features such as zooming in and out, labels, or buttons that trigger specific actions. The great part is that you can add these pieces step by step, so you don’t need to create the full graph at once.

Our approach is to create a simple plot and then add or change settings as the article goes on, so that by the end we have a fully styled graph (actually, more than one).

More precisely, we are going to build a function that creates line plots in the FiveThirtyEight style. Specifically, we are going to create the first row of graphs suggested in the Guided Project of the Storytelling Data Visualization and Information Design course, from the Data Scientist in Python path here at Dataquest. The ones below:

This was one of the suggestions Dataquest gave. I liked it, so I chose to build it.

The main idea is to plot several single-line graphs in different colors and use them to create the dashboard above in the future (I may write another article about how to do it).

I am Brazilian, so I am especially curious about the exchange rate between the Euro and the Brazilian Real, but instead of analyzing just those two, I would also like to see how the US Dollar behaved. With that in mind, we are going to create a function that reproduces the graphs with different data, so we can build dashboards for any currency by changing just a few details. Before we start, let me give you some important information about how to make the code work and what the data looks like.

1. Initial setup

a. Handling the environment

In this tutorial, I’ll use an older version of Bokeh (2.2.0) because it works better with Streamlit (the platform I’m using to create an app with the dashboard we are making). To install Bokeh you can use pip with the explicit version (pip install bokeh==2.2.0) or without the version, if you want the latest release. Keep in mind that the code here uses the 2.x argument names plot_width and plot_height, which Bokeh 3.x replaced with width and height. For a detailed explanation of the installation process, see Bokeh’s documentation.

Before the installation, be sure you have the following dependencies installed:

PyYAML>=3.10
python-dateutil>=2.1
Jinja2>=2.7
numpy>=1.11.3
pillow>=4.0
packaging>=16.8
tornado>=5
typing_extensions>=3.7.4
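
One quick way to see which of these packages are already installed is to ask the standard library for their versions. This is only a sketch: the helper name installed_versions is made up for illustration, and the distribution names are copied from the list above (plus bokeh itself).

```python
from importlib import metadata

# Distribution names taken from the dependency list above, plus bokeh itself
PACKAGES = ["bokeh", "PyYAML", "python-dateutil", "Jinja2", "numpy",
            "pillow", "packaging", "tornado", "typing_extensions"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:20} {version or 'not installed'}")
```

The versions printed still have to be compared by eye against the minimums listed above.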

Beyond that, if you want to reproduce the code we are going to create, you’ll need to import the following:

import pandas as pd
import numpy as np

from bokeh.plotting import figure, show
from bokeh.models import DatetimeTickFormatter, Label
from bokeh.models.tickers import FixedTicker
from bokeh.layouts import row
from bokeh.io import output_notebook

And, to show the graphs inline in the Jupyter notebook, you need to run the code below.

# to show bokeh graphs inline
output_notebook()

A pandas feature used along the way is going to change in upcoming releases, so it raises a FutureWarning. If you don’t want to see the red warning bands when you run the code, just run the code below.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
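
The filter above stays active for the rest of the session. If you would rather silence the warning only around one step, a possible sketch uses the standard library's warnings.catch_warnings() context manager. The function noisy_step is a hypothetical stand-in for whatever call raises the warning.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


# Inside the block FutureWarnings are ignored; the previous filters
# are restored as soon as the block exits.
with warnings.catch_warnings():
    warnings.simplefilter(action="ignore", category=FutureWarning)
    result = noisy_step()

print(result)
```

Outside the with block, other FutureWarnings are reported as usual.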

b. Importing and getting to know the data

The original data was taken from the Kaggle dataset collected by Daria Chemkaeva. The data treatment process can be found in this Jupyter notebook located in my GitHub repo. When the treated data was exported to CSV, the time column was turned back into strings, but we can fix that quickly.

# reading the treated data from my github repo
euro = pd.read_csv('https://raw.githubusercontent.com/nathpignaton/guided_projects/main/euro-exchange-rates/euro_data.csv')

# converting the time column to datetime
euro['time'] = pd.to_datetime(euro['time'])

Our dataset is composed of exchange rates between the Euro and two currencies: the US Dollar and the Brazilian Real. The columns are the following:

  • time - the day the exchange rates refer to
  • dollar - the rolling mean of the Euro to US Dollar exchange rate
  • real - the rolling mean of the Euro to Brazilian Real exchange rate
  • president_us - the US president at that time
  • president_br - the Brazilian president at that time
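
As a side note, pandas can also parse the date column while reading the file, through the parse_dates argument of read_csv. The sketch below uses a tiny in-memory CSV with made-up values that only mimic the column layout described above; it is not the real dataset.

```python
import io

import pandas as pd

# Hypothetical rows that only mimic the column layout described above;
# the values are made up for illustration.
csv_text = """time,dollar,real,president_us,president_br
2021-01-04,1.22,6.35,Trump,Bolsonaro
2021-01-21,1.21,6.45,Biden,Bolsonaro
"""

# parse_dates converts the column while reading, so no second step is needed
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

print(sample.dtypes)
print(sample["time"].dt.year.unique())
```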

image

2. Creating the clean line plot function

We are going to start by creating a single line plot, going through the steps to change every important aspect of the plot until it looks just like our model. Once the configuration is done, we are going to define a function that bundles it all.

a. Creating the default line plot

To make the model plot, we are going to use the US data. First we are going to create a figure with a title and dimensions we want, then add the default line plot.

# a square figure, since we want all presidents next to each other
us = figure(title='Euro x US Dollar', plot_width=400, plot_height=400)

# adding a simple lineplot
us.line(euro['time'], euro['dollar'])

# displaying it inline
show(us)

image

As we can see above, the x axis looks very weird. The reason is that the axis is not formatted as dates, so the time values are displayed as plain numbers. To solve it, we can use DatetimeTickFormatter(), and since we are going to change the x axis anyway, let’s do all the axis styling.
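
For reference, Bokeh also lets you request a datetime axis when the figure is created, through the x_axis_type argument. The sketch below uses made-up points and is only an alternative; the rest of this article sticks with the formatter approach.

```python
from datetime import datetime

from bokeh.plotting import figure

# Made-up points, only to have something to draw
dates = [datetime(2020, 1, 1), datetime(2020, 7, 1), datetime(2021, 1, 1)]
rates = [1.12, 1.13, 1.22]

# x_axis_type='datetime' asks Bokeh for a datetime axis up front
p = figure(title="Euro x US Dollar", x_axis_type="datetime")
p.line(dates, rates)

print(type(p.xaxis[0]).__name__)
```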

b. Styling the axis

To make our graph look like our model, we need to:

  1. Change the format of the x axis to datetime, showing only the years;
  2. Remove the ticks on both axes;
  3. Remove the axis lines;
  4. Change the color of the tick labels on both axes to a light gray (Bokeh’s silver).

To make these changes, we set specific Bokeh properties on the figure, as described in the code below.

# 1. Changing the format of the x axis numbers
us.xaxis[0].formatter = DatetimeTickFormatter(months='%Y', years='%Y')

# 2. Removing the ticks in both axes
us.xaxis.minor_tick_line_color = None
us.xaxis.major_tick_line_color = None
us.yaxis.minor_tick_line_color = None
us.yaxis.major_tick_line_color = None

# 3. Removing the axis lines
us.xaxis.axis_line_color = None
us.yaxis.axis_line_color = None

# 4. Changing the color to silver - one light shade of gray
us.xaxis.major_label_text_color = 'silver'
us.yaxis.major_label_text_color = 'silver'
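
Since the same values go to both axes, a shorter variant is possible with the figure's axis attribute, which addresses the x and y axes together. This is a sketch on a throwaway figure with made-up points, not the us figure from this article.

```python
from bokeh.plotting import figure

p = figure(title="Euro x US Dollar")
p.line([1, 2, 3], [1.10, 1.20, 1.15])  # made-up points

# p.axis addresses the x and y axes together, so each property is set once
p.axis.minor_tick_line_color = None
p.axis.major_tick_line_color = None
p.axis.axis_line_color = None
p.axis.major_label_text_color = "silver"

print(p.xaxis[0].major_label_text_color, p.yaxis[0].major_label_text_color)
```

Setting the properties per axis, as in the article, remains the better choice when the two axes need different values.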

us_gif

c. Styling the background and lines

In our model, the background and the borders are in a very light gray, the grid lines seem to be in a slightly darker shade of gray, and there is no outline. To match it we are going to:

  1. Change the background to a very light gray called “snow” in Bokeh’s named colors;
  2. Change the border to the same color as the background;
  3. Change the grid lines to a light gray, named “gainsboro”;
  4. Remove the outline.

# 5. Changing the background
us.background_fill_color = 'snow'

# 6. Changing the borders
us.border_fill_color = 'snow'

# 7. Changing the grid lines to a light gray, slightly darker than the background
us.xgrid.grid_line_color = 'gainsboro'
us.ygrid.grid_line_color = 'gainsboro'

# 8. Removing outlines
us.outline_line_color = None

us_gif2

d. Setting up the last details

To make it just like our model, we need a few more steps. Some of them are a little more complex than our last changes, so let’s do them one by one:

  1. Center the title, change it to upper case and make it bigger;

We are going to align it twice, centering it horizontally and placing it at the top, which creates a bigger gap between the title and the graph.

# 9. Centering the title and aligning it to the top
us.title.align = "center"
us.title.vertical_align = 'top'
# changing the title to upper case
us.title.text = us.title.text.upper()
# making the font bigger
us.title.text_font_size = '18px'

image

  2. Create a subtitle;

To do so, we are going to use the Label annotation. We need to set some specific options to place the subtitle in the right spot, like the x and y units, plus other details like font styling.

There are two ways to position the annotation: setting x and y coordinates, and choosing where the object is placed when we add it to the figure. I combined both, adjusting the location by rendering the graph several times.

Other than that, we can set the font style, color, size and other characteristics of the annotation. Only the essentials were included in our model.

# 10. Creating the subtitle
subtitle = Label(x=135, y=12, 
                 x_units='screen', y_units='screen', 
                 text='(2001-2021)', 
                 text_color='silver', 
                 text_font_style = 'bold',
                 text_font_size='15px')

# Adding the object to the figure
us.add_layout(subtitle, 'above')
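
If tuning the x and y offsets by hand becomes tedious, one alternative worth trying is a second Title object, which Bokeh can center by itself. This is only a sketch on a throwaway figure with made-up points; it is not the approach used in the rest of this article, and how the two titles stack should be checked in a rendered plot.

```python
from bokeh.models import Title
from bokeh.plotting import figure

p = figure(title="EURO X US DOLLAR")
p.title.align = "center"
p.line([1, 2, 3], [1.10, 1.20, 1.15])  # made-up points

# A second Title placed 'above' behaves like a subtitle and centers itself
subtitle = Title(text="(2001-2021)", align="center",
                 text_color="silver", text_font_style="bold",
                 text_font_size="15px")
p.add_layout(subtitle, "above")
```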

image

  3. Change the line color and width;

To do so, we need to go back to the beginning of the code we wrote so far. These two attributes are set when we add the line to the figure, so our first lines of code will change from this:

us = figure(title='Euro x US Dollar', plot_width=400, plot_height=400)

# defining the default lineplot
us.line(euro['time'], euro['dollar'])

To this:

us = figure(title='Euro x US Dollar', plot_width=400, plot_height=400)

# defining the default lineplot with different line color and width
us.line(euro['time'], euro['dollar'],
        line_color = 'teal',
        line_width = 2)
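
It is also possible to restyle a line after it has been added, because line() returns the renderer it creates and the styling lives on that renderer's glyph. A small sketch with made-up points:

```python
from bokeh.plotting import figure

p = figure(title="Euro x US Dollar")

# line() returns the renderer it creates; keep a reference to it
renderer = p.line([1, 2, 3], [1.10, 1.20, 1.15])  # made-up points

# The styling lives on renderer.glyph and can be changed afterwards
renderer.glyph.line_color = "teal"
renderer.glyph.line_width = 2
```

Passing the values when the line is created, as done above, keeps everything in one place, which suits the function we are building.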

image

  4. Make the axis tick labels a little bit bigger;

Along the way, we overlooked that, in our model graph, the axis tick labels are a little bigger than the default. We can change that with the proper properties.

# 12. Changing the size of the tick labels
us.xaxis.major_label_text_font_size = "13px"
us.yaxis.major_label_text_font_size = "13px"

image

  5. Change the title color to the same as the line color;

This can be done with the property below:

# 13. Changing the title color
us.title.text_color = 'teal'

image

Well, after all the styling and configuration, we have our single line plot ready! We’ve come a long way. Now we are going to turn it all into a function to automate the process and make building the dashboard easier.

e. Creating the function

The process is very straightforward: we just need to put all the steps above into a function. The steps are commented, and there’s also a docstring explaining the purpose of each parameter.

def clean_lineplot(x, y, fig, title, sub, color='blue', sub_x=135):
    """
    Takes the data and a figure and returns a clean, fully styled line plot.
    The parameters are:
    x = data for the x axis
    y = data for the y axis
    fig = the bokeh figure the line is drawn on
    title = the title of the plot
    sub = the subtitle of the plot
    color = the color of the line and of the title, default is blue
    sub_x = horizontal position of the subtitle in screen units, default is 135
    """
    f = fig

    # 11. The line color and width are set when the line is created
    f.line(x, y, line_color=color, line_width=2)

    # 1. Changing the format of the x axis numbers
    f.xaxis[0].formatter = DatetimeTickFormatter()

    # 2. Removing the ticks in both axes
    f.xaxis.minor_tick_line_color = None
    f.xaxis.major_tick_line_color = None
    f.yaxis.minor_tick_line_color = None
    f.yaxis.major_tick_line_color = None

    # 3. Removing the axis lines
    f.xaxis.axis_line_color = None
    f.yaxis.axis_line_color = None

    # 4. Changing the color to silver - one light shade of gray
    f.xaxis.major_label_text_color = 'silver'
    f.yaxis.major_label_text_color = 'silver'

    # 5. Changing the background
    f.background_fill_color = 'snow'

    # 6. Changing the borders
    f.border_fill_color = 'snow'

    # 7. Changing the grid lines to a light gray, slightly darker than the background
    f.xgrid.grid_line_color = 'gainsboro'
    f.ygrid.grid_line_color = 'gainsboro'

    # 8. Removing outlines
    f.outline_line_color = None

    # 9. Creating and configuring the title
    f.title.text = title.upper()
    f.title.align = "center"
    f.title.vertical_align = 'top'
    f.title.text_font_size = '18px'

    # 10. Creating and configuring the subtitle
    subtitle = Label(x=sub_x, y=12,
                     x_units='screen', y_units='screen',
                     text=sub,
                     text_color='silver',
                     text_font_style='bold',
                     text_font_size='15px')
    f.add_layout(subtitle, 'above')

    # 12. Changing the size of the tick labels
    f.xaxis.major_label_text_font_size = "13px"
    f.yaxis.major_label_text_font_size = "13px"

    # 13. Changing the title color
    f.title.text_color = color

    # Extra: making the toolbar autohide :)
    f.toolbar.autohide = True

    return f

To use it, we first need to define a figure with the dimensions of the plot. After that, we just call the function with all the arguments needed, like this:

us = figure(plot_width=400, plot_height=400)

us = clean_lineplot(euro['time'], euro['dollar'], 
                    color='mediumvioletred', fig=us, 
                    title='EURO VERSUS US DOLLAR', 
                    sub='2001-2021')
show(us)

image

Now that we have our first function, we can create the first part of the model: one graph per president, each in a different color, all in the same row.

3. Creating the president plots row

a. Creating the initial loop

Our challenge is to make a loop that plots one graph per president, equally spaced, taking into consideration the number of presidents each dataset has (the number of US presidents from 2001 to 2021 is different from the number of Brazilian presidents in the same period).

Here are the basic things we need to have:

  1. A list with the color palette we want to use (it doesn’t need to match the exact number of presidents, as long as it has at least that many colors); the colors were picked from Bokeh’s named colors;

# 1. Defining the color palette
colors = ['lightcoral', 'seagreen', 'mediumpurple', 'gold', 'orangered', 'teal', 'mediumvioletred']

  2. A dictionary mapping each currency column name to its president column name. This is what it looks like:

# 2. Defining the currencies dictionary 
currencies = {'dollar': 'president_us', 
              'real': 'president_br'}

  3. An empty dictionary to store the graphs made in the loop;

# 3. Defining a dictionary to store the plots
plots = {}

  4. A loop that:
    1. Selects the data for each currency and country - with the help of our dictionary;
    2. Creates a clean line plot, for each president, with the right year range in different colors - with the help of our color palette;
    3. Creates a title for each graph with the president’s name, in the same color used for the line;
    4. Creates a subtitle with the corresponding years for each president.
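
One detail about pairing presidents with colors: Python's zip stops at the shorter input, so a palette with fewer colors than presidents would silently skip the remaining presidents. The sketch below shows that behaviour with placeholder names and one possible safeguard, itertools.cycle, which repeats the palette.

```python
from itertools import cycle

colors = ["lightcoral", "seagreen", "mediumpurple"]
# Placeholder names; more entries than colors on purpose
presidents = ["A", "B", "C", "D", "E"]

# zip stops at the shorter input, so two names are dropped here
paired_plain = list(zip(presidents, colors))

# cycle repeats the palette, so every name gets a color
paired_cycled = list(zip(presidents, cycle(colors)))

print(len(paired_plain), len(paired_cycled))
```

With the seven-color palette above and the presidents in this dataset the plain zip is enough, so the loop in this article keeps it.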

Normally we would need to create a specific figure with a grid configuration to plot the graphs side by side in the right positions, but Bokeh makes the process much easier by letting us arrange the layout at the moment we show the graphs, like this:

# showing three graphs side by side in a row
show(row(plot1, plot2, plot3))

With the colors defined and the placement of the plots solved, we can move on to the loop itself. With all the definitions above, we came up with a loop like this:

# 1. Defining the color palette
colors = ['lightcoral', 'seagreen', 'mediumpurple', 'gold', 'orangered', 'teal', 'mediumvioletred']

# 2. Defining the currencies dictionary 
currencies = {'dollar': 'president_us', 
              'real': 'president_br'}

# 3. Defining a dictionary to store the plots
plots = {}

# 4. a. Looping to select the data for each currency
for currency, president in currencies.items():

    # 4. b. Creating a clean line plot, for each president, with the right year range in different colors
    for p, c in zip(euro[president].unique(), colors):

        # creating the figure for the plot
        # each figure has a width proportional to the number of years the president stayed
        width = ((euro.loc[euro[president] == p, 'time'].dt.year.unique().size) * 20) + 140
        # the x axis gets its datetime format inside clean_lineplot
        f = figure(plot_width=width, plot_height=400)

        # filtering the data per president to plot the graph
        x = euro.loc[euro[president] == p, 'time']
        y = euro.loc[euro[president] == p, currency]

        # 4. c. Creating the upper case title for each president graph
        title = p.upper()

        # 4. d. Creating the subtitle text and calculating the right location
        first_year = euro.loc[euro[president] == p, 'time'].dt.year.unique()[0]
        last_year = euro.loc[euro[president] == p, 'time'].dt.year.unique()[-1]
        subtitle = "({}-{})".format(first_year, last_year)
        sub_location = ((width/2) - 73)

        # Creating the lineplot
        # sub_x is the subtitle position, so clean_lineplot needs a matching sub_x parameter
        f = clean_lineplot(x, y, color=c, fig=f,
                           title=title,
                           sub=subtitle, sub_x=sub_location)

        # Storing the plot in the dictionary, with the president name as key
        plots[p] = f

But when we execute our loop and show the graphs side by side, this is what happens:

Looking at the graphs, we can see several items that need our attention:

  • The grid lines are not evenly proportioned across the plots;
  • The x axis shows too much information in some of them (the TRUMP and BIDEN plots show month information);
  • The year ranges are not logical;
  • There is no end-year boundary on the x axis;
  • The Biden subtitle repeats the same year twice;
  • The y axes are not in the same range;
  • The line could be thicker;
  • The yellow line fades a little into the background.

b. Resolving the last details

As always, things did not flow as smoothly as we wanted, but this is not a big problem. In this section we are going to solve these issues and check the results.

Grid lines without proportions, too much x axis information and year ranges

The reason for our first three problems is that our dataset doesn’t have data for every single day of the year, so some months and years have a little more information than others. This doesn’t make a big difference in the plot itself, because the amount of data we have is more than enough (we are also using a 30 day rolling mean). But the plot splits its x axis into equal sections regardless of the year each value belongs to, so a tick sometimes lands at the beginning of a year, sometimes in the middle and sometimes at the end, leaving a gap between some years or duplicating others. Here’s an example:

image

There’s also no end-year boundary: the line ends without any reference at its tail. To change all of that, we are going to create a loop that takes the first day of each year in the data used for each graph and stores it in a list. This list will be used as the tick positions on the x axis of each plot, showing the beginning of each year plus the same day one year after the last tick, which creates the end boundary. The loop block will look like this:

# Loop to select the right tickers 
# list to collect the first days
first_days = []

# loop through each unique year
# (named year so it does not overwrite the y data used for the plot)
for year in euro.loc[euro[president] == p, 'time'].dt.year.unique():
    # creating a list to store the timestamps of each year
    year_range = []
    # looping through the timestamps for each president
    for t in euro.loc[euro[president] == p, 'time']:
        # checking if the timestamp belongs to the year we are looking at
        if t.year == year:
            # appending every timestamp of that year
            year_range.append(t)
    # appending the earliest timestamp of that year
    first_day = min(year_range)
    first_days.append(first_day)

# adding a year to the last of the first days, to create the end boundary
last = pd.to_datetime(first_days[-1]) + pd.offsets.DateOffset(years=1)
# appending the end boundary
first_days.append(last)
# converting the tickers from nanoseconds to milliseconds since the epoch
tick_vals = pd.to_datetime(first_days).astype('int64') / 10**6
# changing the tickers to the new ones
f.xaxis.ticker = FixedTicker(ticks=list(tick_vals))
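
The same list of first days can also be built without the nested loop, by grouping the timestamps by year. The sketch below runs on a handful of made-up timestamps standing in for one president's rows, and converts to milliseconds through a Timedelta division so it does not depend on how the timestamps are stored internally.

```python
import pandas as pd

# Made-up timestamps standing in for one president's rows
times = pd.Series(pd.to_datetime(
    ["2019-03-04", "2019-07-01", "2020-01-02", "2020-05-05"]))

# Earliest timestamp within each calendar year
first_days = times.groupby(times.dt.year).min().tolist()

# One extra tick a year after the last one, as the end boundary
first_days.append(first_days[-1] + pd.DateOffset(years=1))

# Milliseconds since the Unix epoch, the unit Bokeh uses for datetime axes
epoch = pd.Timestamp("1970-01-01")
tick_vals = [int((day - epoch) // pd.Timedelta(milliseconds=1))
             for day in first_days]

print(first_days)
print(tick_vals)
```

The resulting tick_vals list can be passed to FixedTicker(ticks=tick_vals) exactly like the one built by the loop.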

Subtitle for presidents that stayed one year

To solve the subtitle problem, we can insert an if statement that checks whether the first year is the same as the last. If so, it displays a subtitle with only one year. The subtitle block will look like this:

# 4. d. Creating the subtitle text and calculating the right location
first_year = euro.loc[euro[president] == p, 'time'].dt.year.unique()[0]
last_year = euro.loc[euro[president] == p, 'time'].dt.year.unique()[-1]
# checking if the president stayed just one year
if first_year == last_year:
    # if so, insert just the year
    subtitle = "({})".format(first_year)
    sub_location = ((width/2) - 35)
else:
    # if not, insert both years
    subtitle = "({}-{})".format(first_year, last_year)
    sub_location = ((width/2) - 73)
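
The text part of this check is easy to try on its own. The helper below, with the made-up name year_span_label, is a sketch of the same rule as a small function.

```python
def year_span_label(first_year, last_year):
    """Return '(YYYY)' for a single year, '(YYYY-YYYY)' otherwise."""
    if first_year == last_year:
        return "({})".format(first_year)
    return "({}-{})".format(first_year, last_year)


print(year_span_label(2021, 2021))
print(year_span_label(2001, 2009))
```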

Standardizing the y axis range

This step is extremely important, because plots with different y ranges can lead the reader to totally different conclusions. We can solve it very quickly using the Bokeh properties y_range.start and y_range.end. The block will look like this:

# using the whole column, so every president of the same currency shares the range
f.y_range.start = euro[currency].min()
f.y_range.end = euro[currency].max()
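
Another way to keep the ranges in sync, shown here only as a sketch with made-up values, is to create a single Range1d object and hand it to every figure through the y_range argument. The figures then share one range instead of each holding a copy of the same limits.

```python
from bokeh.models import Range1d
from bokeh.plotting import figure

# Made-up rates for two panels
rates_a = [1.10, 1.25, 1.18]
rates_b = [0.95, 1.05, 1.40]

lowest = min(min(rates_a), min(rates_b))
highest = max(max(rates_a), max(rates_b))

# One Range1d object handed to both figures
shared = Range1d(start=lowest, end=highest)
left = figure(y_range=shared)
right = figure(y_range=shared)
left.line([1, 2, 3], rates_a)
right.line([1, 2, 3], rates_b)
```

A side effect of sharing the object is that panning or zooming the y axis in one panel moves the other as well, which may or may not be wanted in a dashboard.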

Making the line thicker

This can be solved very quickly: we just need to change the line_width in our clean_lineplot function, so the code will change from this:

f.line(x, y, line_color=color, line_width=2)

To this:

f.line(x, y, line_color=color, line_width=4)

Removing the yellow from our color palette

It is going to look like this:

# 1. Defining the color palette
colors = ['lightcoral', 'seagreen', 'mediumpurple', 'orangered', 'teal', 'mediumvioletred']

After all the changes, we can plot the US plots and the Brazilian plots like this:

show(row(plots['Bush'], plots['Obama'], plots['Trump'], plots['Biden']))

show(row(plots['FHC'], plots['Lula'], plots['Dilma'], plots['Temer'], plots['Bolsonaro']))
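
Typing every name works, but the row can also be assembled from a list of names. The sketch below uses a stand-in dictionary with placeholder names rather than the real plots; with the real data the list could come from euro['president_us'].unique().

```python
from bokeh.layouts import row
from bokeh.plotting import figure

# Stand-in for the plots dictionary built in the loop (names are placeholders)
plots = {name: figure(title=name) for name in ["A", "B", "C", "D", "E"]}

# Names belonging to one country, in the order they should appear
wanted = ["A", "B", "C"]

# Unpacking the list gives row() one figure per name
layout = row(*[plots[name] for name in wanted])
```

Calling show(layout) would then display the selected plots side by side.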

Conclusion

After all the steps above, we were able to create a function that reproduces the line plots we need in the style we chose, as well as a loop to use the function and organize the information for any currency we choose.

Creating graphs with Bokeh gives us a wider range of possibilities. After getting to know the main features, we can start learning about other, more complex and interesting functionality that allows us to create intelligent labels, highlight different information or highly customize the plots.

Feel free to connect with me on LinkedIn, look into my repositories on GitHub or talk to me here, in the community!

I hope this brief tutorial gave you some interesting information that can help in your data science journey!

Until the next time and happy learning!


amazing work! loved it!


Absolutely great job, Nathalia! :star_struck: :heavy_heart_exclamation: I really liked your step-by step approach and your style of writing: it is very clear, interesting, thorough, and easy-to follow! Bokeh library was something that I still hadn’t explored before (despite my love to visualizations :sweat_smile:), now I see that it definitely has a huge potential. Your interactive pictures are just super-cool! :heart_eyes_cat: Thanks a lot for sharing, looking forward to reading your new works here!


I was unpretentiously reading an article and… suddenly you came up with a full tutorial. Great job Nathalia.
