How I started with a data viz library not covered by DQ courses (middle of Data Analyst Path)


For the last couple of weeks I've been struggling with motivation in my data science studies. To cheer myself up, I decided to look back at what I've accomplished since I started my journey in big data. It helped, but my study routine was still lost. So I thought that sharing one of my “success stories” might help me recover that routine, and might also help someone who is going through a low-motivation phase of their own.

So I decided to share how I created some cool interactive charts from scratch using Plotly Express for my guided project on the Star Wars survey. (Spoiler: it turned out to be a relatively easy journey thanks to Plotly's incredibly detailed and illustrative documentation.)

Decision taken

I had been thinking about trying an interactive data visualization tool for a while when I started working on my guided project. Usually, before digging into the work, I look through the already submitted projects to get some ideas. That's when I saw @Raj's project and finally decided to give Plotly a try. Although his code seemed quite clear to me and I could have just adapted it for my project, I decided to take a slower but more professional route: get familiar with the official documentation and tutorials first, and only then start coding. :nerd_face:

Plotly Express and first chart created

So, I went to the Plotly for Python web page. My first discovery was that Raj had built his figures manually by passing the layout and data parameters, and that for beginners Plotly recommends the high-level plotly.express module, also known as Plotly Express. I followed their advice.

Then I looked briefly through the examples of basic charts to see which kinds might be useful for the project, and decided to use bar charts for all the rankings and a sunburst chart to draw a portrait of the survey respondents.

It's quite easy to plot a basic chart “as is”.

#imports
import plotly.express as px

#prepare the data for plotting

##the columns which refer to movie rankings
rank_cols = ['ranking_ep1', 'ranking_ep2', 'ranking_ep3', 
             'ranking_ep4', 'ranking_ep5', 'ranking_ep6']

ranking = star_wars[rank_cols].mean()

#plot
fig = px.bar(ranking, x=ranking.values, y=ranking.index, 
             orientation='h')

fig.show()

Pic. 1. Basic bar chart with no additional settings

There's even some default hover text already!

Make it nice-looking and more informative

I usually choose to spend some time on improving the aesthetics. In Plotly a figure consists of traces and a layout, so to change the aesthetics we need two methods: fig.update_traces() and fig.update_layout(). What I love most about them is that they are really intuitive. For example, you can customize the title, specifying its text, font size, weight, etc., just by passing a dict-like title argument to fig.update_layout():

#plot
fig = px.bar(ranking, x=ranking.values, y=ranking.index, 
             orientation='h')

#plot aesthetics
fig.update_layout(title={'text':'<b>Most favorite movie in the Star Wars franchise</b> <br>based on 835 respondents',  
                         'font':{'size':22}})

Pic. 2. Setting the title

So, where can you learn which parameters can be changed and which attributes and keywords to use?

First, the Plotly fundamentals page has many examples and tutorials. I especially liked Introspecting Figures in Python, which shows the anatomy of a Plotly figure and a few ways to learn its attributes, including the default ones. There you'll find out how to change an attribute you've never heard of before :blush:

And then, of course, there's the Reference page. After quite some time spent on these two pages, I learned to customize the hover text, labels, and tick parameters, and even to draw a legend manually.

Let's go step by step through customizing the most common parameters besides the title: the axis labels, the color, the tick formatting, and the hover labels.

1. The axis labels
Although it's possible to pass the desired axis titles to fig.update_layout(), Plotly Express has a more intuitive solution: we can customize the axis names with the labels keyword argument when creating the chart:

fig = px.bar(ranking, x=ranking.values, y=ranking.index, 
             orientation='h', 
             labels={'x': 'average rank', 'index': 'Star Wars movie'})

2. The trace color
If we just want to change the color of the markers (without using color as a hue parameter), we can do it directly with the fig.update_traces() method by passing the marker_color keyword argument, which accepts either a single color or a list of colors:

fig.update_traces(marker_color='rgb(137, 137, 137)')

3. Tick formatting
There are various tick-format-related parameters that we can pass to xaxis or yaxis in the fig.update_layout() method. First, I'd like to change the tick text and show the episode titles instead of ‘ranking_ep*’. The easiest way is to rename the columns in the dataframe (as I did in the project), but if for some reason we don't want to modify the dataframe, we can still change the tick text by setting tickmode to array, giving the tick positions with tickvals, and finally setting ticktext:

titles = ['Episode I The Phantom Menace', 
          'Episode II Attack of the Clones',
          'Episode III Revenge of the Sith',
          'Episode IV A New Hope',
          'Episode V The Empire Strikes Back',
          'Episode VI Return of the Jedi']  

fig.update_layout(yaxis={'tickmode': 'array',
                         'tickvals': [0, 1, 2, 3, 4, 5],
                         'ticktext': titles})

I also changed the font size of the tick labels and had to adjust one more parameter. For my taste, the tick labels were too close to the y-axis. I couldn't find how to change their position, so I added a couple of spaces as a suffix to each tick:

fig.update_layout(yaxis={'ticksuffix': '  ',
                         'tickfont':{'size':16}})

4. Modifying the hover labels
When I had just started with Plotly, I fell in love with the hover labels. They are a powerful tool that reveals more information about a data point while keeping the chart itself from being overloaded with details.
It's possible to choose which data appears on the hover label and also to format its appearance and text. Plotly Express automatically adds all the plotted data to the hover label. Most Plotly Express functions accept the hover_name and hover_data arguments, with which you can specify the data to be shown. To format the text of the hover label, you can create a hovertemplate and pass it to the fig.update_traces() method:

fig.update_traces(hovertemplate='<i>Average rank:</i> %{x}')

And now let's gather all the adjustments in one plot:

fig = px.bar(ranking, x=ranking.values, y=ranking.index, 
             orientation='h', 
             labels ={'x': 'average rank', 'index': 'Star Wars movie'})

fig.update_layout(title={'text':'<b>Most favorite movie in the Star Wars franchise</b> <br>based on 835 respondents',  
                         'font':{'size':22}},
                  yaxis={'tickmode': 'array',
                         'tickvals': [0, 1, 2, 3, 4, 5],
                         'ticktext': titles})

fig.update_traces(marker_color='rgb(137, 137, 137)', 
                  hovertemplate='<i>Average rank:</i> %{x}')

fig.show()

Pic. 3. Improving the aesthetics

More examples

Customized hover text with some additional data

#plot
fig = px.bar(seen, x='Number of views', y='Star Wars movie', 
             orientation='h', 
             custom_data=['Views_per', 'views_per_trilogy'], 
             category_orders={'Star Wars movie':titles})

#plot aesthetics
##color map highlighting only the most seen movie
most_seen = seen['Number of views'].max()
colors = ['rgb(252, 128, 14)' if val == most_seen else 'rgb(137, 137, 137)'
          for val in seen['Number of views']]

fig.update_traces(hovertemplate='<i>Views:</i> %{x} <br><i>seen by %{customdata[0]:.0%} of respondents</i> <br> (the trilogy seen by %{customdata[1]:.0%} of respondents) ', 
                  marker_color=colors)

fig.update_layout(title={'text':'<b>Views received by each movie in the Star Wars franchise</b><br>based on 835 respondents',
                         'font':{'size':22}},
                  yaxis = {'ticksuffix': '  ',
                           'tickfont':{'size':16}})

fig.show()

Setting the colors and drawing the legend manually

#plot
fig = px.bar(ranks_sex, x='movie_rank', y='movie', orientation='h', 
             facet_col = 'Gender', facet_col_wrap=1, barmode='group',
             labels = {'movie_rank': 'average rank',
                       'movie': 'Star Wars movie'},
             category_orders={'movie':titles}
             )

#plot aesthetics
##color map highlighting the best ranked movie and other points of interest
colors_of_interest = list(colors)
colors_of_interest[0] = 'rgb(95, 158, 209)'

fig.update_traces(hovertemplate='<i>Average rank:</i> %{x}', 
                  marker_color = colors) 

#highlight the best ranked movie
fig.update_traces(row=0, marker_color = colors_of_interest) 
fig.update_layout(title={'text':'<b>Most favorite movie in the Star Wars franchise, by gender</b>',
                         'font':{'size':22}})
#apply the tick settings to the y-axis of every facet
fig.update_yaxes(ticksuffix='  ', 
                 tickfont={'size':14})
fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))

##draw the legend manually
###'best ranked' rectangle
fig.add_shape(type='rect',
    xref='paper', yref='paper',
    x0=3.5, x1=3.9, y0=4.6, y1=5.0,
    col=1, row=1,
    line={'width': 0.8,
          'color': 'rgb(252, 128, 14)' },
    fillcolor= 'rgb(252, 128, 14)')

###'best ranked' text
fig.add_annotation(text='Best ranked', xref='paper', yref='paper', x=4.3, y=4.8, align='left', 
                   col=1, row=1, showarrow=False)
###'of interest' rectangle
fig.add_shape(type='rect',
    xref='paper', yref='paper',
    x0=3.5, x1=3.9, y0=3.9, y1=4.3,
    col=1, row=1,
    line={'width': 0.8,
          'color': 'rgb(95, 158, 209)'},
    fillcolor='rgb(95, 158, 209)')
###'of interest' text
fig.add_annotation(text='  Of an interest', xref='paper', yref='paper', x=4.3, y=4.1, align='left', 
                   col=1, row=1, showarrow=False)

fig.show()

My first ever sunburst chart

#build a hierarchical table for the sunburst plot
distributions = star_wars.groupby(['sw_fan', 'Gender', 'Age']).size().reset_index()
distributions.rename({0: 'Number of respondents'}, axis=1, inplace=True)

#map the values to be used as labels on the plot
distributions['sw_fan'] = distributions['sw_fan'].map({True: 'Fans', False: 'Not fans'}) 

#plot
fig = px.sunburst(distributions, path=['sw_fan', 'Gender', 'Age'], 
                  values='Number of respondents', 
                  height=700, template='seaborn')

#plot aesthetics
fig.update_traces(hovertemplate='%{percentParent:.2%} of %{currentPath}')
fig.update_layout(title={'text':'<b>Distribution of respondents in various categories</b><br>Fanship, gender and age distribution',
                         'font':{'size':21}})

fig.show()
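To make the "hierarchical table" step less abstract, here is a sketch of what it produces on a tiny made-up set of respondents (four invented rows, not the survey data). It uses reset_index(name=...) as a shortcut for the separate rename.

```python
import pandas as pd

# Four invented respondents, only to show the shape of the table.
respondents = pd.DataFrame({
    'sw_fan': [True, True, False, True],
    'Gender': ['Female', 'Male', 'Male', 'Male'],
    'Age': ['18-29', '18-29', '30-44', '18-29'],
})

# One row per observed (sw_fan, Gender, Age) combination with its count.
distributions = (respondents.groupby(['sw_fan', 'Gender', 'Age'])
                            .size()
                            .reset_index(name='Number of respondents'))

# Map the booleans to the labels shown on the inner ring.
distributions['sw_fan'] = distributions['sw_fan'].map({True: 'Fans',
                                                       False: 'Not fans'})
print(distributions)
```

Each row becomes one outer segment of the sunburst, and the path argument tells Plotly Express how to nest the columns from the center outwards.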

Happy ending

So, that was the story of how my relationship with Plotly started. I'm sure it's going to be a long and fruitful one. And I hope this story encourages you to start using new tools that are not covered in the courses, even if you are only in the middle of your DQ path. You don't have to be an expert for it; some dedication and curiosity will be enough!


Amazing data viz!

I’ve recently discovered Plotly and loved it. Now I am even more impressed by what we may accomplish using this library. I will definitely study your code to learn more about this library. The sunburst chart is amazing.

Congrats and I hope you get back on track and recover your study routine!


Thank you! Same here: when I started with Plotly I couldn’t imagine the opportunities it gives to someone who is still quite an amateur, and they are enormous. The documentation and examples are easy to follow, and the community is also quite helpful.


Awesome job! I hardly understand anything but it looks impressive!
