An Introduction to Matplotlib, Seaborn and Plotly:

Chris Grannan
Feb 27, 2021

Visualizations are at the heart of data science. There is no clearer way to describe the statistics of a dataset or the results of a model than through a well-organized graph. Luckily, Python has many libraries that make it easy to build a well-constructed visualization. In this post, I will give a brief overview of three libraries, Matplotlib, Seaborn and Plotly, and show some examples of the differences between them.

Matplotlib:

Matplotlib is the standard graphing library in Python, and is typically the first graphing library a data scientist learns when using Python. It is integrated with pandas and NumPy for easy and efficient plotting. Furthermore, Matplotlib gives the user full control over fonts, graph styling and axes properties, though this control can come at the cost of lengthy blocks of code. Matplotlib is especially good for exploratory analysis because of its integration with pandas, which allows quick transformations from dataframe to graph. It is particularly good for creating basic plots like scatter plots, bar graphs and line plots, but looks a little rough when creating more complex plots like polar scatter plots.
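
As a rough illustration of that trade-off between control and length, here is a minimal sketch of a paired bar chart built directly with Matplotlib. The category names and values are hypothetical and only there to give the bars something to show; the colors, font sizes and title are arbitrary choices, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values, for illustration only (not taken from any dataset)
categories = ["A", "B", "C"]
group_one = [3.0, 5.0, 2.0]
group_two = [4.0, 1.0, 6.0]

x = np.arange(len(categories))  # one slot on the x axis per category
width = 0.4                     # width of each individual bar

fig, ax = plt.subplots(figsize=(12, 8))
# Shift each group half a bar width left or right so the bars sit side by side
ax.bar(x - width / 2, group_one, width, label="group one", color="steelblue")
ax.bar(x + width / 2, group_two, width, label="group two", color="darkorange")

# Every visual detail is set explicitly
ax.set_xticks(x)
ax.set_xticklabels(categories, fontsize=14)
ax.set_xlabel("category", fontsize=16)
ax.set_ylabel("value", fontsize=16)
ax.set_title("Paired bars built by hand", fontsize=20, fontfamily="serif")
ax.legend()
```

Nothing here is hidden behind defaults, which is the appeal, but even a simple chart takes a fair number of lines.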

Seaborn:

Seaborn is a library built on top of the pyplot module in Matplotlib. It provides a high-level interface with a more intuitive feel, which means a simpler syntax and more intuitive parameter settings. Additionally, Seaborn includes a more aesthetically pleasing collection of colors, themes and styles. This produces a smoother and more professional-looking plot than those created from the pyplot module. This library is especially useful when creating more complex plots where more refined graphics are needed.

Plotly:

Unlike Matplotlib and Seaborn, Plotly is used to make interactive charts. While the plots look very similar to those produced by Seaborn in terms of graphics, they have the added utility of displaying information when a user hovers their mouse over the chart. This effect is accomplished with JavaScript behind the scenes, and it is a particularly useful feature when looking at busy or complex charts, as you can immediately pick out the information that you are interested in. The drawback to using charts in Plotly is that the code can get a bit complex and quite long depending on the method being used.

Example:

Now that we have gone over what these libraries are used for, let's look at an example of how we can build plots in each. For this example, we will be using the Titanic dataset, and we want to create a bar plot showing the average fare price for each passenger class, split by whether or not an individual survived (1 for survived, 0 for did not survive). Below is the Matplotlib code for creating this plot and the resulting image.

# Import pyplot module
import matplotlib.pyplot as plt
# Set default size for all pyplot plots
plt.rcParams["figure.figsize"] = (12,8)
# Group the dataframe by passenger class and survival,
# then calculate average fare price,
# then unstack the grouped data, and plot a barplot
df.groupby(
    ['pclass', 'survived']
)['fare'].mean().unstack().plot(kind='bar')
plt.show()
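
To make the grouping step in the code above more concrete, here is a small sketch of what the groupby, mean and unstack calls produce before anything is plotted. The rows in `toy` are made up for illustration and are not real Titanic records; only the column names match the example.

```python
import pandas as pd

# Toy rows made up for illustration; NOT real Titanic records
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "fare":     [50.0, 80.0, 100.0, 20.0, 30.0, 8.0, 10.0, 12.0],
})

# Step 1: one mean fare per (pclass, survived) pair.
# The result is a Series with a two-level index and 6 entries.
grouped = toy.groupby(["pclass", "survived"])["fare"].mean()

# Step 2: move 'survived' out of the index and into the columns.
# The result is a 3 x 2 dataframe: one row per class, one column per outcome.
wide = grouped.unstack()
print(wide)
```

When `.plot(kind='bar')` is called on a dataframe shaped like `wide`, pandas draws one cluster of bars per row and one bar per column, which is why unstacking turns 6 separate bars into 3 pairs.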

There is a good deal to unpack here. In order to get the information that we wanted, we needed to group the dataframe by passenger class and by survival. This gives us 6 groups: those who survived and those who didn't for each passenger class. We choose the mean fare price as our aggregate for the grouping. Next we unstack our grouped dataframe in order to plot 3 paired groupings, rather than 6 individual ones. Finally we choose to plot as a bar plot. We can see how clunky Matplotlib can be here. One other thing to note is that the plotting function is called off of the pandas dataframe. We call the .plot() method of the dataframe object and do not need to feed the dataframe into a separate function. Now let's compare this to creating the same plot in Seaborn.

# import Seaborn
import seaborn as sns
# set global parameters like font and label sizes
sns.set_context('talk')
# set style parameters such as presence of grid and background color
sns.set_style('darkgrid')
# Plot data
sns.barplot(data=df, x='pclass', y='fare', hue='survived')
plt.show()
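
The Seaborn call above leans on defaults. The sketch below spells one of them out: the `estimator` parameter of `sns.barplot`, which is the mean unless told otherwise. It also checks the axis labels that Seaborn fills in from the column names. The rows in `toy` are made up for illustration and are not real Titanic records.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy rows made up for illustration; NOT real Titanic records
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "fare":     [50.0, 80.0, 100.0, 20.0, 30.0, 8.0, 10.0, 12.0],
})

# No groupby needed: Seaborn aggregates 'fare' within each (pclass, survived) group.
# estimator=np.mean is already the default; it is written out here to make it visible.
ax = sns.barplot(data=toy, x="pclass", y="fare", hue="survived", estimator=np.mean)

# The axis labels are taken from the column names automatically
xlabel, ylabel = ax.get_xlabel(), ax.get_ylabel()
print(xlabel, ylabel)
```

Passing a different function as `estimator`, for example `np.median`, changes what each bar represents without touching the dataframe.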

There are a few things to note here. First, we do not need to set the size of this chart. Because Seaborn is built on Matplotlib, we are using the default size that we have already set. Similarly, now that we have set the context and style of Seaborn, every later Matplotlib chart will use the same values. Next, we can see how much simpler the syntax is for Seaborn compared to pyplot. We don't need to group our data here. We just pass the values in and Seaborn calculates the result (by default, the mean of each group) automatically. Finally, we also see that Seaborn helps make our chart a little nicer by filling in our axes labels. We can always supply a label by calling plt.xlabel() or plt.ylabel(), but this is not strictly necessary when using Seaborn. Now let's look at how we can create the same plot using Plotly.

# Import Plotly
import plotly.express as px
# Set up graph by grouping the dataframe by pclass and survival.
# We need to keep our column names, so we set as_index to False.
fig = px.bar(
    df.groupby(['pclass', 'survived'], as_index=False).agg({'fare': 'mean'}),
    x='pclass',
    y='fare',
    color='survived',
    # Setting 'barmode' to group creates paired plots
    # instead of stacked plots.
    barmode='group',
)
fig.show()

When the code for this plot is run, hovering a mouse over any of the bars will tell you the passenger class, the survival group and the average ticket price for that group. This plot requires the data to be grouped like the Matplotlib plot, but it is a little more complex since we need access to our column names. To account for this, we set 'as_index' to False inside of the groupby function and aggregate with the .agg() function, so that the grouping columns stay as regular columns. The other major difference with this code is that the default nature of this graph is a stacked bar plot instead of a paired bar plot. To account for this, we simply set 'barmode' to 'group'. This is a very basic example and doesn't show off the full usefulness of Plotly. Mostly this example is meant to show the differences in creating a simple chart. It's important to note, however, that Plotly lets you create some really advanced visualizations and makes busy visualizations very easy to read. I will likely write a post soon showing off some of the higher-end visualizations that you can create using Plotly.

Summary:

To recap, we covered three graphing libraries in Python today. Matplotlib is a pretty basic library with some key tools and lots of functionality. Seaborn builds on top of Matplotlib and creates more visually interesting plots. The syntax is a bit less clunky than using Matplotlib, but it can't be run off a dataframe or series. Finally, Plotly is used to create interactive charts. The syntax is a little clunkier than either of the other two options and there is less direct integration with pandas since the library relies on JavaScript. However, the extra effort is worth it because the library allows very advanced graphics to be created.

Resources:

For some examples of graphics you can create using each of these three libraries, make sure to check out the documentation:

Matplotlib

Seaborn

Plotly