Hello, Pyplot!

Altair is not the only Python library that we can use to visualize data.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations. Within Matplotlib is Pyplot (matplotlib.pyplot), a collection of functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure, such as creating a figure and its plotting area, plotting some lines, or decorating the plot with labels.

In this lecture note, we will demonstrate very briefly what pyplot can do to show how it is different from Altair.

Interactivity, animation, mathematical operations, and the other things you can do in Matplotlib will not be discussed in INF100. However, you may find this Python library incredibly useful for scientific publications, data science work, and more, so feel free to explore a more detailed Pyplot tutorial in your own time!

Image credit: Wikimedia - Adrien F Vincent


Import Pyplot

To begin, we import the libraries whose functions we will use: pandas for dataframes and matplotlib.pyplot for visualization.

Note: You will first need to install pandas and matplotlib by running pip install pandas matplotlib in the terminal.

import pandas as pd
import matplotlib.pyplot as plt

Load Dataframe

As in the Data Analysis lecture notes, we load in our dataset as a dataframe using a built-in function from pandas: read_csv(). Let’s read in the Hawks dataset from that lecture!

hawks = pd.read_csv('http://raw.githubusercontent.com/vincentarelbundock/Rdatasets/refs/heads/master/csv/Stat2Data/Hawks.csv', index_col=0)
hawks.head()

[Figure: preview of the hawks dataframe]
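To see what index_col=0 does without downloading anything, here is a minimal sketch that reads a tiny CSV from a string. The values are made up for illustration (they are not the real Hawks data), and the name toy is just a placeholder.

```python
import io
import pandas as pd

# Made-up CSV text (NOT the real Hawks data).
# The unnamed first column holds the row numbers.
csv_text = """,Species,Wing,Weight
1,RT,385.0,920.0
2,RT,376.0,930.0
3,CH,265.0,470.0
4,SS,205.0,170.0
"""

# index_col=0 uses the first column as the row index instead of a data column
toy = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # the column names
print(toy.head(2))        # the first two rows
```

Without index_col=0, pandas would keep the row numbers as an extra data column, typically named "Unnamed: 0".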

Plotting

Generating visualizations with pyplot is very quick. We call a pyplot function named plot(), which plots y-values against x-values as lines and/or markers. It takes the data you want to plot as arguments, e.g., the Wing and Weight columns of the hawks dataframe.

Then, we call another pyplot function, show(), to display the chart.

plt.plot(hawks.Wing, hawks.Weight) # plot(x,y)
plt.show() 

[Figure: line chart]

By default, plot() generates a blue line chart. It connects the points in the order the rows appear in the dataframe, so in our case the line jumps back and forth. That makes it a rather chaotic way to visualize the relationship between Wing and Weight, which are two quantitative (continuous) variables. Note: a line chart is more suitable for showing trends between one quantitative variable and one ordinal (ordered categorical) variable, such as income over years.
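The effect of row order can be seen in a small sketch. The numbers below are made up for illustration (not the real Hawks data), and toy and toy_sorted are placeholder names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up measurements (NOT the real Hawks data), deliberately in jumbled order
toy = pd.DataFrame({
    "Wing":   [380, 200, 260, 410, 190],
    "Weight": [950, 160, 450, 1100, 150],
})

# plot() joins the points in row order, so this line zigzags back and forth
plt.plot(toy.Wing, toy.Weight)
plt.title("Rows in original order")
plt.show()

# After sorting by the x-values, the same points form a line that moves left to right
toy_sorted = toy.sort_values("Wing")
plt.plot(toy_sorted.Wing, toy_sorted.Weight)
plt.title("Rows sorted by Wing")
plt.show()
```

Sorting tidies up this toy example, but with hundreds of birds that have similar wing lengths and different weights, a line would likely still be hard to read.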

A scatterplot is a better way to show the relationship between two quantitative variables. We can turn the line chart into a scatterplot by passing an extra argument to plot(): a short format string that customizes the visualization's marks and channels. For example, 'o' tells pyplot to use circle markers, and adding a 'g' in front, as in 'go', makes these circles green.

plt.plot(hawks.Wing, hawks.Weight, 'go')
plt.show() 

[Figure: scatterplot with plot method]
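A format string combines a color letter with a marker and/or a line style. The sketch below tries three combinations on made-up numbers. The variable names are placeholders, and we only store what plot() returns so the line objects can be inspected afterwards.

```python
import matplotlib.pyplot as plt

# Made-up values, just to have something to draw
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# plot() returns a list containing the line object(s) it drew
green_circles = plt.plot(x, y, 'go')                      # green circles, no line
red_dashes = plt.plot(x, [v + 5 for v in y], 'r--')       # red dashed line, no markers
blue_triangles = plt.plot(x, [v + 10 for v in y], 'b^-')  # blue triangles joined by a solid line
plt.show()
```

Calling plot() several times before show() draws all three series in the same chart.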

We can customize this chart even more, say adding x-axis and y-axis labels to the chart! And of course, we can’t forget a title!

plt.plot(hawks.Wing, hawks.Weight, 'go')
plt.ylabel("Weight")
plt.xlabel("Wing length")
plt.title("Relationship between a hawk's weight and wing length")
plt.show()

[Figure: scatterplot with labels and title]

Instead of customizing plot() manually, we can also create specific chart types with more specialized functions: scatter() for scatterplots, bar() for bar charts, xcorr() for cross-correlation plots, and hist() for histograms.
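As a rough sketch of two of these functions, the example below draws a scatterplot with scatter() and a bar chart of counts with bar(). The data is made up for illustration (not the real Hawks data), and toy and counts are placeholder names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data (NOT the real Hawks data)
toy = pd.DataFrame({
    "Species": ["RT", "RT", "CH", "SS", "RT"],
    "Wing":    [380, 200, 260, 410, 190],
    "Weight":  [950, 160, 450, 1100, 150],
})

# scatter() draws markers only, so no format string is needed
points = plt.scatter(toy.Wing, toy.Weight, color="green")
plt.xlabel("Wing length")
plt.ylabel("Weight")
plt.show()

# bar() needs one bar position and one bar height per category,
# so we first count how many rows each species has
counts = toy.Species.value_counts()
plt.figure()  # start a fresh figure for the second chart
bars = plt.bar(counts.index, counts.values)
plt.ylabel("Number of birds")
plt.show()
```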

We can do more than plot just a single chart too. If we want to combine multiple charts into one figure, we can first create a layout using figure(). Then, we use subplot() to position each individual chart in the figure. For example:

plt.figure(figsize=(9,3)) # Create a 9x3 inch figure

plt.subplot(1, 2, 1) # The figure will have 1 row and 2 columns. Place scatterplot 1st from the left.
plt.scatter(hawks.Wing, hawks.Weight)

plt.subplot(1, 2, 2) # Place histogram 2nd from the left.
plt.hist(hawks.Wing)

plt.show()

[Figure: multi-chart figure]
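The three numbers passed to subplot() are the number of rows, the number of columns, and the position, counted from left to right and then top to bottom. Here is a minimal sketch with a 2 x 2 grid where only three positions are filled. The data is made up for illustration, and toy and fig are placeholder names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data (NOT the real Hawks data)
toy = pd.DataFrame({
    "Wing":   [380, 200, 260, 410, 190],
    "Weight": [950, 160, 450, 1100, 150],
})

fig = plt.figure(figsize=(6, 6))  # a 6 x 6 inch figure

plt.subplot(2, 2, 1)              # 2 rows, 2 columns, position 1: top left
plt.scatter(toy.Wing, toy.Weight)
plt.title("Position 1")

plt.subplot(2, 2, 2)              # position 2: top right
plt.hist(toy.Wing)
plt.title("Position 2")

plt.subplot(2, 2, 3)              # position 3: bottom left
plt.hist(toy.Weight)
plt.title("Position 3")

plt.tight_layout()                # adjust spacing so titles and axes do not overlap
plt.show()
```

Position 4 (bottom right) is simply left empty because we never call subplot(2, 2, 4).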