# # Simple Line Plots
# Perhaps the simplest of all plots is the visualization of a single function $y = f(x)$.
# Here we will take a first look at creating a simple plot of this type.
%matplotlib inline
import matplotlib.pyplot as plt
#plt.style.use('seaborn-whitegrid')
import numpy as np
Visualization
For all Matplotlib plots, we start by creating a figure and axes.
fig = plt.figure()
ax = plt.axes()
The figure (an instance of the class plt.Figure) can be thought of as a single container that contains all the objects representing axes, graphics, text, and labels.
The axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks, grids, and labels.
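To make the container relationship concrete, here is a minimal sketch that checks the types of the two objects and confirms that the axes is registered in the figure. The variable names `is_figure`, `is_axes`, and `ax_in_fig` are illustrative only.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

fig = plt.figure()   # the container
ax = plt.axes()      # an axes added to the current figure

is_figure = isinstance(fig, Figure)
is_axes = isinstance(ax, Axes)
ax_in_fig = ax in fig.axes   # fig.axes lists the axes the figure holds
print(is_figure, is_axes, ax_in_fig)
```

All three checks should come out true, which is one way to see that the figure holds the axes rather than the other way around.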
fig = plt.figure()
ax = plt.axes()

x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
Note that the semicolon at the end of the last line is intentional: it suppresses the textual representation of the plot from the output.
Alternatively, we can use the PyLab interface (plt.plot), which creates the figure and axes for us in the background:
plt.plot(x, np.sin(x));
Adjusting the Plot: Line Colors and Styles
plt.plot(x, np.sin(x - 0), color='blue', linestyle='solid')         # specify color by name
plt.plot(x, np.sin(x - 1), color='g', linestyle='dashed')           # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75', linestyle='dashdot')       # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44', linestyle='dotted')     # hex code (RRGGBB, 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3), linestyle='--')   # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse', linestyle=':');      # HTML color names supported
These linestyle and color codes can also be combined into a single non-keyword argument:
plt.plot(x, x + 0, '-g')   # solid green
plt.plot(x, x + 1, '--c')  # dashed cyan
plt.plot(x, x + 2, '-.k')  # dashdot black
plt.plot(x, x + 3, ':r');  # dotted red
These single-character color codes reflect the standard abbreviations in the RGB (Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems, commonly used for digital color graphics.
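One way to see that these specifications all name ordinary colors is to convert them with the helpers in matplotlib.colors. The sketch below is only an illustration; the list `specs` simply repeats the color values used in the examples above.

```python
from matplotlib.colors import to_hex, to_rgb

# The same kinds of color specification used in the plotting examples
specs = ['blue', 'g', '0.75', '#FFDD44', (1.0, 0.2, 0.3), 'chartreuse']

for spec in specs:
    # to_rgb returns an (r, g, b) tuple of floats in the range 0 to 1
    print(repr(spec), '->', to_rgb(spec), to_hex(spec))

black = to_rgb('k')     # the "K" of CMYK
gray = to_rgb('0.75')   # a grayscale string maps to equal r, g, b values
```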
Adjusting the Plot: Axes Limits
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
The plt.axis function can also automatically tighten the bounds around the current content, as shown in the following figure:
plt.plot(x, np.sin(x))
plt.axis('tight');
Or you can specify that you want an equal axis ratio, such that one unit in x is visually equivalent to one unit in y, as seen in the following figure:
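A minimal sketch of the equal-ratio setting, assuming the same sine curve used in the earlier examples. The variable `aspect` is introduced here only to inspect the result.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x))
plt.axis('equal')   # one unit in x takes the same screen length as one unit in y

# Depending on the Matplotlib version this is reported as 'equal' or as 1.0
aspect = plt.gca().get_aspect()
print(aspect)
```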
Labeling Plots
This section briefly covers the labeling of plots: titles, axis labels, and simple legends.
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");

plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.axis('equal')
plt.legend();
Matplotlib Gotchas
While most plt functions translate directly to ax methods (plt.plot → ax.plot, plt.legend → ax.legend, etc.), this is not the case for all commands. In particular, functions to set limits, labels, and titles are slightly modified. For transitioning between MATLAB-style functions and object-oriented methods, make the following changes:
- plt.xlabel → ax.set_xlabel
- plt.ylabel → ax.set_ylabel
- plt.xlim → ax.set_xlim
- plt.ylim → ax.set_ylim
- plt.title → ax.set_title
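In the object-oriented interface, it is often more convenient to set several of these properties at once with ax.set rather than calling each set_ method separately. A minimal sketch follows; the title and label strings are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

ax = plt.axes()
ax.plot(x, np.sin(x))

# Each keyword corresponds to one of the ax.set_* methods listed above
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel='x', ylabel='sin(x)',
       title='A Simple Plot')
```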
Simple Scatter Plots
x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.plot(x, y, 'o', color='black');

plt.plot(x, y, '-ok');
plt.plot(x, y, '-p', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);
Scatter Plots with plt.scatter
plt.scatter(x, y, marker='o');
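The primary difference between plt.scatter and plt.plot is that plt.scatter can create scatter plots in which the properties of each individual point (size, face color, edge color, etc.) are individually controlled or mapped to data. Let’s show this by creating a random scatter plot with points of many colors and sizes.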
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
colors = rng.random(100)
sizes = 1000 * rng.random(100)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.3)
plt.colorbar();  # show color scale
In this way, the color and size of points can be used to convey information in the visualization, for example to show several features of a dataset in a single plot.
from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T

plt.scatter(features[0], features[1], alpha=0.4,
            s=100*features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);
Visualizing Uncertainties
For any scientific measurement, accurate accounting of uncertainties is nearly as important as, if not more important than, accurate reporting of the number itself. For example, imagine that I am using some astrophysical observations to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe. I know that the current literature suggests a value of around 70 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc with my method. Are the values consistent? The only correct answer, given this information, is this: there is no way to know.
Suppose I augment this information with reported uncertainties: the current literature suggests a value of 70 ± 2.5 (km/s)/Mpc, and my method has measured a value of 74 ± 5 (km/s)/Mpc. Now are the values consistent? That is a question that can be quantitatively answered.
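As a rough illustration of how such a question can be answered quantitatively, the sketch below compares the difference between the two values with their combined uncertainty. It assumes the two uncertainties are independent and Gaussian, so that they add in quadrature; the text does not state this, and the variable names are illustrative.

```python
import math

literature, literature_err = 70.0, 2.5   # (km/s)/Mpc
measured, measured_err = 74.0, 5.0       # (km/s)/Mpc

difference = measured - literature
# Assumption: independent Gaussian errors, combined in quadrature
combined_err = math.hypot(literature_err, measured_err)
n_sigma = difference / combined_err

print(f"difference = {difference:.1f} +/- {combined_err:.2f} ({n_sigma:.2f} sigma)")
```

Under these assumptions the two values differ by about 0.7 standard deviations, which would usually be read as consistent.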
In visualization of data and results, showing these errors effectively can make a plot convey much more complete information.
Basic Errorbars
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)

plt.errorbar(x, y, yerr=dy, fmt='.k');
Here fmt is a format code controlling the appearance of lines and points; it uses the same shorthand syntax as plt.plot.
plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
             ecolor='lightgray', elinewidth=3, capsize=0);
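To see what the fmt shorthand does, we can inspect the object that plt.errorbar returns. This is a small sketch on synthetic data; the names `container` and `data_line` are illustrative, and the seed is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgb

x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.default_rng(0).normal(size=50)

# fmt='.k' is shorthand for a point marker ('.') drawn in black ('k')
container = plt.errorbar(x, y, yerr=dy, fmt='.k')

data_line = container.lines[0]   # the Line2D holding the data points
marker = data_line.get_marker()
rgb = to_rgb(data_line.get_color())
print(marker, rgb)
```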
Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities.
from sklearn.gaussian_process import GaussianProcessRegressor
# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)

# Compute the Gaussian process fit
gp = GaussianProcessRegressor()
gp.fit(xdata[:, np.newaxis], ydata)

xfit = np.linspace(0, 10, 1000)
yfit, dyfit = gp.predict(xfit[:, np.newaxis], return_std=True)

# Visualize the result
plt.plot(xdata, ydata, 'or')
plt.plot(xfit, yfit, '-', color='gray')
plt.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                 color='gray', alpha=0.2)
plt.xlim(0, 10);
Density and Contour Plots
Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions.
Our first example demonstrates a contour plot using a function \(z = f(x, y)\):
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

X, Y = np.meshgrid(x, y)
#print(X,Y)
Z = f(X, Y)

plt.contour(X, Y, Z, colors='black');

plt.contour(X, Y, Z, 20, cmap='RdGy');

plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar();
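The grid construction used above is easiest to understand by looking at array shapes. The following sketch repeats only the np.meshgrid step; the names `row_matches_x` and `col_matches_y` are illustrative.

```python
import numpy as np

x = np.linspace(0, 5, 50)   # 50 points along x
y = np.linspace(0, 5, 40)   # 40 points along y
X, Y = np.meshgrid(x, y)

# Both grids have one row per y value and one column per x value
print(X.shape, Y.shape)

row_matches_x = np.array_equal(X[0], x)      # every row of X repeats x
col_matches_y = np.array_equal(Y[:, 0], y)   # every column of Y repeats y
```

Because X and Y have the same shape, a function written with NumPy operations can be evaluated on the whole grid in a single call.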
Histograms, Binnings, and Density
A simple histogram can be a great first step in understanding a dataset.
rng = np.random.default_rng(1701)
data = rng.normal(size=1000)

plt.hist(data);
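If only the bin counts are needed and not the plot, NumPy's np.histogram computes them directly. A minimal sketch, where the choice of five bins is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1701)
data = rng.normal(size=1000)

# counts[i] is the number of points falling between bin_edges[i] and bin_edges[i + 1]
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
print(bin_edges)
```

There is always one more edge than there are bins, and the counts add up to the number of data points.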
Here is an example of a more customized histogram:
plt.hist(data, bins=30, density=True, alpha=0.8,
         histtype='stepfilled', color='steelblue',
         edgecolor='red');
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
Two-Dimensional Histograms and Binnings
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

plt.hist2d(x, y, bins=30)
cb = plt.colorbar()
cb.set_label('counts in bin')
plt.hexbin: Hexagonal binnings
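The two-dimensional histogram creates a tessellation of squares across the axes. Another natural shape for such a tessellation is the regular hexagon; plt.hexbin bins a two-dimensional dataset on a grid of hexagons.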
plt.hexbin(x, y, gridsize=30)
cb = plt.colorbar(label='count in bin')
Kernel density estimation
Another common method for estimating and representing densities in multiple dimensions is kernel density estimation (KDE).
from scipy.stats import gaussian_kde
# fit an array of size [Ndim, Nsamples]
data = np.vstack([x, y])
kde = gaussian_kde(data)

# evaluate on a regular grid
xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xgrid, Ygrid = np.meshgrid(xgrid, ygrid)
Z = kde.evaluate(np.vstack([Xgrid.ravel(), Ygrid.ravel()]))

# Plot the result as an image
plt.imshow(Z.reshape(Xgrid.shape),
           origin='lower', aspect='auto',
           extent=[-3.5, 3.5, -6, 6])
cb = plt.colorbar()
cb.set_label("density")
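To give a feel for what a kernel density estimate computes, here is a hand-rolled one-dimensional sketch in plain NumPy. It is not how gaussian_kde is implemented: in particular, gaussian_kde selects its bandwidth automatically, whereas the `bandwidth` value below is a hypothetical choice made by hand, and the sample data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1701)
samples = rng.normal(size=500)
bandwidth = 0.3   # hypothetical smoothing width, chosen by hand

grid = np.linspace(-6, 6, 601)

# One Gaussian bump per sample, evaluated at every grid point
z = (grid[:, np.newaxis] - samples[np.newaxis, :]) / bandwidth
bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))

# Averaging the bumps gives the density estimate on the grid
density = bumps.mean(axis=1)

# Riemann-sum check: a density should integrate to roughly 1
dx = grid[1] - grid[0]
area = density.sum() * dx
print(area)
```

A smaller bandwidth follows individual samples more closely, while a larger one gives a smoother but less detailed estimate.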
Customizing Plot Legends
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend()
Legend for Size of Points
Sometimes the legend defaults are not sufficient for the given visualization. For example, perhaps you’re using the size of points to mark certain features of the data, and want to create a legend reflecting this.
# import pandas as pd
# cities = pd.read_csv('data/california_cities.csv')
# # Extract the data we're interested in
# lat, lon = cities['latd'], cities['longd']
# population, area = cities['population_total'], cities['area_total_km2']
# # Scatter the points, using size and color but no label
# plt.scatter(lon, lat, label=None,
# c=np.log10(population), cmap='viridis',
# s=area, linewidth=0, alpha=0.5)
# plt.axis('equal')
# plt.xlabel('longitude')
# plt.ylabel('latitude')
# plt.colorbar(label='log$_{10}$(population)')
# plt.clim(3, 7)
# # Here we create a legend:
# # we'll plot empty lists with the desired size and label
# for area in [100, 300, 500]:
#     plt.scatter([], [], c='k', alpha=0.3, s=area,
#                 label=str(area) + ' km$^2$')
# plt.legend(scatterpoints=1, frameon=False, labelspacing=1, title='City Area')
# plt.title('California Cities: Area and Population');
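Since the commented-out example depends on a data file, here is a self-contained sketch of the same legend trick using synthetic data. The arrays `lon`, `lat`, and `area` below are random stand-ins, not the California cities dataset, and the legend title is a placeholder.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data (hypothetical, not the California cities file)
lon, lat = rng.normal(size=50), rng.normal(size=50)
area = rng.uniform(50, 500, size=50)

fig, ax = plt.subplots()
# The real points carry no label, so they do not appear in the legend
ax.scatter(lon, lat, s=area, alpha=0.5, linewidth=0, label=None)

# Plot empty lists with the desired size and label:
# nothing is drawn, but each call contributes a legend entry
for a in [100, 300, 500]:
    ax.scatter([], [], c='k', alpha=0.3, s=a, label=f'{a} km$^2$')

leg = ax.legend(scatterpoints=1, frameon=False, labelspacing=1, title='Area')
labels = [text.get_text() for text in leg.get_texts()]
```

The legend always refers to some object on the plot, so to show a particular marker size we have to plot it, even if only as an empty list.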
Multiple Subplots
plt.axes: Subplots by Hand
For example, we might create an inset axes at the top-right corner of another axes by setting the x and y position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
The equivalent of this command in the object-oriented interface is fig.add_axes. Let's use it to create two vertically stacked axes.
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))

x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));
plt.subplot: Simple Grids of Subplots
fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str((2, 3, i)),
            fontsize=18, ha='center')
plt.subplots: The Whole Grid in One Go
Let’s create a \(2 \times 3\) grid of subplots, where all axes in the same row share their y-axis scale, and all axes in the same column share their x-axis scale:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')
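The axes returned by plt.subplots come back in a NumPy array, so an individual axes can be picked out with standard array indexing. A short sketch, where the text labels are only there to show which axes is which:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

# ax is a two-dimensional array of Axes, indexed as [row, column]
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)),
                      fontsize=18, ha='center')

shape = ax.shape
print(shape)
```

Note that this indexing is zero-based, unlike the subplot numbers passed to fig.add_subplot, which start at 1.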
plt.GridSpec: More Complicated Arrangements
For example, a GridSpec for a grid of two rows and three columns with some specified width and height space looks like this:
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2]);
# Create some normally distributed data
mean = [0, 0]
cov = [[1, 1], [1, 2]]
rng = np.random.default_rng(1701)
x, y = rng.multivariate_normal(mean, cov, 3000).T

# Set up the axes with GridSpec
fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)
main_ax = fig.add_subplot(grid[:-1, 1:])
y_hist = fig.add_subplot(grid[:-1, 0], xticklabels=[], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], yticklabels=[], sharex=main_ax)

# Scatter points on the main axes
main_ax.plot(x, y, 'ok', markersize=3, alpha=0.2)

# Histogram on the attached axes
x_hist.hist(x, 40, histtype='stepfilled',
            orientation='vertical', color='gray')
x_hist.invert_yaxis()

y_hist.hist(y, 40, histtype='stepfilled',
            orientation='horizontal', color='gray')
y_hist.invert_xaxis()
Visualization with Seaborn
There are several complaints about Matplotlib that often come up:
- A common early complaint, which is now outdated: prior to version 2.0, Matplotlib’s color and style defaults were at times poor and looked dated.
- Matplotlib’s API is relatively low-level. Doing sophisticated statistical visualization is possible, but often requires a lot of boilerplate code.
- Matplotlib predated Pandas by more than a decade, and thus is not designed for use with Pandas DataFrame objects.
An answer to these problems is Seaborn. Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas.
import seaborn as sns
import pandas as pd
Histograms, KDE, and Densities
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    plt.hist(data[col], density=True, alpha=0.5)

# `fill` replaces the deprecated `shade` keyword
sns.kdeplot(data=data, fill=True);
If we pass x and y columns to kdeplot, we instead get a two-dimensional visualization of the joint density:
sns.kdeplot(data=data, x='x', y='y');
Pair Plots
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. These are very useful for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each other.
iris = sns.load_dataset("iris")
iris.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
sns.pairplot(iris, hue='species', height=2.5);