5 Graphing Data
Graphs are fundamental tools for inspecting and communicating with data.
We begin by importing the pyplot plotting library, along with NumPy for working with arrays.
import matplotlib.pyplot as plt
import numpy as np
Given data in lists or arrays, we can simply plot that data!
x = [0, 1, 2, 3, 4, 5, 6]
y = [4, 8, 6, 9, 7, 6, 8]
plt.plot(x,y)
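Because plt.plot accepts NumPy arrays as well as lists, values computed with NumPy can be plotted directly. A minimal sketch; the choice of sin(x) as the sample function is arbitrary and only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6, 61)    # 61 evenly spaced values from 0 to 6
y = np.sin(x)                # NumPy applies sin to every element at once

lines = plt.plot(x, y)       # plt.plot returns a list of the Line2D objects it drew
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()
```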
5.1 Labels and Limits
Plotting is not complete unless your plot is properly labeled. The table below contains the functions for basic labeling.
| Description | Python |
|---|---|
| Label \(x\) axis | plt.xlabel('X') |
| Label \(y\) axis | plt.ylabel('Y') |
| Title a graph | plt.title('My Plot') |
| Plot and label | plt.plot(x,y,label='series1') |
| Turn on legend | plt.legend() |
| Turn on grid | plt.grid() |
| Set \(x\) axis limits | plt.xlim(0, 10) |
| Set \(y\) axis limits | plt.ylim(0, 10) |
A common sequence of commands for plotting data may look like this:
# Data
year = [1800, 1830, 1880, 1900, 1920, 1990, 2000]
Boston = [24937, 61392, 362839, 560862, 748060, 874283, 589141]
NewYork = [60000, 185000, 1919000, 3802000, 7798000, 16044000, 17800000]
plt.plot(year, Boston, label='Boston')
plt.plot(year, NewYork, label='New York')
# Display with labels
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('City Populations')
plt.grid()
plt.legend()
plt.show() 
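By default, pyplot chooses axis ranges that fit the data; plt.xlim and plt.ylim override them. A short sketch reusing the Boston data (the particular limits chosen here are only for illustration):

```python
import matplotlib.pyplot as plt

year = [1800, 1830, 1880, 1900, 1920, 1990, 2000]
Boston = [24937, 61392, 362839, 560862, 748060, 874283, 589141]

plt.plot(year, Boston, label='Boston')
plt.xlim(1800, 2000)   # show exactly 1800 through 2000 on the x axis
plt.ylim(bottom=0)     # start the y axis at zero; the top is still chosen automatically
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('Boston Population')
plt.legend()
ax = plt.gca()         # keep a handle to the current axes so the limits can be inspected
plt.show()
```

Starting the y axis at zero is often a good choice for quantities like population, since it keeps the visual height of the curve proportional to the actual values.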
5.2 Graph Types
Below are five scenarios that demonstrate five useful types of graphs. In each example, the figure is titled and the axes are labeled.
Line Plot
The default plot in pyplot is a line chart. Consider the following data representing time \(t\) in hours and distance \(d\) in miles traveled during a road trip.
| \(t\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| \(d\) | 0 | 42 | 101 | 132 | 138 | 190 | 218 |
t = [ 0, 1, 2, 3, 4, 5, 6] # Time in hours
d = [ 0, 42, 101, 132, 138, 190, 218] # Distance in miles
plt.plot(t,d)
plt.xlabel('Time (hours)')
plt.ylabel('Distance (miles)')
plt.title('Road Trip Progress')
plt.show()
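The slope of each segment in this line plot is an average speed. As an optional sketch, NumPy's np.diff can compute those slopes so they can be plotted on their own:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.array([0, 1, 2, 3, 4, 5, 6])              # time in hours
d = np.array([0, 42, 101, 132, 138, 190, 218])   # distance in miles

# Slope of each segment = change in distance / change in time (miles per hour)
speed = np.diff(d) / np.diff(t)

# Plot each hour's average speed at the end of that hour
plt.plot(t[1:], speed, marker='o')
plt.xlabel('Time (hours)')
plt.ylabel('Average speed over previous hour (mph)')
plt.title('Road Trip Speed')
plt.show()
```

The flat stretch between hours 3 and 4 in the distance plot shows up here as a dip to 6 mph.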
Scatter Plot
The following data represents the weight \(w\) (in pounds) and price \(p\) (in dollars) for a set of cars.
| \(w\) | 2840 | 2900 | 3000 | 2950 | 3150 | 3200 | 3700 | 3900 | 3650 | 3950 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(p\) | 22000 | 23500 | 21000 | 21500 | 26000 | 27000 | 31000 | 34000 | 39000 | 44000 |
A scatter plot is useful for inspecting and visualizing relationships between variables.
# Weight in pounds, price in dollars
w = [ 2840, 2900, 3000, 2950, 3150, 3200, 3700, 3900, 3650, 3950]
p = [ 22000, 23500, 21000, 21500, 26000, 27000, 31000, 34000, 39000, 44000]
plt.scatter(w,p)
plt.xlabel('Weight (lb)')
plt.ylabel('Price (dollars)')
plt.title('Vehicle Price vs. Weight')
plt.show() 
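The scatter plot suggests that heavier cars tend to cost more. To attach a number to that impression, one option (not part of the original example) is the Pearson correlation coefficient \(r\), which ranges from \(-1\) to \(1\) and which NumPy computes with np.corrcoef:

```python
import numpy as np

# Weight in pounds, price in dollars (same data as the scatter plot)
w = [2840, 2900, 3000, 2950, 3150, 3200, 3700, 3900, 3650, 3950]
p = [22000, 23500, 21000, 21500, 26000, 27000, 31000, 34000, 39000, 44000]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r between w and p
r = np.corrcoef(w, p)[0, 1]
print(f"r = {r:.2f}")   # r = 0.91
```

A value near 1 indicates a strong positive linear relationship. Because \(r\) only measures linear association, the scatter plot is still worth inspecting for curved patterns or outliers.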
Bar Chart
A bar chart is useful for comparing different categories. Consider the results of a student survey on study methods:
study_methods = ["Flashcards", "ChatBot", "YouTube", "Group Study", "Notes"]
counts = [3, 8, 5, 19, 8]
plt.bar(study_methods, counts)
plt.xlabel("Study Method")
plt.ylabel("Number of Students")
plt.title("Preferred Study Methods")
plt.show()
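Bar charts are often easier to read when the bars are sorted by height. A sketch of one way to do this with Python's built-in sorted, assuming largest-first order is wanted:

```python
import matplotlib.pyplot as plt

study_methods = ["Flashcards", "ChatBot", "YouTube", "Group Study", "Notes"]
counts = [3, 8, 5, 19, 8]

# Pair each method with its count and sort by count, largest first.
# sorted() is stable, so ties (ChatBot and Notes) keep their original order.
pairs = sorted(zip(study_methods, counts), key=lambda pair: pair[1], reverse=True)
sorted_methods = [method for method, _ in pairs]
sorted_counts = [count for _, count in pairs]

plt.bar(sorted_methods, sorted_counts)
plt.xlabel("Study Method")
plt.ylabel("Number of Students")
plt.title("Preferred Study Methods (sorted)")
plt.show()
```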
Histogram
A histogram can give you an immediate sense of the distribution of values. Consider the quiz scores in the following example:
scores = [ 2, 3, 4, 4, 5, 5, 5, 6, 6, 7,
           7, 7, 8, 8, 8, 8, 9, 9, 10, 10 ]
plt.hist(scores, bins=9, edgecolor='black')
plt.xlabel('Score out of 10')
plt.ylabel('Count')
plt.title('Quiz Score Distribution')
plt.show()
We used bins=9 to set the number of histogram bins to 9, and edgecolor='black' to help the bins stand out visually.
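With bins=9, pyplot divides the range from 2 to 10 into nine equal-width bins, so the bin edges do not line up with whole-number scores. For integer data, one option is to pass explicit bin edges halfway between integers so that each score gets its own bar. plt.hist also returns the count in each bin, which makes the result easy to check:

```python
import numpy as np
import matplotlib.pyplot as plt

scores = [ 2, 3, 4, 4, 5, 5, 5, 6, 6, 7,
           7, 7, 8, 8, 8, 8, 9, 9, 10, 10 ]

# Edges at 1.5, 2.5, ..., 10.5 center one bin on each integer score from 2 to 10
edges = np.arange(1.5, 11.5, 1)

# plt.hist returns (counts per bin, bin edges, drawn bar patches)
counts, bin_edges, patches = plt.hist(scores, bins=edges, edgecolor='black')
plt.xlabel('Score out of 10')
plt.ylabel('Count')
plt.title('Quiz Score Distribution')
plt.show()

print(counts)
```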
We frequently want to compare distributions by plotting them in the same figure. Here is an example with quiz scores from two classes:
class_A = [5, 6, 5, 7, 6, 5, 8, 6, 7, 5, 4, 6, 7, 5, 6]
class_B = [9, 8, 10, 9, 10, 8, 9, 10, 6, 8, 9, 7, 8, 10, 9]
plt.hist(class_A, alpha=0.7, bins=9, label='Class A')
plt.hist(class_B, alpha=0.7, bins=9, label='Class B')
plt.legend()
plt.xlabel('Score out of 10')
plt.ylabel('Count')
plt.title('Quiz Score Distributions by Class')
plt.show()
alpha=0.7 sets the transparency for each histogram in the figure. An alpha value of 0 would be invisible while 1 would be completely opaque.
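When called with bins=9, each plt.hist computes its own bin edges from its own data (4 to 8 for Class A, 6 to 10 for Class B), so the two sets of bars end up with different widths. Passing the same explicit edges to both calls, as in this sketch, makes the bars directly comparable:

```python
import numpy as np
import matplotlib.pyplot as plt

class_A = [5, 6, 5, 7, 6, 5, 8, 6, 7, 5, 4, 6, 7, 5, 6]
class_B = [9, 8, 10, 9, 10, 8, 9, 10, 6, 8, 9, 7, 8, 10, 9]

# One shared set of edges (3.5, 4.5, ..., 10.5): one bin per integer score 4..10
edges = np.arange(3.5, 11.5, 1)

counts_A, _, _ = plt.hist(class_A, bins=edges, alpha=0.7, label='Class A')
counts_B, _, _ = plt.hist(class_B, bins=edges, alpha=0.7, label='Class B')
plt.legend()
plt.xlabel('Score out of 10')
plt.ylabel('Count')
plt.title('Quiz Score Distributions by Class')
plt.show()
```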
Box Plot
A box plot is a different view of the same type of data we saw with a histogram, and is also useful for comparing distributions.
class_A = [5, 6, 5, 7, 6, 5, 8, 6, 7, 5, 4, 6, 7, 5, 6]
class_B = [9, 8, 10, 9, 10, 8, 9, 10, 6, 8, 9, 7, 8, 10, 9]
plt.boxplot(
    [class_A, class_B],
    tick_labels=["Class A", "Class B"]  # this keyword was named labels= before Matplotlib 3.9
)
plt.ylabel('Score out of 10')
plt.title('Quiz Score Distributions by Class')
plt.show()
Breaking the boxplot command across several lines is not required; it only improves readability. To add more categories, extend both the list of datasets and the tick_labels list (labels in Matplotlib versions before 3.9).
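Each box spans the first quartile (Q1) to the third quartile (Q3), with a line at the median; by default the whiskers reach the most extreme data points within 1.5 times the interquartile range (Q3 minus Q1) of the box. A sketch that computes these numbers with np.percentile and compares the median against the line plt.boxplot drew:

```python
import numpy as np
import matplotlib.pyplot as plt

class_A = [5, 6, 5, 7, 6, 5, 8, 6, 7, 5, 4, 6, 7, 5, 6]
class_B = [9, 8, 10, 9, 10, 8, 9, 10, 6, 8, 9, 7, 8, 10, 9]

summary = {}
for name, scores in [('Class A', class_A), ('Class B', class_B)]:
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    summary[name] = (q1, median, q3)
    print(f"{name}: Q1={q1}, median={median}, Q3={q3}, IQR={q3 - q1}")

# plt.boxplot returns a dict of the artists it drew; 'medians' holds one line per box
parts = plt.boxplot([class_A, class_B])
plt.ylabel('Score out of 10')
plt.show()
```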
5.2.1 Plot Summary
- Line Plot – Shows how a value changes over time or across an ordered sequence. Points are connected with lines to highlight trends.
- Scatter Plot – Displays the relationship between two numerical variables using individual points. It helps reveal patterns, trends, or correlations.
- Bar Chart – Compares values across categories. The height of each bar represents the size or value of a category.
- Histogram – Shows the distribution of numerical data by grouping values into bins. It helps to see the overall shape of the distribution and where data are concentrated.
- Box Plot – Summarizes a dataset using the median, quartiles, and possible outliers. It is useful for comparing distributions and understanding spread and variability.
5.3 Style
PyPlot graphs can be customized extensively. Two of the most important style settings are the marker and the line: the marker is the symbol drawn at each data point, and the line is what connects consecutive points. See the PyPlot marker style reference and line style reference for the full lists.
5.3.1 Marker
The most important customization is the marker style itself, but we can also set the marker's edge color, face color, and size:
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]
plt.plot( x, y,
marker='o',
markeredgecolor='red',
markerfacecolor='orange',
markersize=10
)
plt.show()
Some additional marker types include:
| Marker Type | Marker Code | Appearance |
|---|---|---|
| Point | '.' | · |
| Plus | '+' | + |
| X | 'x' | × |
| Circle | 'o' | ○ |
| Square | 's' | □ |
| Pentagon | 'p' | ⬠ |
| Hexagon | 'h' | ⬡ |
| Diamond | 'D' | ◇ |
| None | '' | (no mark) |
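To compare the marker codes side by side, one approach is to draw the same data several times, shifted upward, with a different marker each time. A minimal sketch:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
base_y = [2, 4, 3, 5]
marker_codes = ['.', '+', 'x', 'o', 's', 'p', 'h', 'D']

lines = []
for i, code in enumerate(marker_codes):
    # Shift each series up by i so the markers do not overlap
    y = [v + i for v in base_y]
    line, = plt.plot(x, y, marker=code, markersize=8, label=repr(code))
    lines.append(line)

plt.legend(title='marker')
plt.title('Marker Styles')
plt.show()
```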
5.3.2 Line
The main line customizations are the line style and the line color.
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]
plt.plot( x, y,
linestyle='-.',
color='purple'
)
plt.show()
Some additional line styles include:
| Line Type | Line Code | Appearance |
|---|---|---|
| Solid | '-' | ───── |
| Dashed | '--' | ─ ─ ─ ─ |
| Dashdot | '-.' | ─ · ─ · |
| Dotted | ':' | · · · · |
| None | '' | (no line) |
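Matplotlib also accepts long names for the four basic line styles ('solid', 'dashed', 'dashdot', 'dotted') and converts them to the short codes listed above. A sketch that draws one offset line per style and then reads back the stored code:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
base_y = [2, 4, 3, 5]
styles = ['solid', 'dashed', 'dashdot', 'dotted']

lines = []
for i, style in enumerate(styles):
    # Shift each series up by i so the lines do not overlap
    y = [v + i for v in base_y]
    line, = plt.plot(x, y, linestyle=style, color='purple', label=style)
    lines.append(line)

plt.legend()
plt.title('Line Styles')
plt.show()

# get_linestyle() reports the short code for each long name
print([line.get_linestyle() for line in lines])   # ['-', '--', '-.', ':']
```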
5.3.3 Marker and Line Shortcuts
Writing out each attribute explicitly is clearest, but plt.plot also accepts a compact format string that combines color, marker, and line style. For example, to plot red point markers connected by a dotted line:
x = [1, 2, 3, 4]
y = [2, 4, 3, 5]
plt.plot(x, y, 'r.:') # Red, point marker, dotted line
plt.show()
plt.plot(x, y, 'ks--') # Black, square marker, dashed line
plt.show()
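The format string is only shorthand: 'r.:' produces the same style as spelling out color, marker, and linestyle. A sketch that draws both versions (the second shifted up by 1 so it stays visible) and compares their properties:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4]
y = [2, 4, 3, 5]

# Format-string shortcut: red color, point marker, dotted line
short, = plt.plot(x, y, 'r.:')
# The same style written out explicitly
explicit, = plt.plot(x, [v + 1 for v in y], color='red', marker='.', linestyle=':')
plt.show()

# to_rgba converts both 'r' and 'red' to the same RGBA tuple
same_color = to_rgba(short.get_color()) == to_rgba(explicit.get_color())
same_marker = short.get_marker() == explicit.get_marker()
same_line = short.get_linestyle() == explicit.get_linestyle()
print(same_color, same_marker, same_line)   # True True True
```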

The Matplotlib documentation offers several cheat sheets that can be printed.
5.4 Graphing Arrays
Given a 2-D array of data, plt.plot treats each column as a separate series.
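For example, stacking three series as the columns of one array and passing it to plt.plot draws three lines in a single call. This sketch uses made-up series (\(t\), \(t^2\), \(t^3\)) purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2, 21)                  # 21 sample points from 0 to 2

# Three series stacked as columns: data has shape (21, 3)
data = np.column_stack([t, t**2, t**3])

lines = plt.plot(t, data)                  # one line per column
plt.legend(lines, ['t', 't^2', 't^3'])
plt.xlabel('t')
plt.title('One Line per Column')
plt.show()

print(data.shape, len(lines))              # (21, 3) 3
```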
5.5 Comparing Populations
Exercises
Graph the following data set as a scatter plot. On the same plot, draw an approximate best fit line to the data (this is an estimate).
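| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 12.5 | 13.1 | 13.2 | 13.2 | 13.8 | 14.1 | 15 | 14.8 | 14.9 | 15.1 |

Recreate the following graphs as closely as you can: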


- Perform a reflex test (or use previous data). Have one person place their hand at the edge of a table with their fingers extended. A second person holds a ruler just above the first person's fingertips and lets go at a random time. Without moving their hand, the first person closes their fingers to stop the ruler. For each person:
  - Collect 12 samples of how far the ruler falls before it is caught.
  - Put the data into a NumPy array.
  - Create a histogram depicting the distribution of their reflex times.
  - Create a box plot depicting the distribution of their reflex times.
- Consider a pendulum with mass \(m\), length \(L\), and release angle \(\theta\). The period \(T\) of a pendulum is the amount of time it takes to complete one cycle of motion, swinging from one side to the other and then back again. To model the period of a pendulum, we want to determine the effect that each of the variables \(m\), \(L\), and \(\theta\) has on \(T\). We can explore this experimentally by keeping two of the variables constant while changing the third and looking for changes in \(T\).
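The reflex-test exercise records how far the ruler falls, while its histogram and box plot items ask about reflex times. Assuming the ruler starts at rest and falls freely (air resistance neglected), the distance is \(d = \tfrac{1}{2} g t^2\), so \(t = \sqrt{2d/g}\). A minimal sketch of the conversion, using a hypothetical helper named catch_time and distances in centimeters:

```python
import numpy as np

G_CM_PER_S2 = 981.0   # gravitational acceleration (about 9.81 m/s^2) expressed in cm/s^2

def catch_time(distance_cm):
    """Hypothetical helper: seconds for a ruler to fall distance_cm from rest.

    Solves d = (1/2) g t^2 for t, ignoring air resistance.
    Accepts a single number or a NumPy array of distances.
    """
    d = np.asarray(distance_cm, dtype=float)
    return np.sqrt(2 * d / G_CM_PER_S2)
```

Because the helper works element-wise, an array of 12 measured distances converts to an array of 12 times in one call, ready for plt.hist or plt.boxplot.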
