NumPy

Scientific Computing

Author

Prof. Calvin

Why NumPy?

What is “NumPy”?

The fundamental package for scientific computing with Python.

Numerical Python
It is a package - addition features we can optional add to the Python language.

On NumPy

Basically, to do scientific computing it would be nice to have:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities

(These are the stated features of NumPy.)

Relevance

We have essentially been using 1- and 2-dimensional arrays already!
Here is the examples intercepts.py solution from last time.
- It will be cut off but you can copy/paste if you need it.

Code

taxes = [
    [9275, .10],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
]

cost = 0
start = 0
for tax in taxes:
    cost += (tax[0] - start) * tax[1]
    tax += [cost - tax[0] * tax[1]]
    start = tax[0]

We will

Change our existing lists, which we understand as “N-dimensional arrays” into NumPy arrays.
Show the benefits of this arrangement.
Show other NumPy features.

Package Management

PyPI

The Python Package Index (PyPI) is a repository of software for the Python programming language.

NumPy is a Python package.
- It is installed separately from Python, but
- May be installed using Python-based tools.

pip

The most popular tool for installing Python packages, and the one included with modern versions of Python.

pip is a command line utility
For years pip was the only real options for installing Python programs, but has experienced limitations as the package ecosystem has grown quite large.

Using `pip`

Use as an argument to python or python3

python3 -m pip install numpy

It may take a moment to install.

Verify Install

The following verifies the the NumPy install was successful
We see the return of import
- Introduce as, used to shorten names
- We could have done import pw as piecewise

import numpy as np

print(np.__version__)

2.0.1

Aside: Dunder

Around “version” there are two underscores, called “dunder” for double underscore.
These “dunder” values are special built-in Python values with a specific meaning.
You can see another by just printing np

print(np)

<module 'numpy' from 'C:\\Users\\cd-desk\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\numpy\\__init__.py'>

Recall

You may wish to review import
Consult Neovim -> `import’

Arrays

NumPy Arrays

The first thing to do in NumPy is make an array.
In general we use NumPy arrays when:
- We are dealing with lists of numbers
- We care about performance, or
- We want to use advanced mathematical operations

List to Array

We define a list of lists:

taxes = [
    [9275, .10],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
]

Lists of lists

It is worth while to examine taxes a bit

taxes

[[9275, 0.1],
 [37650, 0.15],
 [91150, 0.25],
 [190150, 0.28],
 [413350, 0.33],
 [415051, 0.35]]

Check Types

We can verify it is a list.

type(taxes)

list

We can verify it’s initial element is also a list.

type(taxes[0])

list

Since taxes[0] is a list, we can look at that list’s initial element.

type(taxes[0][0])

int

Metaphor

Perhaps we regard 0-indexed element of an array as the house on the corner of a block.
Perhaps we regard the 0-indexed element of that “house” as the ground floor.

taxes[0][0] # House on the corner, ground floor

Versus arrays

We have a list of lists of ints (like integers, round numbers).
Or do we?

type(taxes[0][1])

float

In fact, the tax rates are not round numbers, so they are “floats”.
NumPy will help us manage when we use “floating point numbers” (have a decimal point) and integers (don’t).

Niceties

Each of the internal lists of numbers is of the same length.
That lets us do this:

Cutoff	Rate
9275	.10
37650	.15
91150	.25
190150	.28
413350	.33
415051	.35

Contingencies

This… isn’t always true.
In our case, for example, we have a .396 rate with no cutoff.
We just can’t express this as an array:

Cutoff	Rate
413350	.33
415051	.35
	.396

Takeaways:

Be ready to deal with things being almost arrays, but ultimately only being lists-of-lists.
There’s ways to deal with this (we’ve seen a few, sneakily)

Arrays

In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:

\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline \end{array} \]

Tables

A two-dimensional array would be like a table:

\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline .10 & .15 & .25 & .28 \\ \hline 0 & -463.75 & -6963.25 & -16470.75 \\ \hline \end{array} \]

ndarray

A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called ndarray: it represents an “N-dimensional array”.

The most obvious 3D example would be that taxes part of an array of tax policies

["Single", "Married, joint", "Married, separate", "Head of Household"]

Metaphor

Perhaps we have blocks in a city.
Houses on a block.
Stories on a house or floors in apartment.

Making Arrays

Make an array with np.array()
- Or numpy.array if you used import numpy
They look like this:

arr = np.array(taxes)
arr

array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])

Our integers are gone - everything in scientific notation

Aside: Scientific Notation

Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form, since to do so would require writing out an inconveniently long string of digits.

In scientific notation, nonzero numbers are written in the form

\[a \times 10^b\]

Aside: Explanation

In scientific notation, nonzero numbers are written in the form

\[a \times 10^b\]

$a$ (the coefficient or mantissa) is a number greater than or equal to 1 and less than 10 ($1 \le |a| < 10$).
$10$ is the base.
$b$ (the exponent) is an integer.

Aside: Physical Examples

Speed of light: The speed of light in a vacuum is approximately $300,000,000 \text{ m/s}$ \[ 3 \times 10^8 \text{ m/s} \]
Mass of an electron: The mass of an electron is approximately $0.00000000000000000000000000091093837 \text{ g}$. \[ 9.1093837 \times 10^{-28} \text{ g} \]

Aside: Economic Examples

We can use social science numbers.
Labor Market Outcomes of College Graduates by Major
Computer Science majors in 2025 have a $80,000 median wage
- $8.0000 \times 10^4$
And 6.1% unemployment
- $6.1 \times 10^{-2}$

Using Arrays

Inspecting Arrays

Given some array, we can look up elements in an array as we did with lists.
We refer to zero as the “index” of the initial element of an array (or list).

taxes[0]

[9275, 0.1]

arr[0]

array([9.275e+03, 1.000e-01])

We look up the same element by the same index in both Python lists and NumPy arrays.

Slices

Python and NumPy support slicing
This takes multiple elements of an array by specifying a range of indices
Let’s make a one-dimensional array to make matters simpler.

colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

color_lst = colors
color_lst[1:4]

['orange', 'yellow', 'green']

color_arr = np.array(colors)
color_arr[1:4]

array(['orange', 'yellow', 'green'], dtype='<U6')

dtype is data type - We’ll cover it soon.

Understanding slices

The slice 1:4 takes all elements at index beginning at 1 and stopping before getting to index 4.

\[ \small \begin{array}{|c|c|c|c|c|c|} \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline red & orange& yellow& green& blue& indigo& violet\\ \hline \end{array} \]

\[ \small \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline orange& yellow& green\\ \hline \end{array} \]

Omiting Values

If we omit the value before the : from the slice, it is treated as if a zero was provided.

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')

color_arr[0:4]

array(['red', 'orange', 'yellow', 'green'], dtype='<U6')

color_arr[:4]

array(['red', 'orange', 'yellow', 'green'], dtype='<U6')

Omiting End

If we omit the value after the : from the slice, it is as if the length was provided.

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')

len gives the length of a list or array

len(color_arr)

color_arr[4:7]

array(['blue', 'indigo', 'violet'], dtype='<U6')

color_arr[4:]

array(['blue', 'indigo', 'violet'], dtype='<U6')

Steps

Python and NumPy slices have an additional feature I find quite nice called “steps”
We specify a slice:

color_arr[1:4]

array(['orange', 'yellow', 'green'], dtype='<U6')

We add a third value

color_arr[1:4:1]

array(['orange', 'yellow', 'green'], dtype='<U6')

Step Size

We add a third value

color_arr[1:4:1]

array(['orange', 'yellow', 'green'], dtype='<U6')

This value determines how far to move over the original array between each element shown.
- Let’s look at 2 - every other element.

color_arr[1:4:2]

array(['orange', 'green'], dtype='<U6')

Example

In kindergarten, I learned that red, yellow, and blue were primary colors.
- Less sure now but a good ex.

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')

And the primaries:

color_arr[0:5:2]

array(['red', 'yellow', 'blue'], dtype='<U6')

Step Omissions

We can omit start, stop, or both and still take steps.

color_arr[::3]

array(['red', 'green', 'violet'], dtype='<U6')

We can use a negative step to reverse.

color_arr[::-1]

array(['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red'],
      dtype='<U6')

Aside: Negatives

Negative starts and stops can also be used

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')

They simply measure distance from the end.

color_arr[:-2]

array(['red', 'orange', 'yellow', 'green', 'blue'], dtype='<U6')

Coordinates

Versus Python lists-of-lists.

taxes[1][1]

0.15

NumPy arrays can specify coordinates - indices of multiple dimensions - in a single set of brackets.

arr[1][1], arr[1, 1]

(np.float64(0.15), np.float64(0.15))

Coordinate slices

I recommend using the comma notation.
Otherwise I get unexpected behavior.

# reverse in both dimensions
arr[::-1][::-1] # Just reverses twice

array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])

# reverse in both dimensions
arr[::-1,::-1] # It works!

array([[3.50000e-01, 4.15051e+05],
       [3.30000e-01, 4.13350e+05],
       [2.80000e-01, 1.90150e+05],
       [2.50000e-01, 9.11500e+04],
       [1.50000e-01, 3.76500e+04],
       [1.00000e-01, 9.27500e+03]])

Takeaway: Always use [x,y] instead of [x][y]
Things like this are we use NumPy!

Example

I want all the tax rates
- Metaphor: 1st floor of every apartment on the block.

# Look at the whole block, take address 1
arr[:][1] # Gives house address one

array([3.765e+04, 1.500e-01])

# Look at the whole block, take address 1
arr[:, 1] # Gives all first floors on the block

array([0.1 , 0.15, 0.25, 0.28, 0.33, 0.35])

Updates

We can also update entries via =

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')

Use (1) name of array, (2) index of element, (3) =, (4) new element

color_arr[-1] = 'purple'

The array now has that element/index.

color_arr

array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'purple'],
      dtype='<U6')

Vectorization

Why NumPy?

NumPy vectors can do a very cool thing that lists can’t.
We use it a lot in science.
Vector operations.

Vector Ops

Let’s compare Python

l1 = [1,2,3]
l2 = [1,1,1]
l1 + l2

[1, 2, 3, 1, 1, 1]

To NumPy

a1 = np.array(l1)
a2 = np.array(l2)
a1 + a2

array([2, 3, 4])

By the way, vectorization is really fast.
Our first “high performance computing” idea.

Example

Suppose you need to convert some temperatures.

# Portland highs and lows 5/26/25-6/2/25 in degF
temps = np.array([70, 50, 81, 60, 85, 57, 70, 52, 84, 58, 87, 58, 74, 53, 70, 54])
temps -= 32
temps *= 5
temps //= 9
temps

array([21, 10, 27, 15, 29, 13, 21, 11, 28, 14, 30, 14, 23, 11, 21, 12])

Aside: `//`

NumPy requires arrays to be of a certain kind of number.
30 is an integer.
30/9 is a decimal value…
Python furnishes special // integer division.
- It truncates (does not round) the result.

np.array([1,2,3,4,5,6,7,8,9,10]) // 3

array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3])

Try using / there. What happens?

Use case

Slices and vectorization help with income tax.
We:
- Had income cutoffs, that were
- The beginning of some tax brackets, but
- The end of other tax brackets
- Offset by one (1)

Building Brackets

Let’s refresh on what the tax bracket array looked like.

arr

array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])

Initial tax bracket goes zero and to 9275
The next goes from 9275 to 37650.
So 9275 is useful to two brackets.

Begin and end

Let’s grab just the cutoffs.

cutoffs = arr[:,0]
cutoffs

array([  9275.,  37650.,  91150., 190150., 413350., 415051.])

[:, 0] means for every row (slice :) take the initial columns (index 0)
- Take the ground floor of every house on the block.

Insert

The initial bracket begins at 0 and the last bracket ends at infinity.
- NumPy knows about infinity! np.inf
We can use NumPy insert to add an element at an index:

# Insert to (a) an array at (b) some index (c) some value
a = cutoffs
b = len(cutoffs)
c = np.inf
cutoffs = np.insert(a, b, c)
cutoffs

array([  9275.,  37650.,  91150., 190150., 413350., 415051.,     inf])

Append

I usually don’t insert, I append.
This allows adding two arrays together, like how Python + works on lists.
- Remember, NumPy can treat a value like 0 as a 0-D (zero dimensional) array.
- [0] would also work.

# Smoosh arrays, including scalars, together.
cutoffs = np.append(0,cutoffs)
cutoffs

array([     0.,   9275.,  37650.,  91150., 190150., 413350., 415051.,
           inf])

Slicing

Now we can take the beginning of every bracket, by index:

begin = cutoffs[:-1]
begin

array([     0.,   9275.,  37650.,  91150., 190150., 413350., 415051.])

And the end

end = cutoffs[1:]
end

array([  9275.,  37650.,  91150., 190150., 413350., 415051.,     inf])

Vector Minus

We can see how big each bracket is.

bracket_size = end - begin
bracket_size

array([  9275.,  28375.,  53500.,  99000., 223200.,   1701.,     inf])

Vector Times

We can see how much tax is spent in each bracket.
- We will “cut off” the last bracket first
- It doesn’t really have a size?

bracket_cost = bracket_size[:-1] * arr[:, 1]
bracket_cost

array([  927.5 ,  4256.25, 13375.  , 27720.  , 73656.  ,   595.35])

What happens if we don’t cut off the last bracket and try to multiply a vector of length 7 by a vector of length 6?

Accumulation

While bracket_cost does correctly describe the cost within on bracket, someone in the n+1’th bracket pays the cost the previous n brackets.
Someone how we want to sum those up.
NumPy has many built-in array functions, including np.cumsum.

total_cost = np.cumsum(bracket_cost)
total_cost

array([   927.5 ,   5183.75,  18558.75,  46278.75, 119934.75, 120530.1 ])

Aside: Mean

Other than np.cumsum, there are other functions over arrays we often use.
It isn’t particularly useful here, but np.mean is quite common:

np.mean(total_cost)

np.float64(51902.26666666666)

Takeaway: Some functions convert arrays back to values.

Takeaways

NumPy does a lot!

Arrays mostly, but including:
Vector operations called “broadcasting”.
Indices and slices matter a lot.
Contants like np.inf support mathematics
Accumulation functions we don’t have to write ourselves (which we’d have to do via loops)

On Names

You might hear of a 0-D (zero-dimensional) array referred to as a “scalar”, a 1-D (one-dimensional) array as a “vector”, a 2-D (two-dimensional) array as a “matrix”, or an N-D (N-dimensional, where “N” is typically an integer greater than 2) array as a “tensor”.

For clarity, it is best to avoid the mathematical terms when referring to an array because the mathematical objects with these names behave differently than arrays (e.g. “matrix” multiplication is fundamentally different from “array” multiplication), and there are other objects in the scientific Python ecosystem that have these names (e.g. the fundamental data structure of PyTorch is the “tensor”).

Exercise

Using NumPy

Today we successfully recomputed the points of the point-intercept form of the income tax problem.
Using NumPy, starting with taxes, create the array on the right:

taxes = np.array([
    [9275, .1],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
])

np.array([[ 9.275000e+03,  1.000000e-01,  0.000000e+00],
          [ 3.765000e+04,  1.500000e-01, -4.637500e+02],
          [ 9.115000e+04,  2.500000e-01, -4.228750e+03],
          [ 1.901500e+05,  2.800000e-01, -6.963250e+03],
          [ 4.133500e+05,  3.300000e-01, -1.647075e+04],
          [ 4.150510e+05,  3.500000e-01, -2.473775e+04]])

Transpose

There are many ways to solve this problem!
Things will be easier with the following:

taxes.shape

(6, 2)

taxes.reshape(2,6)

array([[9.27500e+03, 1.00000e-01, 3.76500e+04, 1.50000e-01, 9.11500e+04,
        2.50000e-01],
       [1.90150e+05, 2.80000e-01, 4.13350e+05, 3.30000e-01, 4.15051e+05,
        3.50000e-01]])

taxes.transpose()

array([[9.27500e+03, 3.76500e+04, 9.11500e+04, 1.90150e+05, 4.13350e+05,
        4.15051e+05],
       [1.00000e-01, 1.50000e-01, 2.50000e-01, 2.80000e-01, 3.30000e-01,
        3.50000e-01]])

Bonus Problem

NumPy allows random number generation.
Generate one million random numbers between, say, 0 and 500000.
- There will be repeats, which is okay.
Use Python with and without NumPy to compute every tax cost.
See which one is faster! You can use the shell command time

Aside: Random

The following:
- Generates 1 million random integers from 0 to 50000
- Prints every 10 thousandth integer.

import numpy as np
rng = np.random.default_rng()
incomes = rng.integers(0, 500000, 1000000)
print(incomes[::10000])

[164219 205640 135407 395905 409039 335473 226284 193945 168204 466685
 108853 220358 409575  13367  44930  27362 242105 278487  35833 131346
  99155  31995  53368 332440 161793 284517 408948 415420 405893 392534
 134528 373840 431580 203174 180125 324829 177619  97037 432494 305369
 496279  11634  85458 467016 159980 348472  73514 365492 426172  20126
  53096  57521 412181 125836 219181  92477  93626 425009  99766 406239
 218541 485387 187187 486400  50693 409695  71800 213456 405162 148490
 362513 118057 491664 479600  22245 348599 338973 380644 119857 234855
  58350 464280 332301 291197 159519 108239 425589 201118 477737 241862
 393534 322556 163955 316297 332545 316997 281938  77769  68896 259183]

Read more: Random sample

Aside: `time`

Use time before a command.
- real time is how much passes in real life

$ time python3 onemil.py
[123177 422613 471310 380518  95385 143328 426503 453832 427403 106416
 327306 476263  65814 281381 422404  59938  14231 232824 342190 329545
 412684 112339 202498   5071  59114 394601 451216  92268 381107 487447
  55089 339493 344836 261917 148326 452850 409130 484951 427839 307217
 259268 485208 331277 183015 132480 345930 439366   6814  39743 268276
  80739 293355 170394   4220  48082  15668 453927  58059 320294 101182
   1864 492297 130465   9920  76321 345944 268312 255875  46614 195236
 233737 443948 343483 116870 165561 326265 103567 327780 475672 392212
 396479 328248  43273  32596 246212   4258  60202  66783 135035 155327
 469638 378485 175496 428130 493185 154716 193012 424037 197666 103758]

real    0m0.155s
user    0m1.883s
sys     0m0.033s

Solution

Code

cutoffs = taxes[:,0]
rates = taxes[:,1]

# If you are confused, print(arr) after every line
arr = np.append([0], cutoffs)
arr = arr[1:] - arr[:-1]
arr *= rates
arr = np.cumsum(arr)
arr = arr - cutoffs * rates

taxes = np.array([cutoffs,rates,arr])

taxes = taxes.transpose()

Using it

I used the exact same single_tax function as in the “Shell” exercise.

Code

def single_tax(income):
    # Check all brackets
    for tax in taxes:
        if income < tax[0]:
            return income * tax[1] + tax[2]
    # We calculated the top bracket earlier
    return income * .396 + -43830.05

Why NumPy?

What is “NumPy”?

On NumPy

Relevance

We will

Package Management

PyPI

pip

Using pip

Verify Install

Aside: Dunder

Recall

Arrays

NumPy Arrays

List to Array

Lists of lists

Check Types

Metaphor

Versus arrays

Niceties

Contingencies

Takeaways:

Arrays

Tables

ndarray

Metaphor

Making Arrays

Aside: Scientific Notation

Aside: Explanation

Aside: Physical Examples

Aside: Economic Examples

Using Arrays

Inspecting Arrays

Slices

Understanding slices

Omiting Values

Omiting End

Steps

Step Size

Example

Step Omissions

Aside: Negatives

Coordinates

Coordinate slices

Example

Updates

Vectorization

Why NumPy?

Vector Ops

Example

Aside: //

Use case

Building Brackets

Begin and end

Insert

Append

Slicing

Vector Minus

Vector Times

Accumulation

Aside: Mean

Takeaways

NumPy does a lot!

On Names

Exercise

Using NumPy

Transpose

Bonus Problem

Aside: Random

Aside: time

Solution

Using it

Using `pip`

Aside: `//`

Aside: `time`