NumPy

Scientific Computing

Author

Prof. Calvin

Why NumPy?

What is “NumPy”?

The fundamental package for scientific computing with Python.

  • Numerical Python
  • It is a package - addition features we can optional add to the Python language.

On NumPy

Basically, to do scientific computing it would be nice to have:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

(These are the stated features of NumPy.)

Relevance

  • We have essentially been using 1- and 2-dimensional arrays already!
  • Here is the examples intercepts.py solution from last time.
    • It will be cut off but you can copy/paste if you need it.
Code
taxes = [
    [9275, .10],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
]

cost = 0
start = 0
for tax in taxes:
    cost += (tax[0] - start) * tax[1]
    tax += [cost - tax[0] * tax[1]]
    start = tax[0]

We will

  • Change our existing lists, which we understand as “N-dimensional arrays” into NumPy arrays.
  • Show the benefits of this arrangement.
  • Show other NumPy features.

Package Management

PyPI

The Python Package Index (PyPI) is a repository of software for the Python programming language.

  • NumPy is a Python package.
    • It is installed separately from Python, but
    • May be installed using Python-based tools.

pip

The most popular tool for installing Python packages, and the one included with modern versions of Python.

  • pip is a command line utility
  • For years pip was the only real options for installing Python programs, but has experienced limitations as the package ecosystem has grown quite large.

Using pip

  • Use as an argument to python or python3
python3 -m pip install numpy
  • It may take a moment to install.

Verify Install

  • The following verifies the the NumPy install was successful
  • We see the return of import
    • Introduce as, used to shorten names
    • We could have done import pw as piecewise
import numpy as np

print(np.__version__)
2.0.1

Aside: Dunder

  • Around “version” there are two underscores, called “dunder” for double underscore.
  • These “dunder” values are special built-in Python values with a specific meaning.
  • You can see another by just printing np
print(np)
<module 'numpy' from 'C:\\Users\\cd-desk\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\numpy\\__init__.py'>

Recall

Arrays

NumPy Arrays

  • The first thing to do in NumPy is make an array.
  • In general we use NumPy arrays when:
    • We are dealing with lists of numbers
    • We care about performance, or
    • We want to use advanced mathematical operations

List to Array

  • We define a list of lists:
taxes = [
    [9275, .10],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
]

Lists of lists

  • It is worth while to examine taxes a bit
taxes
[[9275, 0.1],
 [37650, 0.15],
 [91150, 0.25],
 [190150, 0.28],
 [413350, 0.33],
 [415051, 0.35]]

Check Types

  • We can verify it is a list.
type(taxes)
list
  • We can verify it’s initial element is also a list.
type(taxes[0])
list
  • Since taxes[0] is a list, we can look at that list’s initial element.
type(taxes[0][0])
int

Metaphor

  • Perhaps we regard 0-indexed element of an array as the house on the corner of a block.
  • Perhaps we regard the 0-indexed element of that “house” as the ground floor.
taxes[0][0] # House on the corner, ground floor
9275

Versus arrays

  • We have a list of lists of ints (like integers, round numbers).
  • Or do we?
type(taxes[0][1])
float
  • In fact, the tax rates are not round numbers, so they are “floats”.
  • NumPy will help us manage when we use “floating point numbers” (have a decimal point) and integers (don’t).

Niceties

  • Each of the internal lists of numbers is of the same length.
  • That lets us do this:
Cutoff Rate
9275 .10
37650 .15
91150 .25
190150 .28
413350 .33
415051 .35

Contingencies

  • This… isn’t always true.
  • In our case, for example, we have a .396 rate with no cutoff.
  • We just can’t express this as an array:
Cutoff Rate
413350 .33
415051 .35
.396

Takeaways:

  • Be ready to deal with things being almost arrays, but ultimately only being lists-of-lists.
  • There’s ways to deal with this (we’ve seen a few, sneakily)

Arrays

In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:

\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline \end{array} \]

Tables

A two-dimensional array would be like a table:

\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline .10 & .15 & .25 & .28 \\ \hline 0 & -463.75 & -6963.25 & -16470.75 \\ \hline \end{array} \]

ndarray

A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called ndarray: it represents an “N-dimensional array”.

  • The most obvious 3D example would be that taxes part of an array of tax policies
["Single", "Married, joint", "Married, separate", "Head of Household"]

Metaphor

  • Perhaps we have blocks in a city.
  • Houses on a block.
  • Stories on a house or floors in apartment.

Making Arrays

  • Make an array with np.array()
    • Or numpy.array if you used import numpy
  • They look like this:
arr = np.array(taxes)
arr
array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])
  • Our integers are gone - everything in scientific notation

Aside: Scientific Notation

Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form, since to do so would require writing out an inconveniently long string of digits.

  • In scientific notation, nonzero numbers are written in the form

\[a \times 10^b\]

Aside: Explanation

  • In scientific notation, nonzero numbers are written in the form

\[a \times 10^b\]

  • \(a\) (the coefficient or mantissa) is a number greater than or equal to 1 and less than 10 (\(1 \le |a| < 10\)).
  • \(10\) is the base.
  • \(b\) (the exponent) is an integer.

Aside: Physical Examples

  1. Speed of light: The speed of light in a vacuum is approximately \(300,000,000 \text{ m/s}\) \[ 3 \times 10^8 \text{ m/s} \]

  2. Mass of an electron: The mass of an electron is approximately \(0.00000000000000000000000000091093837 \text{ g}\). \[ 9.1093837 \times 10^{-28} \text{ g} \]

Aside: Economic Examples

Using Arrays

Inspecting Arrays

  • Given some array, we can look up elements in an array as we did with lists.
  • We refer to zero as the “index” of the initial element of an array (or list).
taxes[0]
[9275, 0.1]
arr[0]
array([9.275e+03, 1.000e-01])
  • We look up the same element by the same index in both Python lists and NumPy arrays.

Slices

  • Python and NumPy support slicing
  • This takes multiple elements of an array by specifying a range of indices
  • Let’s make a one-dimensional array to make matters simpler.
colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
color_lst = colors
color_lst[1:4]
['orange', 'yellow', 'green']
color_arr = np.array(colors)
color_arr[1:4]
array(['orange', 'yellow', 'green'], dtype='<U6')
  • dtype is data type - We’ll cover it soon.

Understanding slices

  • The slice 1:4 takes all elements at index beginning at 1 and stopping before getting to index 4.

\[ \small \begin{array}{|c|c|c|c|c|c|} \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline red & orange& yellow& green& blue& indigo& violet\\ \hline \end{array} \]

\[ \small \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline orange& yellow& green\\ \hline \end{array} \]

Omiting Values

  • If we omit the value before the : from the slice, it is treated as if a zero was provided.
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')
color_arr[0:4]
array(['red', 'orange', 'yellow', 'green'], dtype='<U6')
color_arr[:4]
array(['red', 'orange', 'yellow', 'green'], dtype='<U6')

Omiting End

  • If we omit the value after the : from the slice, it is as if the length was provided.
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')
  • len gives the length of a list or array
len(color_arr)
7
color_arr[4:7]
array(['blue', 'indigo', 'violet'], dtype='<U6')
color_arr[4:]
array(['blue', 'indigo', 'violet'], dtype='<U6')

Steps

  • Python and NumPy slices have an additional feature I find quite nice called “steps”
  • We specify a slice:
color_arr[1:4]
array(['orange', 'yellow', 'green'], dtype='<U6')
  • We add a third value
color_arr[1:4:1]
array(['orange', 'yellow', 'green'], dtype='<U6')

Step Size

  • We add a third value
color_arr[1:4:1]
array(['orange', 'yellow', 'green'], dtype='<U6')
  • This value determines how far to move over the original array between each element shown.
    • Let’s look at 2 - every other element.
color_arr[1:4:2]
array(['orange', 'green'], dtype='<U6')

Example

  • In kindergarten, I learned that red, yellow, and blue were primary colors.
    • Less sure now but a good ex.
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')
  • And the primaries:
color_arr[0:5:2]
array(['red', 'yellow', 'blue'], dtype='<U6')

Step Omissions

  • We can omit start, stop, or both and still take steps.
color_arr[::3]
array(['red', 'green', 'violet'], dtype='<U6')
  • We can use a negative step to reverse.
color_arr[::-1]
array(['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red'],
      dtype='<U6')

Aside: Negatives

  • Negative starts and stops can also be used
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')
  • They simply measure distance from the end.
color_arr[:-2]
array(['red', 'orange', 'yellow', 'green', 'blue'], dtype='<U6')

Coordinates

  • Versus Python lists-of-lists.
taxes[1][1]
0.15
  • NumPy arrays can specify coordinates - indices of multiple dimensions - in a single set of brackets.
arr[1][1], arr[1, 1]
(np.float64(0.15), np.float64(0.15))

Coordinate slices

  • I recommend using the comma notation.
  • Otherwise I get unexpected behavior.
# reverse in both dimensions
arr[::-1][::-1] # Just reverses twice 
array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])
# reverse in both dimensions
arr[::-1,::-1] # It works!
array([[3.50000e-01, 4.15051e+05],
       [3.30000e-01, 4.13350e+05],
       [2.80000e-01, 1.90150e+05],
       [2.50000e-01, 9.11500e+04],
       [1.50000e-01, 3.76500e+04],
       [1.00000e-01, 9.27500e+03]])
  • Takeaway: Always use [x,y] instead of [x][y]
  • Things like this are we use NumPy!

Example

  • I want all the tax rates
    • Metaphor: 1st floor of every apartment on the block.
# Look at the whole block, take address 1
arr[:][1] # Gives house address one
array([3.765e+04, 1.500e-01])
# Look at the whole block, take address 1
arr[:, 1] # Gives all first floors on the block
array([0.1 , 0.15, 0.25, 0.28, 0.33, 0.35])

Updates

  • We can also update entries via =
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
      dtype='<U6')
  • Use (1) name of array, (2) index of element, (3) =, (4) new element
color_arr[-1] = 'purple'
  • The array now has that element/index.
color_arr
array(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'purple'],
      dtype='<U6')

Vectorization

Why NumPy?

  • NumPy vectors can do a very cool thing that lists can’t.
  • We use it a lot in science.
  • Vector operations.

Vector Ops

  • Let’s compare Python
l1 = [1,2,3]
l2 = [1,1,1]
l1 + l2
[1, 2, 3, 1, 1, 1]
  • To NumPy
a1 = np.array(l1)
a2 = np.array(l2)
a1 + a2
array([2, 3, 4])
  • By the way, vectorization is really fast.
  • Our first “high performance computing” idea.

Example

  • Suppose you need to convert some temperatures.
# Portland highs and lows 5/26/25-6/2/25 in degF
temps = np.array([70, 50, 81, 60, 85, 57, 70, 52, 84, 58, 87, 58, 74, 53, 70, 54])
temps -= 32
temps *= 5
temps //= 9
temps
array([21, 10, 27, 15, 29, 13, 21, 11, 28, 14, 30, 14, 23, 11, 21, 12])

Aside: //

  • NumPy requires arrays to be of a certain kind of number.
  • 30 is an integer.
  • 30/9 is a decimal value…
  • Python furnishes special // integer division.
    • It truncates (does not round) the result.
np.array([1,2,3,4,5,6,7,8,9,10]) // 3
array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3])
  • Try using / there. What happens?

Use case

  • Slices and vectorization help with income tax.
  • We:
    • Had income cutoffs, that were
    • The beginning of some tax brackets, but
    • The end of other tax brackets
    • Offset by one (1)

Building Brackets

  • Let’s refresh on what the tax bracket array looked like.
arr
array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])
  • Initial tax bracket goes zero and to 9275
  • The next goes from 9275 to 37650.
  • So 9275 is useful to two brackets.

Begin and end

  • Let’s grab just the cutoffs.
cutoffs = arr[:,0]
cutoffs
array([  9275.,  37650.,  91150., 190150., 413350., 415051.])
  • [:, 0] means for every row (slice :) take the initial columns (index 0)
    • Take the ground floor of every house on the block.

Insert

  • The initial bracket begins at 0 and the last bracket ends at infinity.
    • NumPy knows about infinity! np.inf
  • We can use NumPy insert to add an element at an index:
# Insert to (a) an array at (b) some index (c) some value
a = cutoffs
b = len(cutoffs)
c = np.inf
cutoffs = np.insert(a, b, c)
cutoffs
array([  9275.,  37650.,  91150., 190150., 413350., 415051.,     inf])

Append

  • I usually don’t insert, I append.
  • This allows adding two arrays together, like how Python + works on lists.
    • Remember, NumPy can treat a value like 0 as a 0-D (zero dimensional) array.
    • [0] would also work.
# Smoosh arrays, including scalars, together.
cutoffs = np.append(0,cutoffs)
cutoffs
array([     0.,   9275.,  37650.,  91150., 190150., 413350., 415051.,
           inf])

Slicing

  • Now we can take the beginning of every bracket, by index:
begin = cutoffs[:-1]
begin
array([     0.,   9275.,  37650.,  91150., 190150., 413350., 415051.])
  • And the end
end = cutoffs[1:]
end
array([  9275.,  37650.,  91150., 190150., 413350., 415051.,     inf])

Vector Minus

  • We can see how big each bracket is.
bracket_size = end - begin
bracket_size
array([  9275.,  28375.,  53500.,  99000., 223200.,   1701.,     inf])

Vector Times

  • We can see how much tax is spent in each bracket.
    • We will “cut off” the last bracket first
    • It doesn’t really have a size?
bracket_cost = bracket_size[:-1] * arr[:, 1]
bracket_cost
array([  927.5 ,  4256.25, 13375.  , 27720.  , 73656.  ,   595.35])
  • What happens if we don’t cut off the last bracket and try to multiply a vector of length 7 by a vector of length 6?

Accumulation

  • While bracket_cost does correctly describe the cost within on bracket, someone in the n+1’th bracket pays the cost the previous n brackets.
  • Someone how we want to sum those up.
  • NumPy has many built-in array functions, including np.cumsum.
total_cost = np.cumsum(bracket_cost)
total_cost
array([   927.5 ,   5183.75,  18558.75,  46278.75, 119934.75, 120530.1 ])

Aside: Mean

  • Other than np.cumsum, there are other functions over arrays we often use.
  • It isn’t particularly useful here, but np.mean is quite common:
np.mean(total_cost)
np.float64(51902.26666666666)
  • Takeaway: Some functions convert arrays back to values.

Takeaways

NumPy does a lot!

On Names

You might hear of a 0-D (zero-dimensional) array referred to as a “scalar”, a 1-D (one-dimensional) array as a “vector”, a 2-D (two-dimensional) array as a “matrix”, or an N-D (N-dimensional, where “N” is typically an integer greater than 2) array as a “tensor”.

For clarity, it is best to avoid the mathematical terms when referring to an array because the mathematical objects with these names behave differently than arrays (e.g. “matrix” multiplication is fundamentally different from “array” multiplication), and there are other objects in the scientific Python ecosystem that have these names (e.g. the fundamental data structure of PyTorch is the “tensor”).

Exercise

Using NumPy

  • Today we successfully recomputed the points of the point-intercept form of the income tax problem.
  • Using NumPy, starting with taxes, create the array on the right:
taxes = np.array([
    [9275, .1],
    [37650, .15],
    [91150, .25],
    [190150, .28],
    [413350, .33],
    [415051, .35]
])
np.array([[ 9.275000e+03,  1.000000e-01,  0.000000e+00],
          [ 3.765000e+04,  1.500000e-01, -4.637500e+02],
          [ 9.115000e+04,  2.500000e-01, -4.228750e+03],
          [ 1.901500e+05,  2.800000e-01, -6.963250e+03],
          [ 4.133500e+05,  3.300000e-01, -1.647075e+04],
          [ 4.150510e+05,  3.500000e-01, -2.473775e+04]])

Transpose

  • There are many ways to solve this problem!
  • Things will be easier with the following:
taxes.shape
(6, 2)
taxes.reshape(2,6)
array([[9.27500e+03, 1.00000e-01, 3.76500e+04, 1.50000e-01, 9.11500e+04,
        2.50000e-01],
       [1.90150e+05, 2.80000e-01, 4.13350e+05, 3.30000e-01, 4.15051e+05,
        3.50000e-01]])
taxes.transpose()
array([[9.27500e+03, 3.76500e+04, 9.11500e+04, 1.90150e+05, 4.13350e+05,
        4.15051e+05],
       [1.00000e-01, 1.50000e-01, 2.50000e-01, 2.80000e-01, 3.30000e-01,
        3.50000e-01]])

Bonus Problem

  • NumPy allows random number generation.
  • Generate one million random numbers between, say, 0 and 500000.
    • There will be repeats, which is okay.
  • Use Python with and without NumPy to compute every tax cost.
  • See which one is faster! You can use the shell command time

Aside: Random

  • The following:
    • Generates 1 million random integers from 0 to 50000
    • Prints every 10 thousandth integer.
import numpy as np
rng = np.random.default_rng()
incomes = rng.integers(0, 500000, 1000000)
print(incomes[::10000])
[164219 205640 135407 395905 409039 335473 226284 193945 168204 466685
 108853 220358 409575  13367  44930  27362 242105 278487  35833 131346
  99155  31995  53368 332440 161793 284517 408948 415420 405893 392534
 134528 373840 431580 203174 180125 324829 177619  97037 432494 305369
 496279  11634  85458 467016 159980 348472  73514 365492 426172  20126
  53096  57521 412181 125836 219181  92477  93626 425009  99766 406239
 218541 485387 187187 486400  50693 409695  71800 213456 405162 148490
 362513 118057 491664 479600  22245 348599 338973 380644 119857 234855
  58350 464280 332301 291197 159519 108239 425589 201118 477737 241862
 393534 322556 163955 316297 332545 316997 281938  77769  68896 259183]

Aside: time

  • Use time before a command.
    • real time is how much passes in real life
$ time python3 onemil.py
[123177 422613 471310 380518  95385 143328 426503 453832 427403 106416
 327306 476263  65814 281381 422404  59938  14231 232824 342190 329545
 412684 112339 202498   5071  59114 394601 451216  92268 381107 487447
  55089 339493 344836 261917 148326 452850 409130 484951 427839 307217
 259268 485208 331277 183015 132480 345930 439366   6814  39743 268276
  80739 293355 170394   4220  48082  15668 453927  58059 320294 101182
   1864 492297 130465   9920  76321 345944 268312 255875  46614 195236
 233737 443948 343483 116870 165561 326265 103567 327780 475672 392212
 396479 328248  43273  32596 246212   4258  60202  66783 135035 155327
 469638 378485 175496 428130 493185 154716 193012 424037 197666 103758]

real    0m0.155s
user    0m1.883s
sys     0m0.033s

Solution

Code
cutoffs = taxes[:,0]
rates = taxes[:,1]

# If you are confused, print(arr) after every line
arr = np.append([0], cutoffs)
arr = arr[1:] - arr[:-1]
arr *= rates
arr = np.cumsum(arr)
arr = arr - cutoffs * rates

taxes = np.array([cutoffs,rates,arr])

taxes = taxes.transpose()

Using it

  • I used the exact same single_tax function as in the “Shell” exercise.
Code
def single_tax(income):
    # Check all brackets
    for tax in taxes:
        if income < tax[0]:
            return income * tax[1] + tax[2]
    # We calculated the top bracket earlier
    return income * .396 + -43830.05