NumPy
Scientific Computing
Why NumPy?
What is “NumPy”?
The fundamental package for scientific computing with Python.
- Numerical Python
- It is a package: additional features we can optionally add to the Python language.
On NumPy
Basically, to do scientific computing it would be nice to have:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities
(These are the stated features of NumPy.)
Relevance
- We have essentially been using 1- and 2-dimensional arrays already!
- Here is the example intercepts.py solution from last time.
  - It will be cut off but you can copy/paste if you need it.
We will
- Change our existing lists, which we understand as “N-dimensional arrays” into NumPy arrays.
- Show the benefits of this arrangement.
- Show other NumPy features.
Package Management
PyPI
The Python Package Index (PyPI) is a repository of software for the Python programming language.
- NumPy is a Python package.
- It is installed separately from Python, but
- May be installed using Python-based tools.
pip
The most popular tool for installing Python packages, and the one included with modern versions of Python.
- pip is a command line utility.
- For years pip was the only real option for installing Python packages, but it has run into limitations as the package ecosystem has grown quite large.
Using pip
- Use as an argument to python or python3:
python3 -m pip install numpy
- It may take a moment to install.
Verify Install
- The following verifies that the NumPy install was successful.
- We see the return of import
  - Introduce as, used to shorten names
  - We could have done import pw as piecewise
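A minimal sketch of such a check, assuming NumPy was installed with the pip command above. The version string printed depends on the installation.

```python
# Import NumPy under the conventional short name "np".
import numpy as np

# __version__ holds the installed version as a string.
print(np.__version__)
```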
Aside: Dunder
- Around “version” there are two underscores, called “dunder” for double underscore.
- These “dunder” values are special built-in Python values with a specific meaning.
- You can see another by just printing np
Recall
- You may wish to review import
  - Consult Neovim -> “import”
Arrays
NumPy Arrays
- The first thing to do in NumPy is make an array.
- In general we use NumPy arrays when:
- We are dealing with lists of numbers
- We care about performance, or
- We want to use advanced mathematical operations
List to Array
- We define a list of lists:
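The code for this slide is not shown here. A plausible reconstruction, assuming taxes holds the cutoff and rate pairs that appear in the table later in this section:

```python
# Each inner list is [income cutoff, tax rate] for one bracket.
# The values are taken from the Cutoff/Rate table in this section.
taxes = [
    [9275, 0.10],
    [37650, 0.15],
    [91150, 0.25],
    [190150, 0.28],
    [413350, 0.33],
    [415051, 0.35],
]

print(taxes)
```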
Lists of lists
- It is worthwhile to examine taxes a bit
Check Types
- We can verify it is a list.
- We can verify its initial element is also a list.
- Since taxes[0] is a list, we can look at that list’s initial element.
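These checks might look like the following sketch, which uses a shortened copy of taxes so it runs on its own:

```python
# A shortened copy of the list of lists, for illustration only.
taxes = [[9275, 0.10], [37650, 0.15]]

print(type(taxes))        # <class 'list'>
print(type(taxes[0]))     # <class 'list'>
print(taxes[0][0])        # 9275
print(type(taxes[0][0]))  # <class 'int'>
print(type(taxes[0][1]))  # <class 'float'>
```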
Metaphor
- Perhaps we regard the 0-indexed element of an array as the house on the corner of a block.
- Perhaps we regard the 0-indexed element of that “house” as the ground floor.
Versus arrays
- We have a list of lists of ints (like integers, round numbers).
- Or do we?
- In fact, the tax rates are not round numbers, so they are “floats”.
- NumPy will help us manage when we use “floating point numbers” (have a decimal point) and integers (don’t).
Niceties
- Each of the internal lists of numbers is of the same length.
- That lets us do this:
Cutoff | Rate |
---|---|
9275 | .10 |
37650 | .15 |
91150 | .25 |
190150 | .28 |
413350 | .33 |
415051 | .35 |
Contingencies
- This… isn’t always true.
- In our case, for example, we have a .396 rate with no cutoff.
- We just can’t express this as an array:
| Cutoff | Rate |
|---|---|
| 413350 | .33 |
| 415051 | .35 |
| | .396 |
Takeaways:
- Be ready to deal with things being almost arrays, but ultimately only being lists-of-lists.
- There are ways to deal with this (we’ve seen a few, sneakily)
Arrays
In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:
\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline \end{array} \]
Tables
A two-dimensional array would be like a table:
\[ \begin{array}{|c||c|c|c|} \hline 9275 & 37650 & 91150 & 190150 \\ \hline .10 & .15 & .25 & .28 \\ \hline 0 & -463.75 & -6963.25 & -16470.75 \\ \hline \end{array} \]
ndarray
A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called ndarray: it represents an “N-dimensional array”.
- The most obvious 3D example would be taxes as part of an array of tax policies
Metaphor
- Perhaps we have blocks in a city.
- Houses on a block.
- Stories in a house or floors in an apartment building.
Making Arrays
- Make an array with np.array()
  - Or numpy.array if you used import numpy
- They look like this:
array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])
- Our integers are gone: everything is a float, shown in scientific notation
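A sketch of the conversion, assuming taxes is the list of cutoff and rate pairs from the table above. The name tax_array is chosen here for illustration.

```python
import numpy as np

# Cutoff/rate pairs from the table in this section.
taxes = [
    [9275, 0.10],
    [37650, 0.15],
    [91150, 0.25],
    [190150, 0.28],
    [413350, 0.33],
    [415051, 0.35],
]

# np.array converts the list of lists into a 2-D ndarray.
tax_array = np.array(taxes)

print(repr(tax_array))
print(tax_array.shape)  # (6, 2): six rows, two columns
print(tax_array.dtype)  # float64: ints and floats were mixed, so all became floats
```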
Aside: Scientific Notation
- In scientific notation, nonzero numbers are written in the form
\[a \times 10^b\]
Aside: Explanation
- \(a\) (the coefficient or mantissa) is a number whose absolute value is at least 1 and less than 10 (\(1 \le |a| < 10\)).
- \(10\) is the base.
- \(b\) (the exponent) is an integer.
Aside: Physical Examples
Speed of light: The speed of light in a vacuum is approximately \(300,000,000 \text{ m/s}\) \[ 3 \times 10^8 \text{ m/s} \]
Mass of an electron: The mass of an electron is approximately \(0.00000000000000000000000000091093837 \text{ g}\). \[ 9.1093837 \times 10^{-28} \text{ g} \]
Aside: Economic Examples
- We can use social science numbers.
- Labor Market Outcomes of College Graduates by Major
- Computer Science majors in 2025 have an $80,000 median wage
- \(8.0000 \times 10^4\)
- And 6.1% unemployment
- \(6.1 \times 10^{-2}\)
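Python can write and read this notation directly, using e for "times ten to the". A small sketch using the numbers above (the variable names are illustrative):

```python
# Format numbers in scientific notation with the "e" format code.
wage = f"{80000:.4e}"
unemployment = f"{0.061:.1e}"

print(wage)          # 8.0000e+04
print(unemployment)  # 6.1e-02

# Literals can be written the same way: 3e8 means 3 times 10 to the 8th.
print(3e8 == 300_000_000)  # True
```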
Using Arrays
Inspecting Arrays
- Given some array, we can look up elements in an array as we did with lists.
- We refer to zero as the “index” of the initial element of an array (or list).
- We look up the same element by the same index in both Python lists and NumPy arrays.
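As a sketch, the same index works on a list and on the array built from it (names here are illustrative):

```python
import numpy as np

cutoffs_list = [9275, 37650, 91150, 190150]
cutoffs_array = np.array(cutoffs_list)

# Index 0 is the initial element in both.
print(cutoffs_list[0])   # 9275
print(cutoffs_array[0])  # 9275

# Index 2 is the third element in both.
print(cutoffs_list[2], cutoffs_array[2])  # 91150 91150
```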
Slices
- Python and NumPy support slicing
- This takes multiple elements of an array by specifying a range of indices
- Let’s make a one-dimensional array to make matters simpler.
- dtype is data type. We’ll cover it soon.
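The array for this slide is not shown here. One possibility, assuming it holds the rainbow colors used in the slicing tables of this section:

```python
import numpy as np

# A one-dimensional array of strings (the name "colors" is an assumption).
colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# The repr includes a dtype entry describing the kind of data stored.
print(repr(colors))
print(colors.shape)  # (7,)
print(colors[0])     # red
```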
Understanding slices
- The slice 1:4 takes all elements beginning at index 1 and stopping before index 4.
\[ \small \begin{array}{|c|c|c|c|c|c|c|} \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline red & orange& yellow& green& blue& indigo& violet\\ \hline \end{array} \]
\[ \small \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline orange& yellow& green\\ \hline \end{array} \]
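The two tables above correspond to this sketch, assuming a colors array holding the seven rainbow colors:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Start at index 1, stop before index 4.
middle = colors[1:4]
print(middle)  # ['orange' 'yellow' 'green']
```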
Omitting Values
- If we omit the value before the : from the slice, it is treated as if a zero was provided.
Omitting End
- If we omit the value after the : from the slice, it is as if the length was provided.
  - len gives the length of a list or array
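A sketch of both omissions, again assuming a colors array of the seven rainbow colors:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Omitting the start is the same as starting at 0.
first_three = colors[:3]
print(first_three)  # ['red' 'orange' 'yellow']

# Omitting the end is the same as stopping at len(colors).
last_three = colors[4:]
print(last_three)              # ['blue' 'indigo' 'violet']
print(colors[4:len(colors)])   # same three colors
```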
Steps
- Python and NumPy slices have an additional feature I find quite nice called “steps”
- We specify a slice:
- We add a third value
Step Size
- We add a third value
- This value determines how far to move over the original array between each element shown.
- Let’s look at 2: every other element.
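With a step of 2, a sketch over the assumed colors array looks like this:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# start:stop:step, taking indices 0, 2, 4, and 6.
every_other = colors[0:7:2]
print(every_other)  # ['red' 'yellow' 'blue' 'violet']
```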
Example
- In kindergarten, I learned that red, yellow, and blue were primary colors.
  - I am less sure now, but it makes a good example.
- And the primaries:
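One way to pick out those three colors, assuming the same colors array, is to stop before index 5 so that violet is excluded:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Indices 0, 2, and 4: stop before 5, step by 2.
primaries = colors[0:5:2]
print(primaries)  # ['red' 'yellow' 'blue']
```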
Step Omissions
- We can omit start, stop, or both and still take steps.
- We can use a negative step to reverse.
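A sketch of these omissions with the assumed colors array:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Omit start and stop: every third element of the whole array.
every_third = colors[::3]
print(every_third)  # ['red' 'green' 'violet']

# Omit only the start: every other element before index 4.
print(colors[:4:2])  # ['red' 'yellow']

# A step of -1 walks backward, reversing the array.
reversed_colors = colors[::-1]
print(reversed_colors[0])  # violet
```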
Aside: Negatives
- Negative starts and stops can also be used
- They simply measure distance from the end.
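For instance, with the assumed colors array:

```python
import numpy as np

colors = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# -1 is the last element.
print(colors[-1])  # violet

# Start three from the end and run to the end.
tail = colors[-3:]
print(tail)  # ['blue' 'indigo' 'violet']

# Stop five from the end.
head = colors[:-5]
print(head)  # ['red' 'orange']
```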
Coordinates
- Versus Python lists-of-lists.
- NumPy arrays can specify coordinates - indices of multiple dimensions - in a single set of brackets.
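A sketch of the two notations, using a shortened copy of the tax data (tax_array is an illustrative name):

```python
import numpy as np

taxes = [[9275, 0.10], [37650, 0.15], [91150, 0.25]]
tax_array = np.array(taxes)

# List of lists: one pair of brackets per dimension.
print(taxes[1][0])      # 37650

# NumPy array: row and column together in one pair of brackets.
print(tax_array[1, 0])  # 37650.0
```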
Coordinate slices
- I recommend using the comma notation.
- Otherwise I get unexpected behavior.
- Takeaway: Always use [x,y] instead of [x][y]
- Things like this are why we use NumPy!
Example
- I want all the tax rates
- Metaphor: 1st floor of every apartment on the block.
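The difference between the two notations shows up as soon as a slice is involved. A sketch with a shortened copy of the tax data (names are illustrative):

```python
import numpy as np

tax_array = np.array([[9275, 0.10], [37650, 0.15], [91150, 0.25]])

# Comma notation: every row, column 1. These are the tax rates.
rates = tax_array[:, 1]
print(rates)  # [0.1  0.15 0.25]

# Chained brackets: [:] is the whole array, then [1] is just row 1.
second_row = tax_array[:][1]
print(second_row)  # the 37650 cutoff and its 0.15 rate, not a column
```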
Updates
- We can also update entries via =
  - Use (1) name of array, (2) index of element, (3) =, (4) new element
- The array now has that element at that index.
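A sketch of an update, where the new rate of 0.12 is a made-up value for illustration:

```python
import numpy as np

tax_array = np.array([[9275, 0.10], [37650, 0.15], [91150, 0.25]])

# (1) name of array, (2) index, (3) =, (4) new element.
tax_array[0, 1] = 0.12  # hypothetical new rate for the initial bracket

print(tax_array[0, 1])  # 0.12
```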
Vectorization
Why NumPy?
- NumPy vectors can do a very cool thing that lists can’t.
- We use it a lot in science.
- Vector operations.
Vector Ops
- Let’s compare Python
- To NumPy
- By the way, vectorization is really fast.
- Our first “high performance computing” idea.
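The comparison code is not shown here. A sketch of the idea, assuming the goal is to double every number:

```python
import numpy as np

cutoffs_list = [9275, 37650, 91150]

# On a list, * repeats the list instead of multiplying its elements.
print(cutoffs_list * 2)  # [9275, 37650, 91150, 9275, 37650, 91150]

# Plain Python needs a loop or a comprehension to touch each element.
doubled_list = [cutoff * 2 for cutoff in cutoffs_list]

# On a NumPy array, * applies to every element at once.
cutoffs_array = np.array(cutoffs_list)
doubled_array = cutoffs_array * 2
print(doubled_array)
```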
Example
- Suppose you need to convert some temperatures.
Aside: //
- NumPy requires arrays to be of a certain kind of number.
  - 30 is an integer.
  - 30/9 is a decimal value…
- Python furnishes special // integer division.
  - It rounds down to a whole number (floors) rather than rounding to the nearest one.
- Try using / there. What happens?
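The conversion code is not shown here. A sketch assuming a Fahrenheit to Celsius conversion on integer temperatures, with made-up sample values:

```python
import numpy as np

# Sample Fahrenheit temperatures (illustrative values).
temps_f = np.array([32, 50, 86, 212])

# Vectorized conversion with integer division keeps an integer array.
temps_c = (temps_f - 32) * 5 // 9
print(temps_c)        # [  0  10  30 100]
print(temps_c.dtype)  # an integer dtype

# With / the result becomes an array of floats instead.
temps_c_float = (temps_f - 32) * 5 / 9
print(temps_c_float.dtype)  # float64
```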
Use case
- Slices and vectorization help with income tax.
- We:
- Had income cutoffs, that were
- The beginning of some tax brackets, but
- The end of other tax brackets
- Offset by one (1)
Building Brackets
- Let’s refresh on what the tax bracket array looked like.
array([[9.27500e+03, 1.00000e-01],
       [3.76500e+04, 1.50000e-01],
       [9.11500e+04, 2.50000e-01],
       [1.90150e+05, 2.80000e-01],
       [4.13350e+05, 3.30000e-01],
       [4.15051e+05, 3.50000e-01]])
- The initial tax bracket goes from zero to 9275
- The next goes from 9275 to 37650.
- So 9275 is useful to two brackets.
Begin and end
- Let’s grab just the cutoffs.
- [:, 0] means for every row (slice :) take the initial column (index 0)
  - Take the ground floor of every house on the block.
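A sketch of grabbing the cutoffs, assuming tax_array holds the bracket data shown above:

```python
import numpy as np

tax_array = np.array([
    [9275, 0.10],
    [37650, 0.15],
    [91150, 0.25],
    [190150, 0.28],
    [413350, 0.33],
    [415051, 0.35],
])

# Every row, column 0.
cutoffs = tax_array[:, 0]
print(cutoffs)
```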
Insert
- The initial bracket begins at 0 and the last bracket ends at infinity.
  - NumPy knows about infinity! np.inf
- We can use NumPy insert to add an element at an index:
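A sketch using np.insert, assuming cutoffs is the column of cutoffs from the bracket array:

```python
import numpy as np

cutoffs = np.array([9275.0, 37650.0, 91150.0, 190150.0, 413350.0, 415051.0])

# np.insert(array, index, value) returns a new array; cutoffs is unchanged.
with_zero = np.insert(cutoffs, 0, 0)
print(with_zero[0])    # 0.0
print(len(with_zero))  # 7

# np.inf compares greater than any finite number.
print(np.inf > cutoffs.max())  # True
```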
Append
- I usually don’t insert, I append.
- This allows adding two arrays together, like how Python + works on lists.
  - Remember, NumPy can treat a value like 0 as a 0-D (zero dimensional) array.
  - [0] would also work.
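A sketch using np.append to put 0 in front and infinity at the end. The name bounds is chosen here for illustration.

```python
import numpy as np

cutoffs = np.array([9275.0, 37650.0, 91150.0, 190150.0, 413350.0, 415051.0])

# Joins two array-likes end to end, much like [0] + [1, 2] on lists.
bounds = np.append(0, cutoffs)
bounds = np.append(bounds, np.inf)

print(bounds[0], bounds[-1])  # 0.0 inf
print(len(bounds))            # 8

# Passing [0] instead of 0 gives the same result.
same = np.append([0], cutoffs)
```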
Slicing
- Now we can take the beginning of every bracket, by index:
- And the end
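A sketch of both slices, assuming bounds holds 0, the six cutoffs, and infinity (the names begins and ends are illustrative):

```python
import numpy as np

bounds = np.array([0, 9275, 37650, 91150, 190150, 413350, 415051, np.inf])

# Every bracket begins at all but the last bound...
begins = bounds[:-1]

# ...and ends at all but the first bound, offset by one.
ends = bounds[1:]

print(begins)
print(ends)
```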
Vector Minus
- We can see how big each bracket is.
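Subtracting the two slices elementwise gives each bracket's size. A sketch assuming the same bounds as above:

```python
import numpy as np

bounds = np.array([0, 9275, 37650, 91150, 190150, 413350, 415051, np.inf])

# One subtraction per bracket, with no loop: end minus beginning.
sizes = bounds[1:] - bounds[:-1]
print(sizes)

# The last bracket runs to infinity, so its size is infinite.
print(sizes[-1])  # inf
```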
Vector Times
- We can see how much tax is spent in each bracket.
- We will “cut off” the last bracket first
- It doesn’t really have a size?
array([ 927.5 , 4256.25, 13375. , 27720. , 73656. , 595.35])
- What happens if we don’t cut off the last bracket and try to multiply a vector of length 7 by a vector of length 6?
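A sketch of the multiplication, assuming sizes holds the seven bracket sizes and rates the six rates from the table. It also tries the mismatched lengths to show what NumPy does.

```python
import numpy as np

sizes = np.array([9275, 28375, 53500, 99000, 223200, 1701, np.inf])
rates = np.array([0.10, 0.15, 0.25, 0.28, 0.33, 0.35])

# Cut off the infinite last bracket, then multiply elementwise.
bracket_cost = sizes[:-1] * rates
print(bracket_cost)

# Lengths 7 and 6 do not line up, so NumPy raises a ValueError.
mismatch_failed = False
try:
    sizes * rates
except ValueError as error:
    mismatch_failed = True
    print("ValueError:", error)
```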
Accumulation
- While bracket_cost does correctly describe the cost within one bracket, someone in the n+1’th bracket pays the cost of the previous n brackets.
- Somehow we want to sum those up.
- NumPy has many built-in array functions, including np.cumsum.
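A sketch of the running total, using the bracket_cost values shown above (the name owed_before is illustrative):

```python
import numpy as np

bracket_cost = np.array([927.5, 4256.25, 13375.0, 27720.0, 73656.0, 595.35])

# Each entry is the sum of all costs up to and including that bracket.
owed_before = np.cumsum(bracket_cost)
print(owed_before)
```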
Aside: Mean
- Other than np.cumsum, there are other functions over arrays we often use.
- It isn’t particularly useful here, but np.mean is quite common:
- Takeaway: Some functions convert arrays back to values.
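For instance, a sketch taking the mean of the six tax rates from the table:

```python
import numpy as np

rates = np.array([0.10, 0.15, 0.25, 0.28, 0.33, 0.35])

# np.mean collapses the whole array to a single value.
average_rate = np.mean(rates)
print(average_rate)  # about 0.2433
```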
Takeaways
NumPy does a lot!
- Arrays mostly, but including:
- Vector operations called “broadcasting”.
- Indices and slices matter a lot.
- Constants like np.inf support mathematics
- Accumulation functions we don’t have to write ourselves (which we’d have to do via loops)
On Names
You might hear of a 0-D (zero-dimensional) array referred to as a “scalar”, a 1-D (one-dimensional) array as a “vector”, a 2-D (two-dimensional) array as a “matrix”, or an N-D (N-dimensional, where “N” is typically an integer greater than 2) array as a “tensor”.
For clarity, it is best to avoid the mathematical terms when referring to an array because the mathematical objects with these names behave differently than arrays (e.g. “matrix” multiplication is fundamentally different from “array” multiplication), and there are other objects in the scientific Python ecosystem that have these names (e.g. the fundamental data structure of PyTorch is the “tensor”).
Exercise
Using NumPy
- Today we successfully recomputed the points of the point-intercept form of the income tax problem.
- Using NumPy, starting with
taxes
, create the array on the right:
Transpose
- There are many ways to solve this problem!
- Things will be easier with the following:
array([[9.27500e+03, 1.00000e-01, 3.76500e+04, 1.50000e-01, 9.11500e+04,
        2.50000e-01],
       [1.90150e+05, 2.80000e-01, 4.13350e+05, 3.30000e-01, 4.15051e+05,
        3.50000e-01]])
Bonus Problem
- NumPy allows random number generation.
- Generate one million random numbers between, say, 0 and 500000.
  - There will be repeats, which is okay.
- Use Python with and without NumPy to compute every tax cost.
- See which one is faster! You can use the shell command time
Aside: Random
- The following:
  - Generates 1 million random integers from 0 up to (but not including) 500000
  - Prints every 10 thousandth integer.
import numpy as np
rng = np.random.default_rng()
incomes = rng.integers(0, 500000, 1000000)
print(incomes[::10000])
[164219 205640 135407 395905 409039 335473 226284 193945 168204 466685
108853 220358 409575 13367 44930 27362 242105 278487 35833 131346
99155 31995 53368 332440 161793 284517 408948 415420 405893 392534
134528 373840 431580 203174 180125 324829 177619 97037 432494 305369
496279 11634 85458 467016 159980 348472 73514 365492 426172 20126
53096 57521 412181 125836 219181 92477 93626 425009 99766 406239
218541 485387 187187 486400 50693 409695 71800 213456 405162 148490
362513 118057 491664 479600 22245 348599 338973 380644 119857 234855
58350 464280 332301 291197 159519 108239 425589 201118 477737 241862
393534 322556 163955 316297 332545 316997 281938 77769 68896 259183]
- Read more: Random sample
Aside: time
- Use time before a command.
  - real time is how much passes in real life
$ time python3 onemil.py
[123177 422613 471310 380518 95385 143328 426503 453832 427403 106416
327306 476263 65814 281381 422404 59938 14231 232824 342190 329545
412684 112339 202498 5071 59114 394601 451216 92268 381107 487447
55089 339493 344836 261917 148326 452850 409130 484951 427839 307217
259268 485208 331277 183015 132480 345930 439366 6814 39743 268276
80739 293355 170394 4220 48082 15668 453927 58059 320294 101182
1864 492297 130465 9920 76321 345944 268312 255875 46614 195236
233737 443948 343483 116870 165561 326265 103567 327780 475672 392212
396479 328248 43273 32596 246212 4258 60202 66783 135035 155327
469638 378485 175496 428130 493185 154716 193012 424037 197666 103758]
real 0m0.155s
user 0m1.883s
sys 0m0.033s
Solution
Using it
- I used the exact same single_tax function as in the “Shell” exercise.