Learning
AI 101
Setting the Stage
- We’ve explored the perceptron
- We’ve used binary classification to differentiate odds and evens.
- We approach multi-class classification.
- And finally… we placed the entire framework within a matrix.
Setup
Recall
Representing Reality
- How does a computer “see” a simple object?
- Let’s take a standard six-sided die.
- We can represent the face of a die as a 3x3 grid.
- Or, alternatively, a 1x9 (or 9x1) “vector”
NumPy
- We used NumPy to perform element-wise multiplication over vectors.
- This is also called the “Hadamard product” or “element-wise product”.
- We step through the example from the lab, briefly.
Is One
- Last week, we found it was easy enough to classify one.
- Briefly, we looked for a center dot with a positive weight, and gave everything else a negative weight.
- Then we multiplied “element-wise” the visual data (a vector of length 9) by the weights (a vector of length 9).
\[ \sum_{i=1}^{n} w_i \times x_i \]
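This sum can be sketched with NumPy (variable names here are mine, not necessarily the lab’s):

```python
import numpy as np

# A "one" face as a flattened 3x3 grid: only the center dot is lit.
one = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0])

# "Is one?" weights: positive for the center, negative everywhere else.
is_one = np.array([-1, -1, -1, -1, 1, -1, -1, -1, -1])

# Element-wise (Hadamard) product, then sum - the formula above.
score = np.sum(is_one * one)
print(score)  # 1
```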
Is One Vector
- We used, perhaps, this “is one?” vector.
- Recall, we cannot include spaces in our names of things in Colab.
- We use “single equals assignment” to assign a value to some name.
- In this case, the variable is the “is one?” vector.
Compare
- Find out which die we have in a single line using vector arithmetic.
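A sketch of that single line, applied to two different faces (the `three` layout matches the normalized dice shown later in this lecture):

```python
import numpy as np

is_one = np.array([-1, -1, -1, -1, 1, -1, -1, -1, -1])

# Two faces as flattened 3x3 grids.
one   = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0])
three = np.array([0, 0, 1, 0, 1, 0, 1, 0, 0])

# One line each: element-wise multiply, sum, and test whether the neuron fires.
print(np.sum(is_one * one) >= 1)    # True  - this die is a one
print(np.sum(is_one * three) >= 1)  # False - this die is not
```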
Matrices
- Imagine that, instead of having six dice we want to check to see if they are one…
- …we instead have one die and we wish to see what value it represents.
- We represent the die as a 1 by 9 (or 9 by 1) vector
- We represent its value (1 to 6) as a 1 by 6 (or 6 by 1) vector.
- We can perform a single matrix multiplication using a 6 by 9 (or 9 by 6) matrix to get the result.
For example
- We wish to find out what “class” (what number) `fiv` belongs to.
- We would start with `fiv`.
- We would want to get back something with a `1` in the fifth position, and zeroes elsewhere.
That matrix
- This is the precise matrix made in the lab.
multi = np.array([
[-1/1, -1/1, -1/1, -1/1, 1/1, -1/1, -1/1, -1/1, -1/1],
[-1/2, -1/2, 1/2, -1/2, -1/2, -1/2, 1/2, -1/2, -1/2],
[-1/3, -1/3, 1/3, -1/3, 1/3, -1/3, 1/3, -1/3, -1/3],
[ 1/4, -1/4, 1/4, -1/4, -1/4, -1/4, 1/4, -1/4, 1/4],
[ 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5],
[ 0/6, 3/6, 0/6, 3/6, -6/6, 3/6, 0/6, 3/6, 0/6],
])
Compare to 1
- Once again, we can classify simply by seeing if there’s enough “weight” to make a neuron fire, or not.
- Only the neuron representing “five” will fire!
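A sketch of that multiplication using the matrix above. The `fiv` layout follows the normalized dice shown later; I compare with `np.isclose` rather than `== 1` to dodge floating-point rounding:

```python
import numpy as np

multi = np.array([
    [-1/1, -1/1, -1/1, -1/1,  1/1, -1/1, -1/1, -1/1, -1/1],
    [-1/2, -1/2,  1/2, -1/2, -1/2, -1/2,  1/2, -1/2, -1/2],
    [-1/3, -1/3,  1/3, -1/3,  1/3, -1/3,  1/3, -1/3, -1/3],
    [ 1/4, -1/4,  1/4, -1/4, -1/4, -1/4,  1/4, -1/4,  1/4],
    [ 1/5, -1/5,  1/5, -1/5,  1/5, -1/5,  1/5, -1/5,  1/5],
    [ 0/6,  3/6,  0/6,  3/6, -6/6,  3/6,  0/6,  3/6,  0/6],
])

# A "five": the center plus all four corners.
fiv = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1])

result = multi @ fiv
print(result.argmax())           # 4 - the "five" neuron scores highest
print(np.isclose(result[4], 1))  # True - only it reaches the threshold
```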
Step Back
Collect Our Thoughts
- Is this intelligent:
[[-1. -1. -1. -1. 1. -1.
-1. -1. -1. ]
[-0.5 -0.5 0.5 -0.5 -0.5 -0.5
0.5 -0.5 -0.5 ]
[-0.33333333 -0.33333333 0.33333333 -0.33333333 0.33333333 -0.33333333
0.33333333 -0.33333333 -0.33333333]
[ 0.25 -0.25 0.25 -0.25 -0.25 -0.25
0.25 -0.25 0.25 ]
[ 0.2 -0.2 0.2 -0.2 0.2 -0.2
0.2 -0.2 0.2 ]
[ 0. 0.5 0. 0.5 -1. 0.5
0. 0.5 0. ]]
Our Task
- We were trying to do a computer vision task.
- Specifically, we wished to classify dice.
Encoding
- To do so, we recognized we could view dice with as few as nine (9) “sensory neurons”.
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| 7 | 8 | 9 |
- Each of these is either `0` or `1`, or perhaps `True` or `False`, etc.
Vectors
- We then recognized we could place these in any order, and in a single “dimension”.
- We termed this a vector.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Graphs
- We then discussed that these vectors could be treated as part of a graph.
- The top, input layer representing the visual data.
- The bottom, output layer representing the classification
Edges
- Edges in this graph could connect sensed dots to numeric meaning.
- Middle square to odd numbers.
Weights
- Some edges may matter more than others.
- Center is very important for `1`.
- Kinda important for `3` and `5`, where corners matter too.
Matrices
- Then, we note that we can express these weights in a matrix.
multi = np.array([
[-1/1, -1/1, -1/1, -1/1, 1/1, -1/1, -1/1, -1/1, -1/1],
[-1/2, -1/2, 1/2, -1/2, -1/2, -1/2, 1/2, -1/2, -1/2],
[-1/3, -1/3, 1/3, -1/3, 1/3, -1/3, 1/3, -1/3, -1/3],
[ 1/4, -1/4, 1/4, -1/4, -1/4, -1/4, 1/4, -1/4, 1/4],
[ 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5],
[ 0/6, 3/6, 0/6, 3/6, -6/6, 3/6, 0/6, 3/6, 0/6],
])
Simpler
- Trying to make this easier to see and think about.
Learning
My claim
- I claim this is not intelligent, though it can do an intelligent task (vision).
multi = np.array([
[-1/1, -1/1, -1/1, -1/1, 1/1, -1/1, -1/1, -1/1, -1/1],
[-1/2, -1/2, 1/2, -1/2, -1/2, -1/2, 1/2, -1/2, -1/2],
[-1/3, -1/3, 1/3, -1/3, 1/3, -1/3, 1/3, -1/3, -1/3],
[ 1/4, -1/4, 1/4, -1/4, -1/4, -1/4, 1/4, -1/4, 1/4],
[ 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5],
[ 0/6, 3/6, 0/6, 3/6, -6/6, 3/6, 0/6, 3/6, 0/6],
])
- Rather, I am intelligent, since I made it.
My claim
- I do, however, claim, that this is pretty close to being intelligent:
- As a method, it could theoretically recognize anything.
- Art
- Love
- Beauty
- Truth
Generalize
- The trouble is, how do we get the values in the matrix?
- Somehow, to go from “just arithmetic” to intelligence, we need to learn.
- This is where machine learning comes into artificial intelligence.
Setup
Our Goals
- We will:
- Start with “nothing”
- An arbitrary matrix with random values.
- Construct some learning process
- We will represent this with Colab code, but the ideas are independent of how we write them.
- Show that we can recognize dice.
- And therefore (at least theoretically) anything
- Start with “nothing”
Example
- We started with this:
multi = np.array([
[-1/1, -1/1, -1/1, -1/1, 1/1, -1/1, -1/1, -1/1, -1/1],
[-1/2, -1/2, 1/2, -1/2, -1/2, -1/2, 1/2, -1/2, -1/2],
[-1/3, -1/3, 1/3, -1/3, 1/3, -1/3, 1/3, -1/3, -1/3],
[ 1/4, -1/4, 1/4, -1/4, -1/4, -1/4, 1/4, -1/4, 1/4],
[ 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5, -1/5, 1/5],
[ 0/6, 3/6, 0/6, 3/6, -6/6, 3/6, 0/6, 3/6, 0/6],
])
Size
- As discussed, to look at:
- 9 locations, and produce
- 6 possible characterizations
- We used a \(6 \times 9\) (or \(9 \times 6\)) matrix.
Randomize
- Rather than using our own intelligence to create the matrix…
- …we can randomize everything.
- Imagine, say, when we first try to understand something.
- About the same as random guessing.
- We use `np.random.rand` and tell it what shape we want.
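A sketch of the call (the shape arguments are my reading of what the lab needs: six classes by nine dot positions):

```python
import numpy as np

# A random 6-by-9 matrix: one row of nine weights per class.
learn = np.random.rand(6, 9)

print(learn.shape)       # (6, 9)
print(learn.min() >= 0)  # True - values start at 0...
print(learn.max() < 1)   # True - ...and stay below 1
```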
View it
- It looks like this:
- All between `0` and `1` by default.
[[0.13819755 0.858514 0.89387389 0.97953215 0.95078702 0.91339183
0.55505456 0.05400497 0.76328784]
[0.61992175 0.60798114 0.47795787 0.99881325 0.21490773 0.1434384
0.24203858 0.96896522 0.44601858]
[0.32932856 0.81046388 0.98930022 0.72729313 0.19500454 0.34640125
0.48504912 0.72482006 0.13982303]
[0.86507991 0.15626278 0.73353581 0.0778394 0.23396891 0.53146514
0.62610878 0.30177633 0.65322222]
[0.30151646 0.17771519 0.31744043 0.70558929 0.92332405 0.86750924
0.06348737 0.10571363 0.8207625 ]
[0.18535997 0.81312843 0.86835174 0.80318644 0.5531322 0.4545896
0.96877004 0.93137333 0.08946593]]
Check it out!
- Mine:
What happened?
- Well, we multiplied each of the nine possible places for a dot in a die by something.
- Actually, we did this six times - one for each possible classification.
- E.g., do we classify a die as a “one” or a “three”?
- For each of these six classes, we summed up the nine products
- The presence of dot multiplied by importance of dot, for all dots.
Examine the results
- For each prediction, we can determine which row (or column) of the matrix made that prediction.
- If that prediction is correct, we can “strengthen” the connections.
- Perhaps increase them by 10% or by `.1` or something.
- If the prediction is incorrect, we can “penalize” the connections.
- Perhaps decrease them by 10% or by `.1` or something.
“Supervised” Learning
- In this case, we engage in supervised learning.
- We know the answer.
- We can compare against the answer.
- This is the answer:
The key
- I make an answer key quickly.
array([[ True, False, False, False, False, False],
[False, True, False, False, False, False],
[False, False, True, False, False, False],
[False, False, False, True, False, False],
[False, False, False, False, True, False],
[False, False, False, False, False, True]])
- We “transpose” the key to make the dimensions line up.
- `@` is the special NumPy “operator” for matrix multiplication.
- Multiply rows times columns, then sum them up.
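One quick way to build such a key (not necessarily how the lab did it) is a boolean identity matrix:

```python
import numpy as np

# Die k should be classified as class k and nothing else:
# True on the diagonal, False everywhere off it.
key = np.eye(6, dtype=bool)
print(key[0])  # [ True False False False False False]
```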
Compare
- We can compare the answer key to the current results from random guessing.
- We use `np.equal` to compare all of the matrix positions to see if they are equal.
array([[False, True, True, True, True, True],
[False, False, False, False, True, False],
[False, True, True, False, False, False],
[False, False, False, True, False, False],
[False, False, False, False, True, False],
[False, False, False, False, False, True]])
- We want all of these to be true - the learned matrix should be just as good as our own intelligence!
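The whole comparison can be sketched end to end. The dice layouts below are reconstructed from the normalized dice printed later in this lecture; since `learn` is random, only the shape of the result is predictable:

```python
import numpy as np

# Six faces, one per row, as flattened 3x3 grids.
dice = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 0],  # one
    [0, 0, 1, 0, 0, 0, 1, 0, 0],  # two
    [0, 0, 1, 0, 1, 0, 1, 0, 0],  # three
    [1, 0, 1, 0, 0, 0, 1, 0, 1],  # four
    [1, 0, 1, 0, 1, 0, 1, 0, 1],  # five
    [1, 0, 1, 1, 0, 1, 1, 0, 1],  # six
])

learn = np.random.rand(6, 9)  # random guessing
key = np.eye(6, dtype=bool)   # the answer key

# Transpose so the dimensions line up, `@` to multiply-and-sum,
# then np.equal to check every position against the key.
guesses = dice @ learn.T >= 1
same = np.equal(guesses, key)
print(same.shape)  # (6, 6)
```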
Insight
- We note that it is often the case that there are many correct guesses for one.
- After all, it is unlikely that the randomization produces values high enough to sum to one given a single dot.
- We note that it is often the case that there are many incorrect guesses for six.
- After all, it is unlikely that the randomization produces values high enough to sum to six given a single dot.
Problem
- Just for this class and to make things easier, we modify the dice so they all sum to one.
- So, `six`, which previously had six dots multiplied by one, now has six dots multiplied by \(\frac{1}{6}\) each.
Solution
- I just want to divide each die by how many dots it has.
- I frustratingly have to transpose there and back to get the multiplication to work…
array([[0. , 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ],
[0. , 0. , 0.5 , 0. , 0. ,
0. , 0.5 , 0. , 0. ],
[0. , 0. , 0.33333333, 0. , 0.33333333,
0. , 0.33333333, 0. , 0. ],
[0.25 , 0. , 0.25 , 0. , 0. ,
0. , 0.25 , 0. , 0.25 ],
[0.2 , 0. , 0.2 , 0. , 0.2 ,
0. , 0.2 , 0. , 0.2 ],
[0.16666667, 0. , 0.16666667, 0.16666667, 0. ,
0.16666667, 0.16666667, 0. , 0.16666667]])
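The divide-and-transpose-back dance reads roughly like this (layouts reconstructed from the array just printed):

```python
import numpy as np

dice = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 0],  # one
    [0, 0, 1, 0, 0, 0, 1, 0, 0],  # two
    [0, 0, 1, 0, 1, 0, 1, 0, 0],  # three
    [1, 0, 1, 0, 0, 0, 1, 0, 1],  # four
    [1, 0, 1, 0, 1, 0, 1, 0, 1],  # five
    [1, 0, 1, 1, 0, 1, 1, 0, 1],  # six
])

counts = dice.sum(axis=1)         # dots per die: 1, 2, ..., 6
normalized = (dice.T / counts).T  # transpose there... and back

print(normalized.sum(axis=1))     # every die now sums to 1 (up to rounding)
```

Broadcasting offers a way around the double transpose: `dice / counts[:, None]` divides each row by its own count directly.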
Check it now
- These should make the random guessing more, well, random…
Learning
We’re ready
- Now we are ready to go from nothing (random guessing) to something (intelligence)
- We will go over every classification.
- If it is the same as the answer key, we will “reward” the row - increasing its weights.
- If it differs, we will decrease the weights.
- I will do this by 10%.
Writing it out
- No real way around using code for this as far as I know.
- First, we’ll make a copy of our original random thing to compare against.
- Then we’ll loop - using `for` - over all the classifications and update accordingly.
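The copy step matters: plain assignment would only alias the array. A minimal sketch:

```python
import numpy as np

learn = np.random.rand(6, 9)
backup = learn.copy()  # a true copy; `backup = learn` would only alias

learn[0] = learn[0] * 1.1  # experiment on `learn`...
learn = backup.copy()      # ...then restart cleanly from the backup

print(np.array_equal(learn, backup))  # True
```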
Steps
- Check the answer.
- Print them for now, to see them.
Steps
- Check the answer.
- Loop over rows
- Print them for now, to see them.
- The first row shows how the “one” die is classified.
- The first entry in each row shows whether a die is classified as a “one”
Steps
- Check the answer.
- Loop over rows
- Loop over correctnesses.
Steps
- Check the answer.
- Loop over rows
- Loop over correctnesses.
- If correct…
- Increase… something?
What goes here?
We know
- If a guess in a certain row is correct:
- It was predicted by something.
- That something was the random guesses.
- But not all of the random guesses…
- Just values that are multiplied to get that row.
- So we want to increase (or decrease) the row at the same address.
- How?
Enumerate
- We can use the special `enumerate` function to get back a row and its location within the matrix.
Location is 0 and row is [False True True True True True]
Location is 1 and row is [ True False True True True True]
Location is 2 and row is [ True True False True True True]
Location is 3 and row is [ True True True False True True]
Location is 4 and row is [ True True True True False True]
Location is 5 and row is [ True True True True True False]
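A sketch of the loop behind that output (my `same` here is a stand-in with the same False-on-the-diagonal pattern, not the lab’s actual comparison):

```python
import numpy as np

# Stand-in correctness matrix: False on the diagonal, True elsewhere.
same = ~np.eye(6, dtype=bool)

# enumerate hands back each row along with its location.
for location, row in enumerate(same):
    print("Location is", location, "and row is", row)
```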
Enumerate
- We can use the special `[]` notation to look up the part of the learning matrix that corresponds to the same row.
Location is 0 and row is [False True True True True True] and weights are [0.13819755 0.858514 0.89387389 0.97953215 0.95078702 0.91339183
0.55505456 0.05400497 0.76328784]
Location is 1 and row is [ True False True True True True] and weights are [0.61992175 0.60798114 0.47795787 0.99881325 0.21490773 0.1434384
0.24203858 0.96896522 0.44601858]
Location is 2 and row is [ True True False True True True] and weights are [0.32932856 0.81046388 0.98930022 0.72729313 0.19500454 0.34640125
0.48504912 0.72482006 0.13982303]
Location is 3 and row is [ True True True False True True] and weights are [0.86507991 0.15626278 0.73353581 0.0778394 0.23396891 0.53146514
0.62610878 0.30177633 0.65322222]
Location is 4 and row is [ True True True True False True] and weights are [0.30151646 0.17771519 0.31744043 0.70558929 0.92332405 0.86750924
0.06348737 0.10571363 0.8207625 ]
Location is 5 and row is [ True True True True True False] and weights are [0.18535997 0.81312843 0.86835174 0.80318644 0.5531322 0.4545896
0.96877004 0.93137333 0.08946593]
Steps
Check it out
- Old (`backup`)
- New (`learn`)
Not much better
- We should do more.
- But first, we begin again from our backup.
- We don’t want to mix different experiments together.
Steps
- Check the answer.
- Loop over rows
- Loop over correctnesses.
- If correct…
- Increase that row.
- Otherwise…
- Decrease that row.
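A sketch of these steps, assuming the matrices built earlier (the exact lab code may differ); each whole row is rewarded or penalized by 10%:

```python
import numpy as np

# Normalized dice (each face sums to one), reconstructed from the lecture.
dice = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0, 1],
], dtype=float)
dice = (dice.T / dice.sum(axis=1)).T

key = np.eye(6, dtype=bool)
learn = np.random.rand(6, 9)

# Check the answer, then adjust whole rows by 10% either way.
same = np.equal(dice @ learn.T >= 1, key)
for location, row in enumerate(same):
    for correct in row:
        if correct:
            learn[location] = learn[location] * 1.1
        else:
            learn[location] = learn[location] * 0.9

print(learn.shape)  # still (6, 9)
```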
Better?
array([[1, 0, 0, 0, 1, 0],
[1, 0, 1, 0, 0, 1],
[1, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0]])
- Recall we want a diagonal of `1` here.
- (I use `+ 0` to turn booleans into numbers so the array is smaller and easier to read.)
- (`True + 0` is `1` and `False + 0` is `0` - don’t worry about it.)
- Doesn’t look intelligent to me.
We recognize
- We don’t need to increase or decrease entire rows
- Rather, we know which of the nine dots has a nonzero value…
- And therefore which of the nine weights is contributing to a classification.
- So, rather than increase a row by 10%, increase or decrease the relevant weights only.
Steps
- Check the answer.
- Loop over rows
- Loop over correctnesses.
- If correct…
- Increase relevant.
- Otherwise…
- Decrease relevant.
array([[1, 0, 1, 0, 1, 0],
[0, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1]])
- Recall we want a diagonal of `1` here.
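The refined update touches only the weights where the die actually has a dot. A sketch, with the same reconstructed dice as before (the exact lab code may differ):

```python
import numpy as np

dice = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0, 1],
], dtype=float)
dice = (dice.T / dice.sum(axis=1)).T

key = np.eye(6, dtype=bool)
learn = np.random.rand(6, 9)
backup = learn.copy()

same = np.equal(dice @ learn.T >= 1, key)
for location, row in enumerate(same):
    relevant = dice[location] > 0  # which of the nine dots are lit
    for correct in row:
        if correct:
            learn[location][relevant] *= 1.1  # increase relevant weights
        else:
            learn[location][relevant] *= 0.9  # decrease relevant weights

print(learn.shape)  # (6, 9)
```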
One Problem
- We may guess correctly too often.
- Given two equally likely choices, 50% chance.
- But we only want one classification out of six - a ~17% chance.
- We can bias against guessing by… increasing the bias!
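The idea in miniature, with made-up neuron scores: raising the threshold (the bias) makes firing rarer.

```python
import numpy as np

# Hypothetical scores from three output neurons.
scores = np.array([0.6, 0.9, 1.2])

print((scores >= 0.5).sum())  # 3 - a low threshold lets every neuron fire
print((scores >= 1.0).sum())  # 1 - a higher bias is much choosier
```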
Looking good?
- This looked awfully good to me.
array([[1, 0, 1, 0, 1, 0],
[0, 1, 1, 1, 1, 1],
[0, 0, 1, 0, 1, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]])
- If this were an exam, the percentage would be a…
Compare
- Initially…
- Wait… isn’t that worse?
Precision and Recall
- Precision and recall are then defined as:
\[ \begin{align} \text{Precision} &= \frac{tp}{tp + fp} \\ \text{Recall} &= \frac{tp}{tp + fn} \end{align} \]
- Precision is the fraction of predicted positives that are true positives.
- Recall is the fraction of actual positives that are true positives.
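Applying those formulas to the “Looking good?” guesses from a few slides back (the whole diagonal is predicted, with nine extra positives and nothing missed):

```python
import numpy as np

# The guesses from the "Looking good?" slide (1 = predicted) and the key.
pred = np.array([
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=bool)
truth = np.eye(6, dtype=bool)

tp = np.sum(pred & truth)   # 6 - the whole diagonal
fp = np.sum(pred & ~truth)  # 9 - extra predictions off the diagonal
fn = np.sum(~pred & truth)  # 0 - nothing on the diagonal was missed

precision = tp / (tp + fp)  # 6/15 = 0.4
recall = tp / (tp + fn)     # 6/6 = 1.0
print(precision, recall)
```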
True Positive Detection
- Initially…
- Count along the diagonal.
- We go from zero to all true positives.
How?
- Initially, randomization simply predicted no classes at all for anything.
array([[False, False, False, False, False, False],
[False, False, False, False, False, False],
[False, False, False, False, False, False],
[False, False, False, False, False, False],
[False, False, False, False, False, False],
[False, False, False, False, False, False]])
- Now, at least, we capture the diagonal and aren’t too far off from here.
More to Come
- I think today was a lot.
- We will play around with this a bit in the lab then return next week.
- To reduce false positives
FIN
- Bias is not enough!
