The XOR Problem

AI 101

Today

  • Thus far we have:
    • Created a matrix that can do a single intelligent task.
      • NOT general intelligence
    • Shown how to use random processes to create that matrix.
      • Not perfected this, but shown the ability to improve on random.

Motivation

Re-evaluating Our Model

  • We have been using a single-layer approach.
    • What does this mean?
    • We have a single “layer” of thinking neurons between the “sensory neurons” and result.

[Figure: bipartite graph linking each of the nine die-cell positions (0,0) through (2,2) to each of the six possible die values.]

This… works

  • We have shown that there are possible matrix solutions that classify correctly in all cases using this single layer.
    • Given one (1) assumption.
  • Recall this solution:
import numpy as np
multi = np.array([
    [-1/1, -1/1, -1/1, -1/1,  1/1, -1/1, -1/1, -1/1, -1/1],
    [-1/2, -1/2,  1/2, -1/2, -1/2, -1/2,  1/2, -1/2, -1/2],
    [-1/3, -1/3,  1/3, -1/3,  1/3, -1/3,  1/3, -1/3, -1/3],
    [ 1/4, -1/4,  1/4, -1/4, -1/4, -1/4,  1/4, -1/4,  1/4],
    [ 1/5, -1/5,  1/5, -1/5,  1/5, -1/5,  1/5, -1/5,  1/5],
    [ 0/6,  3/6,  0/6,  3/6, -6/6,  3/6,  0/6,  3/6,  0/6],
])

Big Assumption

  • We made a big, I’d argue unreasonable, assumption.
  • Three must always look like this:
    • [0,0,1,0,1,0,1,0,0]
  • Never like this:
    • [1,0,0,0,1,0,0,0,1]

Encoding the Number 3

0 0 1
0 1 0
1 0 0

Why?

  • Is it not equally correct to have that die rotated 90°?
  • Is that not still a die showing a three?

Re-encoding the Number 3

1 0 0
0 1 0
0 0 1

Try it!

  • We can take a look at how well we classify this value.
  • First, we encode both the “top left” and “top right” versions of three.
rite = [0,0,1,0,1,0,1,0,0]
left = [1,0,0,0,1,0,0,0,1]
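
As a quick sanity check (a sketch, not from the original slides), reshaping each flat 9-vector with NumPy recovers the two 3×3 grids shown above:

```python
import numpy as np

rite = [0, 0, 1, 0, 1, 0, 1, 0, 0]
left = [1, 0, 0, 0, 1, 0, 0, 0, 1]

# Reshape each flat 9-vector back into the 3x3 die face it encodes.
print(np.array(rite).reshape(3, 3))
print(np.array(left).reshape(3, 3))
```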

Aside: Matrix Review

Classify it!

  • As with anything else, we can take our trusty “multi-classifier”.
multi.shape
(6, 9)
  • As a \(6 \times 9\) matrix, we can use it to take something of size “9” - like the 9 possible dots on a die - and get something of size 6 - like the six possible values of a die.

Matrix Multiply

  • To perform a multiplication, we take our die and multiply it by each internal sub-vector of size 9 of the matrix.
    • Make sure you understand that sentence.

I could…

  • It is possible to just take each row, multiply it by the die, and see the result.
for row in multi:
    print(rite * row)
[-0. -0. -1. -0.  1. -0. -1. -0. -0.]
[-0.  -0.   0.5 -0.  -0.5 -0.   0.5 -0.  -0. ]
[-0.         -0.          0.33333333 -0.          0.33333333 -0.
  0.33333333 -0.         -0.        ]
[ 0.   -0.    0.25 -0.   -0.25 -0.    0.25 -0.    0.  ]
[ 0.  -0.   0.2 -0.   0.2 -0.   0.2 -0.   0. ]
[ 0.  0.  0.  0. -1.  0.  0.  0.  0.]

I could…

  • We could then sum up each row…
for row in multi:
    print(sum(rite * row))
-1.0
0.5
1.0
0.25
0.6000000000000001
-1.0

I could…

  • We could then compare the sum to 1
for row in multi:
    print(1 <= sum(rite * row))
False
False
True
False
False
False

Transpose

  • We do not have to use a for loop
    • This is a simple case of matrix multiplication
  • Matrix multiplication
    • Multiplies each row of the first matrix elementwise by each column of the next.
    • Sums the products.
    • Places each sum in the corresponding position of an output vector.
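
To see that the loop-and-sum recipe and matrix multiplication really agree, here is a sketch with a small made-up weight matrix `W` and input `x` (both are stand-ins, not the die classifier):

```python
import numpy as np

# Small stand-in weights: 2 "neurons", each looking at 3 inputs.
W = np.array([[1.0, -1.0,  0.5],
              [0.0,  2.0, -0.5]])
x = np.array([1.0, 0.0, 2.0])

# The for-loop version: multiply each row elementwise by x, then sum.
loop = np.array([sum(row * x) for row in W])

# The matrix-multiplication version: one @ does all rows at once.
matmul = W @ x

print(loop, matmul)  # both are [ 2. -1.]
```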

Worked Example

  • Here is an example…
  • I hastily wrote multi as a \(6 \times 9\)
multi
array([[-1.        , -1.        , -1.        , -1.        ,  1.        ,
        -1.        , -1.        , -1.        , -1.        ],
       [-0.5       , -0.5       ,  0.5       , -0.5       , -0.5       ,
        -0.5       ,  0.5       , -0.5       , -0.5       ],
       [-0.33333333, -0.33333333,  0.33333333, -0.33333333,  0.33333333,
        -0.33333333,  0.33333333, -0.33333333, -0.33333333],
       [ 0.25      , -0.25      ,  0.25      , -0.25      , -0.25      ,
        -0.25      ,  0.25      , -0.25      ,  0.25      ],
       [ 0.2       , -0.2       ,  0.2       , -0.2       ,  0.2       ,
        -0.2       ,  0.2       , -0.2       ,  0.2       ],
       [ 0.        ,  0.5       ,  0.        ,  0.5       , -1.        ,
         0.5       ,  0.        ,  0.5       ,  0.        ]])
  • So the rows are of length 9.

Transpose

  • …so I need to rotate (transpose) it.
  • So the columns are of length 9.
multi.transpose()
array([[-1.        , -0.5       , -0.33333333,  0.25      ,  0.2       ,
         0.        ],
       [-1.        , -0.5       , -0.33333333, -0.25      , -0.2       ,
         0.5       ],
       [-1.        ,  0.5       ,  0.33333333,  0.25      ,  0.2       ,
         0.        ],
       [-1.        , -0.5       , -0.33333333, -0.25      , -0.2       ,
         0.5       ],
       [ 1.        , -0.5       ,  0.33333333, -0.25      ,  0.2       ,
        -1.        ],
       [-1.        , -0.5       , -0.33333333, -0.25      , -0.2       ,
         0.5       ],
       [-1.        ,  0.5       ,  0.33333333,  0.25      ,  0.2       ,
         0.        ],
       [-1.        , -0.5       , -0.33333333, -0.25      , -0.2       ,
         0.5       ],
       [-1.        , -0.5       , -0.33333333,  0.25      ,  0.2       ,
         0.        ]])
  • Transpose has a () at the end because it is an action (a verb)

Multiply

  • Then, I can use @ to “matrix multiply” the die by the classifier!
rite @ multi.transpose()
array([-1.  ,  0.5 ,  1.  ,  0.25,  0.6 , -1.  ])

The Bias

  • To determine if this is enough for a neuron to fire, I still need to include the bias.
    • This isn’t part of matrix multiplication!
  • So we do so, just as with the other method using sum and for
1 <= rite @ multi.transpose()
array([False, False,  True, False, False, False])

The problem

  • This classifier only works for the “top right” version of three!
1 <= left @ multi.transpose()
array([False, False, False, False, False, False])
  • Even though that is definitely a three!
sum(left)
3

Interactions

Our method…

  • … works well for detecting a single dot.
  • But what happens when dots interact?
  • We encounter a limit of our current logic.

Two vs. Four

  • Let’s look at the two and four dice.
  • Both have dots in either the top-left or top-right.
  • The four simply has dots in both the top-left and the top-right.

Three vs. Five

  • Both have a center dot and a diagonal.
  • Both have dots in either the top-left or top-right.
  • The five simply has dots in both the top-left and the top-right.

The Interaction Problem

  • Our matrix multiplication is a linear combination.
  • It sums up evidence: \(y = \sum w_i x_i\).
  • To tell a 2 from a 4, we need more than a sum.
    • 4 will always sum more than 2, unless we restrict 2 to a single orientation
  • We need to know if dots exist exclusively.
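
To make the “sums more” point concrete, here is a sketch using one assumed orientation of the two and the four; with equal non-negative weight on every cell, the four always accumulates more evidence:

```python
import numpy as np

# One assumed orientation of each face, as flat 9-vectors.
two  = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0])  # two dots on a diagonal
four = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1])  # dots in all four corners

# With equal positive weight on every cell, the four always wins.
w = np.ones(9)
print(two @ w, four @ w)  # 2.0 4.0
```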

A Minimal Example

  • Let’s focus on just two positions on the die.
  • Position A: Top-Left dot (\(x_1\)).
  • Position B: Top-Right dot (\(x_2\)).
  • Can we “calculate” a specific relationship?

The Four Scenarios

  • We take \((x_1, x_2)\) to be…
    • Scenario 1: No dots are present (0, 0).
    • Scenario 2: Only Top-Left is present (1, 0).
    • Scenario 3: Only Top-Right is present (0, 1).
    • Scenario 4: Both dots are present (1, 1).

Defining XOR

  • XOR stands for Exclusive OR.
  • Pronounced as “ex-or” or “ZOR”.
  • It means: “Either A or B, but not both.”
  • It is a fundamental technique in logic and computing.
    • Not so much the English language!
    • English tends to use “or” for “xor”, and has no everyday word for inclusive or (which holds whenever “and” or “xor” holds).
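
Python happens to have XOR built in as the `^` operator on integers and booleans, so we can tabulate the definition directly:

```python
# Tabulate XOR ("either, but not both") using Python's ^ operator.
table = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
for a, b, out in table:
    print(a, b, "->", out)
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```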

XOR vs. OR

  • In a standard (logical) OR, (1, 1) results in True.
    • Not “coke or pepsi” (restaurant only has one)
    • Perhaps “delicious or filling” (wouldn’t reject a dish that is both)
  • In an XOR, (1, 1) results in False.
    • Perhaps, when ordering sides, “fries or tots”
      • Both would cost extra, so ordering both is banned.
  • This “reversal” is what breaks simple models.

XOR in Real Life

  • Dairy Choice: Choose milk or soy, but not both.
  • Enrollment: You can “Enroll” or “Drop,” not both.
  • Car “Status”: Engine is either running or “off”
  • Logic depends on specific, exclusive combinations.

Coding the Problem

  • Let’s try to build an XOR dataset in NumPy.
  • We want to see if our perceptron can solve it.
    • We want to regard this as only trying to tell “small” (2 or 3) from “big” (4 or 5) numbers by looking at the top outer two dots.
  • The “X” in XOR stands for “exclusive”
  • I use “i” for “inclusive or” (as in y_ior) rather than just saying vanilla “or”
x = np.array([[0,0], [0,1], [1,0], [1,1]])
y_xor = np.array([0, 1, 1, 0])
y_ior = np.array([0, 1, 1, 1])
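
As a sanity check (a sketch, with hypothetical names `xor_labels` and `ior_labels`), those two label vectors are just column-wise logical operations on `x`:

```python
import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Derive the labels from the two columns of x.
xor_labels = np.logical_xor(x[:, 0], x[:, 1]).astype(int)
ior_labels = np.logical_or(x[:, 0], x[:, 1]).astype(int)
print(xor_labels)  # [0 1 1 0]
print(ior_labels)  # [0 1 1 1]
```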

Linearity

The Geometry of Logic

  • Imagine a 2D plot of these points.
  • X-axis: Top-Left dot (\(x_1\)).
  • Y-axis: Top-Right dot (\(x_2\)).
  • Let’s visualize the “IOR” logic first.

Meaning

  • We want to place a line somewhere through this 2D space.
    • On one side of the line, the neuron “activates”.
    • On the other, it does not.
  • IOR is relatively simple, and so is detecting a 4/5: there we only want to activate if we see both dots.

Place dots

Draw some lines

  • Any upward line either:
    • Doesn’t capture all 2s and 3s, or
    • Captures all 2s and 3s but also captures the 4s and 5s
    • It can be wrong in both ways at once!
  • Some examples

Missing some 2s/3s

Getting the 4s/5s

Both Bad Things

Takeaway

  • There is no possible way for a single-layer perceptron to differentiate 2s/3s from 4s/5s.
  • On these graphs, the “intercept” represents the bias.
  • On these graphs, the “slope” represents (the sum of) the weights.
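
We can back this claim up empirically with a brute-force sketch (not from the slides): search a coarse grid of weights and biases, fire when the weighted sum reaches the bias, and see which labelings any single layer can match:

```python
import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])
y_ior = np.array([0, 1, 1, 1])

def solvable(y):
    """Return True if some (w1, w2, bias) on a coarse grid matches y."""
    grid = np.linspace(-2, 2, 17)
    for w1 in grid:
        for w2 in grid:
            for bias in grid:
                pred = (x @ np.array([w1, w2]) >= bias).astype(int)
                if (pred == y).all():
                    return True
    return False

print(solvable(y_ior))  # True: a line can separate inclusive-or
print(solvable(y_xor))  # False: no line separates exclusive-or
```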

Impact on Dice

  • To distinguish a 2 from a 4 based on these dots:
  • We need to know if the extra dots are absent.
  • Our current model only knows how to add weight.
  • It doesn’t know how to handle “exclusive” patterns.

Linear Combinations

  • Our current math looks like this:
  • \(Output = Weights \cdot Inputs + Bias\)
  • This is a linear transformation.
  • It can only create “flat” decision boundaries.
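
This is also why just stacking two matrices does not help by itself: composing two linear transformations gives another linear transformation. A quick check with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
x = rng.normal(size=2)

# Two linear layers with nothing between them collapse into one.
print(np.allclose((x @ A) @ B, x @ (A @ B)))  # True
```

This is why the layered solution that follows inserts a non-linear step (the activation check) between the two matrix multiplications.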

Layers

The Solution?

  • To solve XOR, we need a more powerful technique
  • We will:
    • Create something not unlike a perceptron which “projects” these two dots into a different “space”
    • Apply that transformation to the incoming (visual) data.
    • Apply a perceptron to the transformed data.
  • These are two perceptron “layers”

Visualizing Layers

  • Think of the first layer as “feature detectors.”
  • One neuron detects “At least one dot.”
  • Another neuron detects “Both dots.”
  • The final layer combines these detections.

Designing the Logic

  • XOR can be thought of as:
  • (A IOR B) AND NOT (A AND B).
  • This requires a sequence of operations.
  • Sequence = Depth in a neural network.
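
We can verify that decomposition on all four input pairs with plain Python booleans:

```python
# XOR as a sequence of operations: (A or B) and not (A and B).
cases = [(a, b) for a in (False, True) for b in (False, True)]
for a, b in cases:
    composed = (a or b) and not (a and b)
    print(a, b, composed == (a ^ b))  # prints True for every case
```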

Minimal Example

  • We consider only “corners in the top row”.
  • These ones in red, basically:
  • We will make a “perceptron” that takes only two inputs.

Data

  • We recall:
print(x)
[[0 0]
 [0 1]
 [1 0]
 [1 1]]
print(y_ior)
[0 1 1 1]
print(y_xor)
[0 1 1 0]
  • We want to take a vector of size 2 (top two dots) and produce a vector of size 2 (IOR or XOR).
    • Only XOR is interesting.

Our First Steps

  • First, let’s naively just sum up the number of dots.
  • We can do this simply with a single layer that sets all weights to 1.

[Figure: two inputs, “Top Left” and “Top Rite”, both feeding a single neuron.]

Our First Steps

  • We can have neurons fire for sums of at least 1 or at least 2.
  • Green for “positive” weights (greater than zero)

[Figure: “Top Left” and “Top Rite” each connected to two neurons, “1” and “2”.]

Setting Weights

  • Apply high weight to edges into 1
  • Apply half weight to edges into 2

[Figure: the same network, with high weights on the edges into “1” and half weights on the edges into “2”.]

Adding a Layer

  • We keep this stage, but add a layer below.
  • We only connect top-to-middle and middle-to-bottom

[Figure: a new layer added: neurons “1” and “2” each connect to both “IOR” and “XOR”.]

Setting weights again

  • We add a negative (red) weight from “2” to “XOR”
    • We don’t want to “activate” if both are set

[Figure: the same three-layer network, now with a negative (red) weight on the edge from “2” to “XOR”.]

As a matrix

  • We can easily make a matrix!
  • … or two.
top_layer = np.array([
    [ 1.0,  1.0],
    [ 0.5,  0.5],
])
bot_layer = np.array([
    [ 1.0,  1.0],
    [ 1.0, -1.0],
])

Try it

  • We can apply the first layer.
np.array([0,1]) @ top_layer
array([0.5, 0.5])

Transpose

  • Whoops! We have to transpose to use @
np.array([0,1]) @ top_layer.transpose()
array([1. , 0.5])

Activate

  • We can compare to 1 (or some other bias) to determine activation.
1 <= np.array([0,1]) @ top_layer.transpose()
array([ True, False])

Next Layer

  • We can then multiply this intermediate result by the next layer.
(1 <= np.array([0,1]) @ top_layer.transpose()) @ bot_layer.transpose()
array([1., 1.])

Activate Again

  • And we can compare that to activation.
1 <= (1 <= np.array([0,1]) @ top_layer.transpose()) @ bot_layer.transpose()
array([ True,  True])
  • Is this what we would expect?
    • Yes!
    • There is exactly one dot.
    • There is at least one dot.

Test ’em all

  • Check this out:
for pair in x:
    print(pair, 1 <= (1 <= np.array(pair) @ top_layer.transpose()) @ bot_layer.transpose())
[0 0] [False False]
[0 1] [ True  True]
[1 0] [ True  True]
[1 1] [ True False]
  • We can tell 2s/3s (middle) from 4s/5s (bottom)!

Summary

What we learned

  • You can’t do everything with a single matrix.
  • It seems an awful lot like you can do anything by stacking them.
  • Stacking isn’t too bad:
    • Multiply, then
    • Check activations.
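
As one final sketch, the whole recipe fits in a hypothetical `layer` helper (multiply, then check activations), re-using the two weight matrices from these slides:

```python
import numpy as np

top_layer = np.array([[1.0, 1.0], [0.5, 0.5]])
bot_layer = np.array([[1.0, 1.0], [1.0, -1.0]])

def layer(v, weights, bias=1):
    # Multiply, then check activations against the bias.
    return (bias <= v @ weights.transpose()).astype(int)

# Columns of output: [IOR, XOR] for each input pair.
for pair in np.array([[0, 0], [0, 1], [1, 0], [1, 1]]):
    print(pair, layer(layer(pair, top_layer), bot_layer))
# [0 0] [0 0]
# [0 1] [1 1]
# [1 0] [1 1]
# [1 1] [1 0]
```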

Fin