Markov

AI 101

Today

  • Thus far we have:
    • Used multi-layer networks to do tasks of (potentially) arbitrary complexity.
    • Only looked at vision
    • Now, we look at text generation.
  • Text generators like the one we’ll build used to be called “Markov chains”, so I named this “Markov”
    • I don’t know who “Markov” is.

Recap

My Lab Solution

import numpy as np
# Each row is one die face as a 3x3 grid of pips, flattened into 9
# entries (1 = pip present, 0 = absent). Faces 2 and 3 appear twice
# because they can sit on either diagonal.
dice = np.array([
    [0,0,0,0,1,0,0,0,0], # 1
    [0,0,1,0,0,0,1,0,0], # 2
    [1,0,0,0,0,0,0,0,1], # 2 (other diagonal)
    [0,0,1,0,1,0,1,0,0], # 3
    [1,0,0,0,1,0,0,0,1], # 3 (other diagonal)
    [1,0,1,0,0,0,1,0,1], # 4
    [1,0,1,0,1,0,1,0,1], # 5
    [1,0,1,1,0,1,1,0,1]  # 6
])
# By symmetry, the first 5 grid cells are enough to tell the faces apart.
compress = dice[:,:5]

top = np.array([
       [-1.0,  3.0, -1.0,  3.0,  1.0], # 1 or 6
       [ 1.0, -2.0,  1.0, -2.0,  0.0], # 2, 3, 4, or 5
       [ 0.5, -2.0,  0.5, -2.0,  0.0], # 4 or 5
       [ 0.0,  0.0,  0.0,  0.0,  1.0], # Odd
])
top = top.transpose()

bot = np.array([
       [ 0.5, -1.0, -1.0,  0.5], # 1
       [-1.0,  1.0, -1.0, -1.0], # 2
       [-1.0,  0.5, -1.0,  0.5], # 3
       [-1.0,  0.5,  0.5, -1.0], # 4
       [-1.0,  0.4,  0.4,  0.4], # 5
       [ 1.0, -1.0, -1.0, -1.0], # 6
])
bot = bot.transpose()

for die in compress:
    print(1 <= (1 <= die @ top) @ bot)
[ True False False False False False]
[False  True False False False False]
[False  True False False False False]
[False False  True False False False]
[False False  True False False False]
[False False False  True False False]
[False False False False  True False]
[False False False False False  True]

Motivation

Vision

  • I personally have a soft spot for computer vision.
    • I think its quality is quite high.
      • I think computer vision frameworks are better at recognizing images than humans.
      • I believe image generation (when done properly) is completely indistinguishable from photography, even to experts.

Text

  • I could not, realistically, have a lower opinion of large language models like ChatGPT or Gemini
    • I basically don’t think they can do anything “worth doing”
      • They can, for example, write emails that aren’t worth sending.
    • For some reason, they have “broken out” of computer science into broader cultural relevance.
      • In a way that vision frameworks have not.

Nevertheless!

  • The raison d’être for this course is that “the powers that be” now think AI is a big deal because of LLMs.
  • So, let’s generate some text.

Differences

  • Like with seeing dice, we need to somehow package up words into neurons.
  • Like with predicting numbers, we need to somehow unpack neurons back into outputs… which, this time, are also words.
    • LLMs go from words-to-words, vision goes from images-to-words.

Setting the stage

  • Work on ImageNet, the big vision dataset, began in 2006.
  • The transformer, the architecture behind LLMs, was introduced in 2017.
  • ChatGPT launched in late 2022 and became “prominent” in 2023.

Looking at Google

  • We’ll use Google as a reference point - they competed on ImageNet, developed the transformer, and have a pretty major LLM (Gemini).
  • How big is Google?
    • In 2006, 122 billion USD market cap
    • In 2017, 729 billion USD market cap
    • In 2026, 3680 billion USD market cap
  • Why does this matter?

Scaling

  • There are a lot of words.
    • And there are more words in a sentence than there are distinct words in general.
    • For example, that previous line contained “in” twice.
    • Basically, far more than the 9 inputs we used for dice.
  • The thing that held LLMs back was not having enough computing power to encode all of those words.

Our example

  • We’ll use an extremely restricted example, inspired by a meme I saw once.

Punch into Colab

  • I just typed these into Colab.
    • I used all lower case and no punctuation (and changed “not” to “no” so every word has two letters)
    • This is a similar “simplifying assumption” to when we regarded dots as only present or absent.
sar = "to do is to be"     # Sartre
soc = "to be is to do"     # Socrates
sha = "to be or no to be"  # Shakespeare
sin = "do be do be do"     # Sinatra
quotes = [sar, soc, sha, sin]

Tokenizing

  • There’s a core insight in computational linguistics called “tokenization”.
  • Somehow, we have to break text up into pieces that can be recognized by a “sensory neuron”.
  • I treat individual words as tokens.
    • This could be an entire class.
words = [quote.split() for quote in quotes]
print(words)
[['to', 'do', 'is', 'to', 'be'], ['to', 'be', 'is', 'to', 'do'], ['to', 'be', 'or', 'no', 'to', 'be'], ['do', 'be', 'do', 'be', 'do']]

Sets

  • I want to see how many unique words there are.
    • Obviously, “to” appears many times.
  • I use a “set”, a mathematical object that is a (1) collection with (2) no duplicates.
    • They also aren’t ordered, which doesn’t really matter here.
sets = [set(quote) for quote in words]
print(sets)
[{'is', 'be', 'to', 'do'}, {'is', 'be', 'to', 'do'}, {'be', 'to', 'or', 'no'}, {'be', 'do'}]

Unique words

  • How many words are there across all four phrases?
  • I take the union of sets, which is all elements in at least one set.
unique = set.union(*sets)
print(unique)
{'is', 'be', 'do', 'or', 'to', 'no'}
  • Only 6 words! Not too bad!
    • Only one more than our compressed dice.
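  • With so few words, each word can get its own input neuron.
    • A minimal sketch using one-hot vectors (this encoding scheme, and the vocab and one_hot names, are an assumption - the lecture doesn’t commit to one):
import numpy as np
# One neuron per word: a 1 in that word's slot, 0 everywhere else.
vocab = sorted(unique)
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}
print(one_hot["be"])
[1. 0. 0. 0. 0. 0.]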

Making a Network

  • We can make something that looks an awful lot like a perceptron!

[Figure: a bipartite network - each word (or, do, is, no, be, to) on the input side has an edge to every word on the output side.]

Edge Weights

  • How do we determine edge weights?
  • Or perhaps, what are we trying to do?
  • Let’s imagine what our task is:
    • We want to generate text given some text, so…
    • Given a word, produce the next word.

Words-to-words

  • So, what do we do?
    • Let’s take a look at our “quotes”.
    • Let’s see which words follow which other words.
      • Perhaps with what probability.
    • Let’s plug those in as edge weights.

First things first

  • Let’s just look at one word - the first word we see.
words
[['to', 'do', 'is', 'to', 'be'],
 ['to', 'be', 'is', 'to', 'do'],
 ['to', 'be', 'or', 'no', 'to', 'be'],
 ['do', 'be', 'do', 'be', 'do']]
  • Okay, that word is “to”.

Second things second

  • What words can follow “to”?
    • Well, it looks to me like “do” and “be”
    • I can write some code to make sure.
    • The point of this class isn’t writing that code, but I will show it for the interested student!

Finding what’s next

  • I loop over all quotes.
    • I loop over all words in the quotes.
      • If the current word is “to”, I save the next word.
  • I look at all the next words.
next = []
for quote in words:
    for loc, word in enumerate(quote):
        if word == "to":
            next.append(quote[loc + 1])
print(next)
['do', 'be', 'be', 'do', 'be', 'be']

Do this for all words

  • We aren’t restricted to doing this just for “to”.
  • Do it for all the words!
    • We have to add one special case.
    • We check quote length to make sure there is a next word.
nexts = []
for first in unique:
    next = []
    for quote in words:
        for loc, word in enumerate(quote):
            if word == first and loc + 1 < len(quote):
                next.append(quote[loc + 1])
    nexts.append(next)

Let’s see it

print(nexts)
[['to', 'to'], ['is', 'or', 'do', 'do'], ['is', 'be', 'be'], ['no'], ['do', 'be', 'be', 'do', 'be', 'be'], ['to']]
  • Oh… that is tough to understand.
    • We’ll use another coding concept that I should mention.
    • I think it will be important to AI in the future, though it isn’t seeing much usage now.

Key-Value Storage

A dictionary

  • What we really need is something a lot like a dictionary.
    • Rather than just having a list of lists of words, we want a list of words for each starting word.
    • Dictionaries also contain “lists of words” (definitions) for each word.
  • In computing, we term this “key-value storage”.
    • Keys are words
    • Values are definitions.

Colab has dictionaries

  • There just so happens to be something called a dictionary (well, a dict) that I can use.
d = dict()
print(d)
{}

Adding keys

  • We add things to a dictionary as key-value pairs
    • We take the name of the dictionary, like my_values
    • We add square brackets []
    • Within those brackets, we give the key (e.g. the word for which we are storing the definition)
    • We use single-equals assignment to set the value of the key within the dictionary.
  • Like this:
d["to"] = ['do', 'be', 'be', 'do', 'be', 'be']

Seeing values

  • It is easy enough to see a value from here.
  • Same as setting a value, just without single-equals assignment.
    • We take the name of the dictionary, like my_values
    • We add square brackets []
    • Within those brackets, we give the key (e.g. the word for which we are storing the definition)
print(d["to"])
['do', 'be', 'be', 'do', 'be', 'be']

Back To Work

Use A dict

  • Same as before, but we use a dictionary.
    • The key is the “before” or “first” word.
    • The value is the list of “next” or “second” words.
nexts = dict()
for first in unique:
    next = []
    for quote in words:
        for loc, word in enumerate(quote):
            if word == first and loc + 1 < len(quote):
                next.append(quote[loc + 1])
    nexts[first] = next

Easier to see

  • Recall - we are doing this to find edge weights
print(nexts)
{'is': ['to', 'to'], 'be': ['is', 'or', 'do', 'do'], 'do': ['is', 'be', 'be'], 'or': ['no'], 'to': ['do', 'be', 'be', 'do', 'be', 'be'], 'no': ['to']}
  • A bit easier visually with some formatting.
for first in nexts:
    print(first, nexts[first])
is ['to', 'to']
be ['is', 'or', 'do', 'do']
do ['is', 'be', 'be']
or ['no']
to ['do', 'be', 'be', 'do', 'be', 'be']
no ['to']

Takeaways

for first in nexts:
    print(first, nexts[first])
is ['to', 'to']
be ['is', 'or', 'do', 'do']
do ['is', 'be', 'be']
or ['no']
to ['do', 'be', 'be', 'do', 'be', 'be']
no ['to']
  • “no” only precedes “to”.
  • “or” only precedes “no”
  • “is” only precedes “to”
  • Others are more complex.

Early sketch

  • “no” only precedes “to”.
  • “or” only precedes “no”
  • “is” only precedes “to”

[Figure: a partial network showing only the deterministic edges: or→no, is→to, no→to.]

Meaning of weights

  • We have a problem now.
  • Historically, we have used determinism
    • Every single time we see a six-sided die, we say it represents a six.
  • Now we must use non-determinism
    • Sometimes when we see a “to”, it is followed by “be”
    • Sometimes it is followed by “do”.
    • How do we handle this?

Calculate Weights

  • Let’s take each edge weight to be the probability of that transition.
  • “to” will have an edge of weight \(\frac{2}{3}\) going to “be”.
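  • A small sketch to make these numbers concrete (the probs dictionary is an addition computed from our nexts dictionary, not part of the lecture’s code):
probs = dict()
for first in nexts:
    following = nexts[first]
    # The fraction of the time each word follows `first`.
    probs[first] = {word: following.count(word) / len(following) for word in set(following)}
print(probs["to"]["be"], probs["to"]["do"])
0.6666666666666666 0.3333333333333333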

[Figure: the full weighted network: to→{do, be}, be→{or, do, is}, do→{is, be}, is→to, no→to, or→no.]

Let’s look at “be”

  • The way I think of this is:
    • After a “be”, with 25% probability, see an “is”
    • After a “be”, with 25% probability, see an “or”
    • After a “be”, with 50% probability, see a “do”

[Figure: just the “be” node, with edges to “is” (25%), “or” (25%), and “do” (50%).]

Generating Text

  • So, if we want to generate text:
    • We look at the current word.
    • We flip a coin or roll a die.
  • For “be” we use a four-sided die.
    • And two of the four numbers correspond to “do”.

Rolling a Die in Colab

  • We have already seen random numbers, when setting up our perceptron.
  • It is easy enough to:
    • Generate a random number between zero and one.
    • Multiply it by the number of “next words”.
    • Round down (truncate) to a whole number.
    • Look up the word at that position, which is now the next word.

In Colab

  • Here’s the code - don’t worry about understanding it unless you want to.
    • nexts is the dictionary we made earlier.
def next_word(prev_word):
    die = np.random.rand()                # random float in [0, 1)
    possible_words = nexts[prev_word]     # every word we saw after prev_word
    position = die * len(possible_words)  # scale up to the length of that list
    position = int(position)              # truncate down to a whole index
    return possible_words[position]
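  • As an aside (not the lecture’s version), NumPy can do the roll, scale, and look-up in one call; this behaves the same:
def next_word_alt(prev_word):
    # np.random.choice picks one list entry uniformly at random.
    return np.random.choice(nexts[prev_word])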

Try it out

  • We’ll ask for the next word several times and hope to get different answers.
next_word("be")
'or'
next_word("be")
'do'
next_word("be")
'is'
next_word("be")
'do'
next_word("be")
'do'
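  • As a quick sanity check (an addition, not in the lecture), we can tally many samples against the 25%/25%/50% split we worked out for “be”.
from collections import Counter
# Draw 10,000 next words after "be" and count each outcome.
print(Counter(next_word("be") for _ in range(10000)))
  • Expect counts near Counter({'do': 5000, 'is': 2500, 'or': 2500}); the exact numbers vary from run to run.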

Generating text

  • We can now use a neural network to generate text!
  • We just provide a starting word.
    • We will also provide a length, though there are ways around that (see the stop-token sketch in the Bonuses)!
word = "to"
print(word)
for _ in range(4):
    word = next_word(word)
    print(word)
to
be
do
be
is

Clean it up

  • Rather than printing each word, we’ll join the words together into one string and print that.
  • We’ll also put all the code in a function.
  • We’ll allow the function to accept a starting word and a length.
def make_text(first_word, num_words):
    text = first_word
    word = first_word
    for _ in range(num_words - 1):
        word = next_word(word)
        text = text + " " + word
    print(text)

Some examples

make_text("to", 5)
to be or no to
make_text("to", 5)
to be do be do
make_text("to", 5)
to do be is to
make_text("to", 5)
to do is to be
make_text("to", 5)
to be do is to
make_text("to", 5)
to be do be or
make_text("to", 5)
to do be is to

More examples

  • Longer.
make_text("to", 10)
to be do be is to do be or no
make_text("to", 10)
to be do be do is to be do is
  • Start with other words.
make_text("be", 5)
be is to be is
make_text("no", 5)
no to be do is
make_text("is", 5)
is to be do is

Bonuses

If Time

  • We can look at start and stop tokens - see the sketch below.
  • We can discuss attention.
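  • Here is a hedged sketch of stop tokens (an addition beyond the lecture’s code): append a special token to each quote, rebuild the next-word lists, and generate until the stop token comes up instead of for a fixed length.
STOP = "<stop>"
# Re-tokenize with a stop token tacked onto the end of every quote.
words_stop = [quote + [STOP] for quote in words]
nexts_stop = dict()
for quote in words_stop:
    for loc, word in enumerate(quote):
        if loc + 1 < len(quote):
            nexts_stop.setdefault(word, []).append(quote[loc + 1])

def make_text_until_stop(first_word):
    text = first_word
    word = first_word
    while True:
        possible = nexts_stop[word]
        word = possible[int(np.random.rand() * len(possible))]
        if word == STOP:
            break
        text = text + " " + word
    print(text)

make_text_until_stop("to")
  • The length now varies on its own; one run might print “to do is to be”, another just “to be”.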

Summary

What we learned

  • You can generate text with neural networks.
  • We used a single layer, but of course…
    • It seems an awful lot like you can do anything by stacking them.

Fin