Mergesort

Calvin (Deutschbein)

W15Mon: 02 Dec

Announcements

Follow up from binary search (Friday, Nov 22)
Adventure Ongoing
- You should be DONE
Advising ongoing

Today

Do things fast
- Not adventure relevant
- Is write-good-code relevant.
- Sortin'

Throwback Wordle

Read in some text (hey just like adventure)
See if it's in a list of words (just like adventure)
Hella words (this is where it's different)
Do the thing accordingly.
- Wordle's ENGLISH_WORDS had 127145 words
- It's here.
- That would take a long time to read.

Throwback Wordle

We found words fast using binary search
- Check the middle of the word list
- See which side a word would be on
- Check that side
- Repeat
This was a 100x speedup, but it also...
...assumed the word list was sorted!
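A minimal sketch of the binary search recalled above, assuming the word list is a sorted Python list of lowercase strings. The name `find_word` and the tiny `WORDS` list are hypothetical stand-ins for the real Wordle ENGLISH_WORDS list.

```python
def find_word(word: str, words: list) -> bool:
    """Binary search for `word`; assumes `words` is already sorted."""
    low, high = 0, len(words) - 1
    while low <= high:
        middle = (low + high) // 2      # check the middle of the word list
        if words[middle] == word:
            return True
        if word < words[middle]:        # the word would be on the left side
            high = middle - 1
        else:                           # the word would be on the right side
            low = middle + 1
    return False                        # nothing left to check: not in the list

# Hypothetical tiny word list (it must be sorted!)
WORDS = ["apple", "crane", "peach", "slate", "trace"]
print(find_word("slate", WORDS))
print(find_word("zebra", WORDS))
```

On an unsorted list the same code can return False for a word that is actually there, because "see which side a word would be on" only works when the list is in order.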

Sorting

Sorting can be fast, or it can be slow.
- Imagine I am sorting 20 metric tons of peanuts by height
I could sort my peanuts by height by...
- Taking one peanut at a time.
- Going through a height-ordered list of peanuts
- Comparing my peanut with each item in the list
- When I get to two peanuts that are:
  - Next to each other when sorted by height
  - Would have my current peanut between them
- I place my peanut at that location
- Then loop by grabbing another peanut and starting at the beginning of the list.
This is known as "having a good time"

Technical vs Algorithmic Correctness

Sorting peanuts, one peanut at a time, by comparing to all other peanuts, is technically correct.
It is slow - imagine comparing each of 1000 peanuts to every other one of the 1000 peanuts.
That is one million comparisons! My eyes would get tired!
We need to "divide-and-conquer".
Divide-and-conquer is more algorithmically correct - it wastes less effort.
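To put rough numbers on "wastes less effort": comparing every peanut to every other peanut takes about n × n comparisons, while halving over and over gives about log2(n) levels of halves, each needing roughly n comparisons to merge back together. A back-of-the-envelope sketch; these are estimates, not measured counts, and the function names are hypothetical.

```python
import math

def one_at_a_time_estimate(n: int) -> int:
    # every peanut compared against every peanut
    return n * n

def divide_and_conquer_estimate(n: int) -> int:
    # about log2(n) levels of halving, about n comparisons to merge each level
    return round(n * math.log2(n))

for n in [1000, 10000]:
    print(n, one_at_a_time_estimate(n), divide_and_conquer_estimate(n))
```

For 1000 peanuts that is 1,000,000 comparisons versus about 9,966.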

One at a time

Suppose we have a peanut and an ordered collection of peanuts:

def add_peanut(peanut, peanuts:list)->list:
    index = 0 # we'll look at peanuts in order
    while index < len(peanuts) and peanut > peanuts[index]: # see if our peanut's taller
        index += 1 # if it is, we keep going
    return peanuts[:index] + [peanut] + peanuts[index:] # slot our peanut in at index
- This is technically correct.
- Is there a way to divide it in half?
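For a single peanut, one way to "divide in half" is to reuse binary search to find where it belongs, instead of walking the list from the front. A sketch, assuming `peanuts` is already sorted; the name `add_peanut_binary` is hypothetical. It cuts the comparisons per peanut to about log2(n), but building the new list still copies every peanut, so on its own this does not make sorting fast.

```python
def add_peanut_binary(peanut, peanuts: list) -> list:
    """Insert `peanut` into the sorted list `peanuts`, finding the spot by binary search."""
    low, high = 0, len(peanuts)          # the right spot is somewhere in [low, high]
    while low < high:
        middle = (low + high) // 2
        if peanut > peanuts[middle]:     # our peanut is taller: look right
            low = middle + 1
        else:                            # our peanut is not taller: look left
            high = middle
    return peanuts[:low] + [peanut] + peanuts[low:]

print(add_peanut_binary(5, [1, 3, 7, 9]))
```

It lands peanuts in the same place as the one-at-a-time walk: in front of the first peanut that is at least as tall.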

One at a time

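One peanut at a time:

def order_peanuts(peanuts:list)->list:
    if not peanuts: # nothing to sort
        return []
    ordered = [peanuts[0]] # a pile of one peanut is already ordered
    for peanut in peanuts[1:]:
        index = 0
        while index < len(ordered) and peanut > ordered[index]: # walk past shorter peanuts
            index += 1
        ordered.insert(index, peanut) # place our peanut there
    return ordered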
- This is technically correct.
- Is there a way to divide it in half?

Divide and Conquer

Recall last class we split up peanuts into different grinders!
What if we sort parts of the list of peanuts!
- We take a pile of peanuts.
- We divide in half.
- We sort both halves.
- We combine the halves.
If we keep taking halves, eventually we'll have (many) piles of just one peanut!
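To see why the halving stops, here is a small sketch that only does the dividing step and prints each pile, indented by how many times it has been halved. The function name `show_halves` and the example heights are hypothetical; nothing is sorted yet.

```python
def show_halves(peanuts: list, depth: int = 0) -> int:
    """Print each pile as it is halved; return how many piles we end up with."""
    print("    " * depth + str(peanuts))   # indent to show how deep we are
    if len(peanuts) < 2:                   # can't halve a pile of one (or zero)
        return 1 if peanuts else 0
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    return show_halves(head, depth + 1) + show_halves(tail, depth + 1)

show_halves([8, 3, 5, 1])
```

For four peanuts this prints the pile, then two piles of two, then four piles of one: two rounds of halving, which is log2(4).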

Divide and Conquer

We take a pile of peanuts

def order_peanuts(peanuts:list)->list:

Divide and Conquer

We take a pile of peanuts
We divide in half.

def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]

Divide and Conquer

We take a pile of peanuts
We divide in half.
We sort both halves.

def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)

Divide and Conquer

We take a pile of peanuts
We divide in half.
We sort both halves.
We combine the halves.

def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

Divide and Conquer

We take a pile of peanuts
We divide in half.
We sort both halves.
We combine the halves.
If we keep taking halves, eventually we'll have (many) piles of just one peanut!

def order_peanuts(peanuts:list)->list:
    if len(peanuts) < 2:
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

We need to write "merge"

Divide and Conquer

We can merge like so:

def merge_peanuts(head, tail):
    merged = []
    while head and tail: # until one pile runs out
        if head[0] < tail[0]: # take the shorter front peanut
            merged += [head[0]]
            head = head[1:]
        else:
            merged += [tail[0]]
            tail = tail[1:]
    merged += head + tail # whatever's left is already sorted
    return merged

We can sort like so:

def order_peanuts(peanuts):
    if len(peanuts) < 2:
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

merge = merge_peanuts
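A design note on merge_peanuts: `head = head[1:]` builds a brand-new list every time, so the merge does extra copying even though it makes few comparisons. A sketch of the same merge that walks through the two piles with indexes instead; `merge_with_indexes` is a hypothetical name, and it assumes `head` and `tail` are each already sorted.

```python
def merge_with_indexes(head: list, tail: list) -> list:
    """Merge two sorted lists by walking an index through each one (no slicing per step)."""
    merged = []
    i, j = 0, 0
    while i < len(head) and j < len(tail):
        if head[i] < tail[j]:       # same comparison as merge_peanuts
            merged.append(head[i])
            i += 1
        else:
            merged.append(tail[j])
            j += 1
    merged.extend(head[i:])         # at most one of these still has peanuts left
    merged.extend(tail[j:])
    return merged

print(merge_with_indexes([1, 4, 9], [2, 3, 10]))
```

Because the sort calls `merge` by name, this could be swapped in with `merge = merge_with_indexes` without touching order_peanuts.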

Divide and Conquer

Which is faster?
- The expensive thing, we found, was comparing peanuts
- Let's add a counter, and count how many times we have to compare.

compares = [0,0]
def merge_peanuts(head, tail):
    merged = []
    while head and tail:
        compares[0] += 1 # NEW !!!
        if head[0] < tail[0]:
            ...

- And in the one-at-a-time...

index = 0
while index < len(ordered) and peanut > ordered[index]:
    compares[1] += 1 # NEW !!!
    index += 1

Divide and Conquer

We can test it on 10000 peanuts.

from random import randint
peanuts = [randint(0,1000000) for _ in range(10000)]
compares = [0,0]
order_peanuts_merge(peanuts)
order_peanuts_oneat(peanuts)
print(compares)

This code takes a long time to run...
...because one-at-a-time is algorithmically incorrect! One run printed (your numbers will differ a little, since the peanuts are random):
[120410, 25412110]
Going one-at-a-time is about 211 times more work!
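A rough sanity check on those counts, using back-of-the-envelope estimates (assumptions, not exact formulas for this code): merge sort does at most about n·log2(n) comparisons, and inserting random peanuts one at a time walks, on average, about halfway through a list that averages n/2 peanuts, so about n²/4 comparisons.

```python
import math

n = 10000
merge_estimate = n * math.log2(n)      # rough upper estimate for merge sort
one_at_a_time_estimate = n * n / 4     # rough average for inserting random peanuts

measured = [120410, 25412110]          # counts from the run reported above
print(round(merge_estimate), round(one_at_a_time_estimate))
print(measured[1] / measured[0])       # how many times more work one-at-a-time did
```

The estimates land near the measured counts: merge sort comes in a bit under n·log2(n), and one-at-a-time sits close to n²/4.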

Try it

Full code:

from random import randint

def merge_peanuts(head, tail):
    merged = []
    while head and tail:
        compares[0] += 1
        if head[0] < tail[0]:
            merged += [head[0]]
            head = head[1:]
        else:
            merged += [tail[0]]
            tail = tail[1:]
    merged += head + tail
    return merged

def order_peanuts(peanuts):
    if len(peanuts) < 2:
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

merge = merge_peanuts
order_peanuts_merge = order_peanuts

def order_peanuts_oneat(peanuts:list)->list:
    ordered = [peanuts[0]]
    for peanut in peanuts[1:]:
        index = 0
        while index < len(ordered) and peanut > ordered[index]:
            compares[1] += 1
            index += 1
        ordered.insert(index, peanut)
    return ordered

peanuts = [randint(0,1000000) for _ in range(10000)]
compares = [0,0]
order_peanuts_merge(peanuts)
order_peanuts_oneat(peanuts)
print(compares)
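For comparison, Python's built-in `sorted` can be counted the same way. Python's sorting only uses `<` between items, so a small wrapper class whose `__lt__` bumps a counter measures it. The `Peanut` class is a hypothetical helper, and the exact count depends on the random data. CPython's built-in sort (Timsort) is itself a hybrid of merge sort and insertion sort.

```python
from random import randint

compares = [0]  # one counter, same list trick as above

class Peanut:
    """Wraps a height and counts every < comparison made on it."""
    def __init__(self, height: int):
        self.height = height

    def __lt__(self, other: "Peanut") -> bool:
        compares[0] += 1
        return self.height < other.height

heights = [randint(0, 1000000) for _ in range(10000)]
ordered = sorted(Peanut(h) for h in heights)
print(compares[0])                                      # varies from run to run
print([p.height for p in ordered] == sorted(heights))   # checks the result is sorted
```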
