Mergesort

Calvin (Deutschbein)

W15Mon: 02 Dec

Announcements

  • Follow up from binary search (Friday, Nov 22)
  • Adventure Ongoing
    • You should be DONE
  • Advising ongoing

Today

  • Do things fast
    • Not adventure relevant
    • Is write-good-code relevant.
    • Sortin'

Throwback Wordle

  • Read in some text (hey just like adventure)
  • See if it's in a list of words (just like adventure)
  • Hella words (this is where it's different)
  • Do thing accordingly.
    • Wordle's ENGLISH_WORDS had 127145 words
    • It's here.
    • That would take a long time to read.

Throwback Wordle

  • We found words fast using binary search
    • Check the middle of the word list
    • See which side a word would be on
    • Check that side
    • Repeat
  • This was 100x speed but also...
  • Assumed the word list was sorted...
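The search steps above can be sketched as follows. This is a minimal illustration, not the Wordle code from class: the name `contains` and the tiny word list are made up for the example, and the list is assumed to be sorted already.

```python
def contains(sorted_words, target):
    """Return True if target is in sorted_words (assumed sorted ascending)."""
    low, high = 0, len(sorted_words) - 1
    while low <= high:
        middle = (low + high) // 2           # check the middle of the list
        if sorted_words[middle] == target:
            return True
        if sorted_words[middle] < target:    # target would be on the right side
            low = middle + 1
        else:                                # target would be on the left side
            high = middle - 1
    return False                             # ran out of list to check


# Hypothetical word list; sorted() is what makes the search valid.
words = sorted(["peanut", "apple", "zebra", "mango", "grape"])
print(contains(words, "mango"))  # True
print(contains(words, "kiwi"))   # False
```

If the list is not sorted, the "which side" step can point the wrong way, which is why sorting matters here.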

Sorting

  • Sorting can be fast, or it can be slow.
    • Imagine I am sorting 20 metric tons of peanuts by height
  • I could sort my peanuts by height by...
    • Taking one peanut at a time.
    • Going through a height-ordered list of peanuts
    • Comparing my peanut with each item in the list
    • When I get to two peanuts that are:
      • Next to each other when sorted by height
      • Would have my current peanut between them
    • I place my peanut at that location
    • Then loop by grabbing another peanut and starting at the beginning of the list.
  • This is known as "having a good time"

Technical vs Algorithmic Correctness

  • Sorting peanuts, one peanut at a time, by comparing to all other peanuts, is technically correct.
  • It is slow - imagine comparing each of 1000 peanuts to every other of the 1000 peanuts.
  • That is one million comparisons! My eyes would get tired!
  • We need to "divide-and-conquer".
  • It is more algorithmically correct - it wastes less effort.
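To put rough numbers on "wastes less effort", here is a small arithmetic sketch. The n * log2(n) figure is the usual textbook estimate for a divide-and-conquer sort; it is an assumption brought in for illustration, not a count measured in these slides.

```python
from math import log2

n = 1000  # number of peanuts

# Every peanut compared against every peanut.
all_pairs = n * n

# Rough estimate for divide-and-conquer: about log2(n) rounds of n comparisons.
divide_and_conquer = n * log2(n)

print(all_pairs)                   # 1000000
print(round(divide_and_conquer))   # 9966
```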

One at a time

  • Suppose we have a peanut and an ordered collection of peanuts:
    def add_peanut(peanut, peanuts: list) -> list:
        index = 0  # we'll look at peanuts in order
        # see if our peanut's taller (and stop at the end of the list)
        while index < len(peanuts) and peanut > peanuts[index]:
            index += 1  # if it is, we keep going
        # everything shorter, then our peanut, then everything taller
        return peanuts[:index] + [peanut] + peanuts[index:]
    • This is technically correct.
    • Is there a way to divide it in half?

One at a time

  • One peanut at a time:
    def order_peanuts(peanuts: list) -> list:
        ordered = [peanuts[0]]
        for peanut in peanuts[1:]:
            index = 0
            while index < len(ordered) and peanut > ordered[index]:
                index += 1
            ordered.insert(index, peanut)
        return ordered
    • This is technically correct.
    • Is there a way to divide it in half?

Divide and Conquer

  • Recall last class we split up peanuts into different grinders!
  • What if we sort parts of the list of peanuts!
    • We take a pile of peanuts.
    • We divide in half.
    • We sort both halves.
    • We combine the halves.
  • If we keep taking halves, eventually we'll have (many) piles of just one peanut!
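How many rounds of halving does it take to reach piles of one? A quick sketch, using a made-up helper name `count_halvings` and following the larger half when a pile has an odd size:

```python
def count_halvings(pile_size):
    """Count rounds of halving until the largest pile holds one peanut."""
    halvings = 0
    while pile_size > 1:
        pile_size = (pile_size + 1) // 2  # follow the larger half
        halvings += 1
    return halvings


print(count_halvings(16))     # 4
print(count_halvings(10000))  # 14
```

Even 10000 peanuts need only 14 rounds, which is why halving pays off.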

Divide and Conquer

  • We take a pile of peanuts
def order_peanuts(peanuts:list)->list:

Divide and Conquer

  • We take a pile of peanuts
  • We divide in half.
def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]

Divide and Conquer

  • We take a pile of peanuts
  • We divide in half.
  • We sort both halves.
def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)

Divide and Conquer

  • We take a pile of peanuts
  • We divide in half.
  • We sort both halves.
  • We combine the halves.
def order_peanuts(peanuts:list)->list:
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

Divide and Conquer

  • We take a pile of peanuts
  • We divide in half.
  • We sort both halves.
  • We combine the halves.
  • If we keep taking halves, eventually we'll have (many) piles of just one peanut!
def order_peanuts(peanuts:list)->list:
    if len(peanuts) < 2:  # a pile of zero or one peanut is already sorted
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

We need to write "merge"

Divide and Conquer

We can merge like so:
def merge_peanuts(head, tail):
    merged = []
    while head and tail:
        if head[0] < tail[0]:
            merged += [head[0]]
            head = head[1:]
        else:
            merged += [tail[0]]
            tail = tail[1:]
    merged += head + tail
    return merged

We can sort like so:
def order_peanuts(peanuts):
    if len(peanuts) < 2:
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

merge = merge_peanuts
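One design note on merging: slicing with `head[1:]` copies the rest of the list on every step. A possible variant, sketched here under the made-up name `merge_by_index`, walks both lists with indices instead. It is an alternative for illustration, not the version used in these slides.

```python
def merge_by_index(head, tail):
    """Merge two already-sorted lists without re-slicing them each step."""
    merged = []
    i = j = 0
    while i < len(head) and j < len(tail):
        if head[i] < tail[j]:
            merged.append(head[i])
            i += 1
        else:
            merged.append(tail[j])
            j += 1
    # One list is used up; whatever remains of the other is already in order.
    merged.extend(head[i:])
    merged.extend(tail[j:])
    return merged


print(merge_by_index([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

It makes the same comparisons as the slicing version, so the comparison counts would not change; only the copying goes away.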

Divide and Conquer

  • Which is faster?
    • The expensive thing, we found, was comparing peanuts
    • Let's add a counter, and count how many times we have to compare.
      compares = [0,0]
      def merge_peanuts(head, tail):
          merged = []
          while head and tail:
              compares[0] += 1 # NEW !!!
              if head[0] < tail[0]:
              # (rest of merge_peanuts unchanged)
    • And in the one-at-a-time....
      index = 0
      while index < len(ordered) and peanut > ordered[index]:
          compares[1] += 1 # NEW !!!
          index += 1

Divide and Conquer

  • We can test it on 10000 peanuts.
    from random import randint
    peanuts = [randint(0,1000000) for _ in range(10000)]
    compares = [0,0]
    order_peanuts_merge(peanuts)
    order_peanuts_oneat(peanuts)
    print(compares)
  • This code takes a long time to run...
  • ...because it's algorithmically incorrect!
    [120410, 25412110]
  • Going one-at-a-time is about 211 times more work!
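The measured counts above sit close to two rough estimates. These estimates are assumptions added for comparison, not part of the class code: about n * log2(n) comparisons for merging, and about n * n / 4 for one-at-a-time on random peanuts, since each new peanut passes roughly half of the ordered list.

```python
from math import log2

n = 10000
measured_merge, measured_oneat = 120410, 25412110  # counts printed above

estimate_merge = n * log2(n)  # rough upper estimate for divide-and-conquer
estimate_oneat = n * n / 4    # each peanut passes about half the ordered list

print(round(estimate_merge), measured_merge)
print(round(estimate_oneat), measured_oneat)
print(round(measured_oneat / measured_merge))  # 211
```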

Try it

  • Full code:
def merge_peanuts(head, tail):
    merged = []
    while head and tail:
        compares[0] += 1
        if head[0] < tail[0]:
            merged += [head[0]]
            head = head[1:]
        else:
            merged += [tail[0]]
            tail = tail[1:]
    merged += head + tail
    return merged

def order_peanuts(peanuts):
    if len(peanuts) < 2:
        return peanuts
    half = len(peanuts) // 2
    head = peanuts[:half]
    tail = peanuts[half:]
    head = order_peanuts(head)
    tail = order_peanuts(tail)
    return merge(head,tail)

merge = merge_peanuts
order_peanuts_merge = order_peanuts

def order_peanuts_oneat(peanuts:list)->list:
    ordered = [peanuts[0]]
    for peanut in peanuts[1:]:
        index = 0
        while index < len(ordered) and peanut > ordered[index]:
            compares[1] += 1
            index += 1
        ordered.insert(index, peanut)
    return ordered

from random import randint
peanuts = [randint(0,1000000) for _ in range(10000)]
compares = [0,0]
order_peanuts_merge(peanuts)
order_peanuts_oneat(peanuts)
print(compares)
