Time

Calvin (Deutschbein)

W14Mon: 24 Nov

Announcements

Adventure Ongoing
- You should be making API calls
Advising ongoing
- If you encounter any problem, email me immediately
- I'll be doing triage:
  - If you don't get an email back quickly, either
    - I'm on a multi-hour trail run and/or asleep, or
    - I will be able to solve any problems you encounter non-urgently.
  - Either way, once you send the email it is not your problem.

Today

Computer science as an experimental science.
Is writing good code relevant?

Refresh Binary Search

Altogether:

def check_word(my_word, word_list):
    while len(word_list) > 1:
        half_length = len(word_list) // 2
        if my_word < word_list[half_length]:
            # keep only the first half
            word_list = word_list[:half_length]
        else:
            # keep only the second half
            word_list = word_list[half_length:]
    return my_word == word_list[0]

Refresh Binary Search

This is algorithmically efficient!

def check_word(my_word, word_list):
    while len(word_list) > 1:
        half_length = len(word_list) // 2
        if my_word < word_list[half_length]:
            word_list = word_list[:half_length]
        else:
            word_list = word_list[half_length:]
    return my_word == word_list[0]

It is a special new type of inefficient: technically inefficient.

Refresh Binary Search

Examine these lines:

...
word_list = word_list[:half_length]
...
word_list = word_list[half_length:]

Let's remind ourselves of how slicing works within Python: like a string slice, a list slice builds a new object holding a copy of the selected entries.
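As a quick refresher, here is a minimal sketch of what a slice does. The list `words` is a made-up stand-in for the real word list; the point is only that slicing builds a new list object rather than a window onto the old one.

```python
# A tiny, made-up stand-in for the real word list.
words = ["apple", "banana", "cherry", "date"]

# Slicing builds a brand-new list holding the first two entries.
first_half = words[:2]

# Changing the copy leaves the original untouched.
first_half[0] = "avocado"

print(first_half)           # the modified copy
print(words)                # the original, unchanged
print(first_half is words)  # False: two separate list objects
```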

The word list is big!

We can request it.

import requests, json

URL = "https://gist.githubusercontent.com/cd-public/0a09043d500a9bc3397ebcfeb5f7a4f5/raw/41d37a509cc19e0609a1637370096f30ff4a1ea3/english.json"
r = requests.get(URL)
ENGLISH_WORDS = r.json()

The word list is big!

How big?

import sys
sys.getsizeof(ENGLISH_WORDS)

1140568 bytes, a little over a megabyte. Strictly, that is the size of the list itself (one reference per word), not counting the letters of the words it holds.
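One caveat worth checking for yourself: `sys.getsizeof` on a list reports the list object's own footprint, not the strings it refers to. A small sketch with made-up lists, assuming CPython; the helper name `deep_size` is just for illustration.

```python
import sys

# Made-up data: 1000 short words and 1000 much longer words.
short_words = ["w" + str(i) for i in range(1000)]
long_words = ["w" * 50 + str(i) for i in range(1000)]

def deep_size(words):
    # The list's own size plus the size of every string it refers to.
    return sys.getsizeof(words) + sum(sys.getsizeof(w) for w in words)

# The two lists have the same length, so the list objects are the same size...
print(sys.getsizeof(short_words), sys.getsizeof(long_words))

# ...but counting the strings as well tells them apart.
print(deep_size(short_words), deep_size(long_words))
```

Either way, slicing a list copies the references, so the list-only number is the one that matters for the cost of a slice.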

The word list is big!

Then how much memory do we need to do this...

word_list = word_list[:half_length]

At least the first time, half a megabyte.
That probably doesn't feel like a lot (it isn't), but you may notice your computer slow down a bit while loading the list of words.
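To get a feel for the total cost, this sketch counts how many list slots the slicing version copies over one whole search, assuming it keeps the first half every time. The function name `slots_copied` and the example length are made up for illustration.

```python
def slots_copied(n):
    """Count list slots copied by repeated halving of a list of length n,
    assuming the first half is kept every time."""
    total = 0
    while n > 1:
        n = n // 2   # length of the slice word_list[:half_length]
        total += n   # slicing copies one reference per kept entry
    return total

print(slots_copied(1024))
```

Half, then a quarter, then an eighth: it adds up to roughly the length of the whole list, even though the search only makes about log2(n) comparisons.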

Don't copy!

But, you see, we don't need a new list.
We can just keep track of the closest point to the beginning and closest point to the end that the word can be at!

head = 0
tail = len(word_list)  # one past the last index, so the last word stays in range
while tail - head > 1:
    midl = head + (tail - head) // 2
    if my_word < word_list[midl]:
        # keep only the first half
        tail = midl
    else:
        # keep only the second half
        head = midl

Rather than making copies of the big list of words, we just refine which part of the big list we are looking at.

Let's watch!

But, you see, we don't need a new list.
We can just keep track of the closest point to the beginning and closest point to the end that the word can be at!

def check_word(my_word):
    # tail starts one past the last index, so the last word stays in range
    head, tail = 0, len(ENGLISH_WORDS)
    while tail - head > 1:
        midl = head + (tail - head) // 2
        if my_word < ENGLISH_WORDS[midl]:
            tail = midl
        else:
            head = midl
    return my_word == ENGLISH_WORDS[head]

Rather than making copies of the big list of words, we just home in on a smaller and smaller part of the list.
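To actually watch the window shrink, here is a rough sketch that records `head` and `tail` after every step. The list `WORDS` and the function name `watch` are made up for illustration, and `tail` starts one past the last index so the final word stays reachable.

```python
# A small, sorted, made-up word list so the sketch runs without a download.
WORDS = ["ant", "bee", "cat", "dog", "eel", "fox", "gnu", "hen"]

def watch(my_word, words=WORDS):
    """Return the (head, tail) window after each step of the search."""
    head, tail = 0, len(words)   # candidates are words[head:tail]
    steps = [(head, tail)]
    while tail - head > 1:
        midl = head + (tail - head) // 2
        if my_word < words[midl]:
            tail = midl          # keep only the first half
        else:
            head = midl          # keep only the second half
        steps.append((head, tail))
    return steps

for head, tail in watch("fox"):
    # The slice here is only for display; the search itself never copies.
    print(head, tail, WORDS[head:tail])
```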

Let's check it

Let's check:
1. Using Python "in"
2. Writing a "for" loop checking every word.
3. The version from the end of last class, with copies.
4. This version, using an index.
We'll introduce another new library - time!
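The Sample Code slide calls these four approaches `by_in`, `by_for`, `by_copy`, and `by_index`. Their bodies are not shown on the slides, so what follows is only a guess at what they might look like, run against a small made-up list `WORDS` in place of `ENGLISH_WORDS`.

```python
# Small, sorted, made-up stand-in for ENGLISH_WORDS.
WORDS = ["ant", "bee", "cat", "dog", "eel", "fox"]

def by_in(my_word):
    # 1. Python's built-in membership test.
    return my_word in WORDS

def by_for(my_word):
    # 2. A "for" loop checking every word.
    for word in WORDS:
        if word == my_word:
            return True
    return False

def by_copy(my_word):
    # 3. Binary search that slices (copies) the list at each step.
    word_list = WORDS
    while len(word_list) > 1:
        half_length = len(word_list) // 2
        if my_word < word_list[half_length]:
            word_list = word_list[:half_length]
        else:
            word_list = word_list[half_length:]
    return my_word == word_list[0]

def by_index(my_word):
    # 4. Binary search that only moves two indices.
    head, tail = 0, len(WORDS)   # candidates are WORDS[head:tail]
    while tail - head > 1:
        midl = head + (tail - head) // 2
        if my_word < WORDS[midl]:
            tail = midl
        else:
            head = midl
    return my_word == WORDS[head]

# All four should agree on every query.
for query in ["cat", "fox", "cow"]:
    print(query, by_in(query), by_for(query), by_copy(query), by_index(query))
```

Before timing anything, it is worth confirming the four functions give the same answers; a fast function that is wrong is not much use.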

Time

We can see what time it currently is, but in a confusing way.
Return the time in seconds since the epoch as a floating-point number.
Ah yes, the epoch, I know what that is...
The epoch is the point where the time starts, the return value of `time.gmtime(0)`. It is January 1, 1970, 00:00:00 (UTC) on all platforms.
At time of slide creation:

python -c "import time; print(time.time())"
1763869688.1104693

Mostly useful to keep track of changes in time.
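A minimal sketch of both ideas: looking up the epoch, and subtracting two readings to get a change in time. The variable names and the 0.1 second pause are made up for illustration.

```python
import time

# The epoch: the moment that time.time() counts seconds from.
epoch = time.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)

# Two readings taken a moment apart; their difference is the elapsed time.
first = time.time()
time.sleep(0.1)
second = time.time()
print(second - first)
```

Either reading on its own is a huge, unfriendly number; the difference between them is what we care about.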

Time

We can measure how long it takes for someone to input something.

from time import time

start = time()
print("After you press enter, the amount of time since this text appeared will be printed.")
input()
end = time()
print(end - start)

Timing

We can:
- Import time.
- Create a list of words, say every hundredth word.
- Create a list of the four functions.
- Loop over each function:
  - Start the clock.
  - Check each word and its reversal.
  - Stop the clock and then print the result.

Sample Code

from time import time

test_set = ENGLISH_WORDS[50::100]
funcs = [by_in, by_for, by_copy, by_index]

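for f in funcs:
    start = time()  # start the clock
    for word in test_set:
        f(word)        # a word that is in the list
        f(word[::-1])  # its reversal, which usually is not
    print(f, time() - start)  # stop the clock and report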

Link to .py

Results

These are the results I got...

<function by_in at 0x7aa67a763d90> 1.5537760257720947
<function by_for at 0x7aa679869c60> 4.493199586868286
<function by_copy at 0x7aa679915630> 1.0388541221618652
<function by_index at 0x7aa6799156c0> 0.0052835941314697266

Takeaway: If you know a list is sorted, you can be (much) faster than the built-in way of doing things.
You can do things both (1) algorithmically correctly and (2) technically efficiently.
Computer science is both a mathematical and natural science.
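To put the Results numbers in perspective, this sketch divides each timing by the fastest one. The timings are the ones reported above; the dictionary name `results` is just for illustration.

```python
# Timings in seconds, copied from the Results slide.
results = {
    "by_in": 1.5537760257720947,
    "by_for": 4.493199586868286,
    "by_copy": 1.0388541221618652,
    "by_index": 0.0052835941314697266,
}

fastest = min(results.values())
for name, seconds in results.items():
    # How many times slower than the fastest approach.
    print(name, round(seconds / fastest))
```

On these numbers the index version comes out a few hundred times faster than `in`, though the exact ratios will vary from machine to machine and run to run.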

Today

Computer science as an experimental science.
Is writing good code relevant?

Announcements

Adventure Ongoing
- You should be thinking about how to navigate between scenes
Advising ongoing
- If you encounter any problem, email me immediately
- I'll be doing triage:
  - If you don't get an email back quickly, either
    - I'm on a multi-hour trail run and/or asleep, or
    - I will be able to solve any problems you encounter non-urgently.
  - Either way, once you send the email it is not your problem.