Formatting English

Calvin (Deutschbein)
Slides by Jed Rembold

28 February 2022

Announcements

  • Problem Set 4 is posted!
  • You should have a midterm grade and all problem sets graded; project grades are forthcoming.

Review!

Suppose you have the string x = "consternation" and you’d like to just extract and print the word "nation". Which expression below will not give you the string "nation"?

  1. x[7:len(x)]
  2. x[7:]
  3. x[-6:len(x)]
  4. x[-6:-1]
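
One way to check the four options is simply to evaluate each slice. The sketch below reuses the x from the question; the options dictionary is only there for illustration.

```python
x = "consternation"

# Evaluate each candidate slice from the question.
options = {
    "x[7:len(x)]": x[7:len(x)],
    "x[7:]": x[7:],
    "x[-6:len(x)]": x[-6:len(x)],
    "x[-6:-1]": x[-6:-1],
}
for expr, value in options.items():
    print(expr, "->", repr(value))
```

The end index of a slice is exclusive, so stopping at -1 leaves off the final character.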

Igpay Atinlay

  • Reminder: we were trying to construct a program to convert English to Pig Latin
  • Rules of Pig Latin:
    • If the word begins with a consonant, move everything up to the first vowel to the end and append “ay”
      fleet → eetflay
    • If the word starts with a vowel, just append “way” to the end
      orange → orangeway
    • If the word has no vowels, do nothing
  • Our decomposition:
    • Convert a single word ✓
      • Find first vowel ✓
    • Loop over whole phrase to convert the entire phrase word by word
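
The single-word step is marked as done but is not shown on these slides. A minimal sketch of how the three rules could be written is below; the function names match the ones used elsewhere in the slides, but the body of word_2_pig_latin is an assumption, not necessarily the version from class.

```python
def find_first_vowel(word):
    """Return the index of the first vowel in word, or -1 if there is none."""
    for i in range(len(word)):
        if word[i].lower() in "aeiou":
            return i
    return -1


def word_2_pig_latin(word):
    """Convert a single word to Pig Latin (assumed implementation of the rules)."""
    vowel_loc = find_first_vowel(word)
    if vowel_loc == -1:
        # No vowels: do nothing.
        return word
    if vowel_loc == 0:
        # Starts with a vowel: append "way".
        return word + "way"
    # Starts with a consonant: move the leading consonants to the end, add "ay".
    return word[vowel_loc:] + word[:vowel_loc] + "ay"


print(word_2_pig_latin("fleet"))   # eetflay
print(word_2_pig_latin("orange"))  # orangeway
```

Returning the word unchanged when there are no vowels also covers the empty string, which matters when a phrase contains two non-letter characters in a row.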

oNvertcay hrasepay

def string_2_pig_latin(string):
    start = 0
    i = 0
    phrase = ""
    finding_words = True
    while len(string) > 0 and finding_words:
        if not string[i].isalpha():
            word = string[start:i]
            phrase += word_2_pig_latin(word) + string[i]
            start = i + 1
        i += 1
        if i == len(string):
            finding_words = False
            phrase += word_2_pig_latin(string[start:])
    return phrase

The english.py Library

  • To facilitate working with English words, we can take advantage of the pre-written english module
    • This will be highly useful in Project 2
  • The english module exports two resources:
    • ENGLISH_WORDS: a constant which contains all the valid English words in alphabetical order
    • is_english_word(): a predicate function which takes a string as input and returns True or False depending on whether that string is in the list of valid English words

Biggest No-vowel Word

  • Suppose we wanted to determine the longest word in the English language without vowels:
from english import ENGLISH_WORDS

def find_first_vowel(word):
    for i in range(len(word)):
        if word[i].lower() in "aeiou":
            return i
    return -1

def find_longest_no_vowels():
    best_length = 0
    for word in ENGLISH_WORDS:
        vowel_loc = find_first_vowel(word)
        if vowel_loc == -1 and len(word) > best_length:
            best_length = len(word)
            print(word)


if __name__ == '__main__':
    find_longest_no_vowels()

When Pig Latin = English?

  • What about when the Pig Latin version of a word is a different but valid English word?
  • Let's not count words with no vowels: they are left unchanged, so they would trivially qualify
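from english import ENGLISH_WORDS, is_english_word

# word_2_pig_latin is the single-word converter written earlier
def platin_equals_english():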
    count = 0
    for word in ENGLISH_WORDS:
        platin = word_2_pig_latin(word)
        if is_english_word(platin) and word != platin:
            print(word, ":", platin)
            count += 1
    return count

String Formatting

  • Constructing text or a sentence by interleaving strings and other objects comes up a lot in communicating code results to a user
A = 10
B = "The value of A is: " + str(A) + "!"
print(B)
  • Having to convert everything manually to strings (when necessary) and concatenate it all together is clunky and easy to mess up
  • Python gives us two main methods to assist with this and any other formatting desires we might have:
    • The .format() method in Python’s string class
    • A new language feature called f-strings, introduced in Python 3.6

Placeholders

  • The format method and f-strings rely on the idea of a placeholder that is inserted into the string, and that marks where later data should be inserted
    • We use curly brackets, {}, to indicate the placeholder
    • The placeholder shows up in the original string that the format method is acting on, not as an argument to format
  • Objects which are provided as arguments to format are then converted to a string if needed and swapped in instead of the placeholder.
  • Simple Examples
    • "The value of A is {}".format(10)
    • "{} + {} is {}".format(2, 5, 2+5)
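
To see what the placeholder saves us, here is a small sketch comparing manual concatenation with format. The variable names manual and formatted are just for illustration.

```python
A = 10

# Manual concatenation needs an explicit str() conversion.
manual = "The value of A is " + str(A)

# format() converts the argument to a string and swaps it in for {}.
formatted = "The value of A is {}".format(A)

print(formatted)                            # The value of A is 10
print(manual == formatted)                  # True
print("{} + {} is {}".format(2, 5, 2 + 5))  # 2 + 5 is 7
```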

Labels can be key

  • By default, all arguments to .format() are positional
    • Order of parameters is matched with the order of {}
  • You can also provide a zero-based position index inside the placeholder
    • "{1} + {0} is {2}".format(2, 5, 2+5)
  • You can also provide keyword labels in the placeholder, but then you must pass matching keyword arguments to format
    • "{name} is {age} years old.".format(age=34, name="Amy")
    • "{} + {B} + {C} = {}".format(3, 10, C=5, B=2)
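
The examples above can be run side by side to see how each kind of label picks its argument. The variable names are only for illustration.

```python
by_position = "{} + {} is {}".format(2, 5, 2 + 5)
by_index = "{1} + {0} is {2}".format(2, 5, 2 + 5)
by_keyword = "{name} is {age} years old.".format(age=34, name="Amy")
mixed = "{} + {B} + {C} = {}".format(3, 10, C=5, B=2)

print(by_position)  # 2 + 5 is 7
print(by_index)     # 5 + 2 is 7
print(by_keyword)   # Amy is 34 years old.
print(mixed)        # 3 + 2 + 5 = 10
```

In the mixed case the empty placeholders consume the positional arguments in order (3, then 10), while {B} and {C} look up their keywords regardless of the order they were passed in.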

The format of format

  • Often when displaying a value in a string, we want control over certain format specifications:
    • The number of decimals shown
    • Padding with zeros or spaces
    • Justifying a number within a region of characters
    • Using scientific notation
    • etc
  • We can do all this (and more) by including a colon and then a special format specification inside the placeholder!
    • "The value of A is {0:.5f}".format(10)
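
A short sketch of what a format specification after the colon does to the output. The values other than A are made up for illustration.

```python
A = 10

# .5f: show the value as a float with 5 digits after the decimal point.
print("The value of A is {0:.5f}".format(A))  # The value of A is 10.00000

# .2f: round to 2 digits after the decimal point.
print("{:.2f}".format(3.14159))               # 3.14

# .3e: scientific notation with 3 digits after the decimal point.
print("{:.3e}".format(123456))                # 1.235e+05
```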

Shaping your format

  • Lots we can do, but we rarely need to do it all at once
  • The basic building blocks:
    [[fill]align][sign][width][,][.precision][type]


  • type
    • What type of object you want to represent as a string
    • “d” - base-10 integer
    • “f” - float or decimal
    • “e” - scientific notation
    • More on next slide
  • precision
    • How many digits to display after a decimal point
    • Details can vary a little by conversion type
  • Grouping
    • A comma here indicates that numbers should be grouped in sets of 3 and separated with a comma
  • width
    • The minimum number of characters you want the formatted value to be
  • sign
    • Whether the sign of the number should be shown
    • A + sign ensures all numbers will have either a + or - sign
    • A space just adds a space before positive numbers (negatives would have a - sign in front)
  • [fill]align
    • How you want the text aligned if it is shorter than the minimum width
      • Can be <, >, or ^ for left, right, or center justified
    • The single character you want to fill the empty space
      • Default is space
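
The building blocks can be tried one at a time on a single number. In this sketch the sample value and the square brackets around each placeholder are only there to make the padding visible.

```python
value = 1234.5678

print("[{:.2f}]".format(value))       # precision:        [1234.57]
print("[{:,.2f}]".format(value))      # grouping:         [1,234.57]
print("[{:12.2f}]".format(value))     # width:            [     1234.57]
print("[{:+.2f}]".format(value))      # sign:             [+1234.57]
print("[{:<12.2f}]".format(value))    # align left:       [1234.57     ]
print("[{:*^12.2f}]".format(value))   # fill and center:  [**1234.57***]
print("[{:0>12,.2f}]".format(value))  # several together: [00001,234.57]
```

Numbers are right-justified by default when only a width is given, which is why the third line pads on the left.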

Output Conversion Types

  • Below is a table of the most common conversion types:
    Code      Description
    b         Inserts an integer using its binary representation
    d         Inserts an integer using its decimal representation
    e or E    Inserts a number using scientific notation
    f or F    Inserts a number using a decimal point format
    g or G    Choose e or f to get the best fit
    o         Inserts an integer using its octal representation
    s         Inserts a string value
    x or X    Inserts an integer using its hexadecimal representation
  • Uppercase conversion types use capital characters where possible in output
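
A quick sketch running each conversion type from the table. The sample values are arbitrary.

```python
n = 255
print("{:b}".format(n))  # 11111111
print("{:d}".format(n))  # 255
print("{:o}".format(n))  # 377
print("{:x}".format(n))  # ff
print("{:X}".format(n))  # FF

y = 12345.678
print("{:e}".format(y))  # 1.234568e+04
print("{:E}".format(y))  # 1.234568E+04
print("{:f}".format(y))  # 12345.678000
print("{:g}".format(y))  # 12345.7

print("{:g}".format(0.00001234))  # 1.234e-05
print("{:s}".format("text"))      # text
```

With no precision given, e and f show 6 digits after the decimal point, and g switches to scientific notation only when the number is very small or very large.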

Not to be confused with a d-string

  • Short for “format string”, an f-string achieves the same thing as .format() but with less syntax
  • Moves the value to be substituted directly into the placeholder itself
  • Needs an f right before the opening quote to let Python know it needs to do more with this particular string
A = 10
B = 15.123234
print(f"A is {A} and B is {B:0.2f}")
  • You can otherwise use the same syntax and format specs as with format!
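
To make the equivalence concrete, this sketch builds the same string both ways. The names with_format and with_fstring are only for illustration.

```python
A = 10
B = 15.123234

with_format = "A is {} and B is {:0.2f}".format(A, B)
with_fstring = f"A is {A} and B is {B:0.2f}"

print(with_fstring)                 # A is 10 and B is 15.12
print(with_format == with_fstring)  # True

# Any expression can go inside the placeholder, not just a variable name.
print(f"{A} + {B:.1f} = {A + B:.1f}")  # 10 + 15.1 = 25.1
```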