"What do you do if you code breaks?"¶

This.

In [1]:
factbook = """ 	American Ethnic Studies 	  	  	  	  	  	  	  	  	  	  	  	  	  	  	6
  	Anthropology 	  	  	  	  	  	  	  	2 	  	  	2 	  	2
  	Applied Mathematics 	  	  	  	  	  	  	  	  	  	  	  	  	  	  	2
  	Archaeology 	  	  	2 	4 	  	3 	  	3 	  	2 	10 	  	12 	  	3
  	Art 	  	  	1 	10 	1 	5 	  	6 	  	2 	21 	  	23 	  	27
  	Art History 	  	  	  	2 	1 	2 	1 	1 	  	2 	5 	  	7 	  	16
  	Biology 	1 	2 	6 	12 	11 	12 	6 	9 	  	24 	35 	  	59 	  	28
  	Business Adminstration 	  	1 	4 	3 	8 	6 	6 	3 	  	18 	13 	  	31 	  	18
  	Chemistry 	  	  	5 	8 	3 	12 	9 	5 	  	17 	25 	  	42 	  	28
  	Civic Communication and Media 	  	  	  	7 	4 	9 	4 	4 	  	8 	20 	  	28 	  	6
  	Classical Studies 	  	  	  	2 	  	  	1 	1 	  	1 	3 	  	4 	  	11
  	Computer Science 	  	1 	8 	1 	13 	5 	8 	5 	  	29 	12 	  	41 	  	5
  	Data Science 	  	  	11 	6 	10 	7 	3 	4 	  	24 	17 	  	41 	  	13
  	Economics 	  	  	7 	3 	15 	6 	11 	7 	  	33 	16 	  	49 	  	11
  	English 	  	1 	2 	17 	4 	10 	5 	9 	  	11 	37 	  	48 	  	10
  	Environmental Science 	2 	  	2 	14 	7 	13 	2 	17 	  	13 	44 	  	57 	  	14
  	Exercise Science 	1 	2 	3 	9 	8 	6 	9 	11 	  	21 	28 	  	49
  	Film Studies 	  	1 	1 	  	2 	1 	  	2 	  	3 	4 	  	7 	  	5
  	Global Cultural Studies 	  	  	1 	  	2 	9 	5 	8 	  	8 	17 	  	25 	  	43
  	History 	  	  	5 	4 	6 	6 	7 	5 	  	18 	15 	  	33 	  	10
  	Humanities 	  	  	  	  	1 	1 	  	  	  	1 	1 	  	2
  	International Studies 	  	  	3 	2 	1 	4 	1 	3 	  	5 	9 	  	14
  	Japanese 	  	  	  	  	  	  	  	  	  	  	  	  	  	  	9
  	Japanese Studies 	  	  	4 	5 	2 	5 	3 	2 	  	9 	12 	  	21 	  	2
  	Mathematics 	1 	  	2 	3 	6 	  	5 	4 	  	14 	7 	  	21 	  	11
  	Music 	  	  	1 	3 	  	5 	3 	1 	  	4 	9 	  	13 	  	12
  	Philosophy 	  	  	4 	3 	4 	4 	2 	2 	  	10 	9 	  	19 	  	8
  	Physics 	  	  	3 	  	8 	2 	8 	2 	  	19 	4 	  	23
  	Politics 	  	  	4 	12 	4 	13 	12 	15 	  	20 	40 	  	60 	  	7
  	Psychology 	1 	3 	6 	25 	4 	15 	5 	15 	  	16 	58 	  	74 	  	21
  	Public Health 	  	  	  	8 	3 	12 	2 	8 	  	5 	28 	  	33 	  	13
  	Religious Studies 	  	  	  	  	  	  	1 	1 	  	1 	1 	  	2 	  	1
  	Sociology 	  	  	2 	6 	  	9 	1 	7 	  	3 	22 	  	25 	  	11
  	Spanish 	  	  	  	1 	3 	2 	3 	5 	  	6 	8 	  	14 	  	23
  	Special 	  	  	  	  	  	  	  	1 	  	  	1 	  	1
  	Statistics 	  	  	  	  	  	  	  	  	  	  	  	  	  	  	1
  	Sustianability and Society 	  	  	  	  	  	  	  	  	  	  	  	  	  	  	6
  	Symbol Criticism/Rhet Analysis
  	Theatre 	  	  	  	6 	  	3 	2 	6 	  	2 	15 	  	17 	  	5
  	Women's and Gender Studies 	  	  	  	  	  	  	  	4 	  	  	4 	  	4 	  	12 	  	"""

Let's make a list of majors with at least one student and the inmajor Woman%. While doing this, we will definitely not think at all about gender lest the abyss begin to stare back at us...

In [2]:
majors = [[major[0], major[4], int(major[2]/major[4] * 100)] for major in [[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]] for line in factbook.split('\n')] if major[4] != 0] # if i ever write one of these in two lines instead of one i get fired
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-671e1b1d1cd9> in <cell line: 1>()
----> 1 majors = [[major[0], major[4], int(major[2]/major[4] * 100)] for major in [[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]] for line in factbook.split('\n')] if major[4] != 0] # if i ever write one of these in two lines instead of one i get fired

<ipython-input-2-671e1b1d1cd9> in <listcomp>(.0)
----> 1 majors = [[major[0], major[4], int(major[2]/major[4] * 100)] for major in [[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]] for line in factbook.split('\n')] if major[4] != 0] # if i ever write one of these in two lines instead of one i get fired

IndexError: list index out of range

This code broke in class on 9/23. I'm not sure why; it's not a big deal.

It's composed of inner and outer loop comprehensions, so copy the original and trim down to an inner comprehension. I aim to test about half of the code.

In [3]:
[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-34c47c929e3a> in <cell line: 1>()
----> 1 [line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]]

NameError: name 'line' is not defined

This looks like it might work but I don't have a line to test on. I'll make one quickly.

In [4]:
line = factbook.split('\n')[0] # split on "enter" and take one element - a single line (the first one, in this case)
In [5]:
[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]]
Out[5]:
['American Ethnic Studies', 0, 0, 0, 0]

Okay that works so the issue is in the outer comprehension. I'll take my processed line and use it in the outer comprehension.

In [6]:
proced = [line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]]
In [19]:
majors = [[major[0], major[4], int(major[2]/major[4] * 100)] for major in [proced] for line in factbook.split('\n') if major[4] != 0] # if i ever write one of these in two lines instead of one i get fired

Okay that seems to have worked. Let's try a few other lines. We'll make functions to simplify testing.

In [15]:
inner = lambda line : [line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]]
In [24]:
[inner(line) for line in factbook.split('\n')]
Out[24]:
[['American Ethnic Studies', 0, 0, 0, 0],
 ['Anthropology', 0, 2, 0, 2],
 ['Applied Mathematics', 0, 0, 0, 0],
 ['Archaeology', 2, 10, 0, 12],
 ['Art', 2, 21, 0, 23],
 ['Art History', 2, 5, 0, 7],
 ['Biology', 24, 35, 0, 59],
 ['Business Adminstration', 18, 13, 0, 31],
 ['Chemistry', 17, 25, 0, 42],
 ['Civic Communication and Media', 8, 20, 0, 28],
 ['Classical Studies', 1, 3, 0, 4],
 ['Computer Science', 29, 12, 0, 41],
 ['Data Science', 24, 17, 0, 41],
 ['Economics', 33, 16, 0, 49],
 ['English', 11, 37, 0, 48],
 ['Environmental Science', 13, 44, 0, 57],
 ['Exercise Science', 21, 28, 0, 49],
 ['Film Studies', 3, 4, 0, 7],
 ['Global Cultural Studies', 8, 17, 0, 25],
 ['History', 18, 15, 0, 33],
 ['Humanities', 1, 1, 0, 2],
 ['International Studies', 5, 9, 0, 14],
 ['Japanese', 0, 0, 0, 0],
 ['Japanese Studies', 9, 12, 0, 21],
 ['Mathematics', 14, 7, 0, 21],
 ['Music', 4, 9, 0, 13],
 ['Philosophy', 10, 9, 0, 19],
 ['Physics', 19, 4, 0, 23],
 ['Politics', 20, 40, 0, 60],
 ['Psychology', 16, 58, 0, 74],
 ['Public Health', 5, 28, 0, 33],
 ['Religious Studies', 1, 1, 0, 2],
 ['Sociology', 3, 22, 0, 25],
 ['Spanish', 6, 8, 0, 14],
 ['Special', 0, 1, 0, 1],
 ['Statistics', 0, 0, 0, 0],
 ['Sustianability and Society', 0, 0, 0, 0],
 ['Symbol Criticism/Rhet Analysis'],
 ['Theatre', 2, 15, 0, 17],
 ["Women's and Gender Studies", 0, 4, 0, 4]]
In [23]:
majors = [inner(line) for line in factbook.split('\n')]
In [25]:
outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]
In [29]:
outer(majors[0:2])
Out[29]:
[['Anthropology', 2, 100]]

Well it's working on a subset. Let's test on the whole thing.

In [30]:
outer(majors)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-30-dc1890a5b2b6> in <cell line: 1>()
----> 1 outer(majors)

<ipython-input-25-75ac97c08368> in <lambda>(majors)
----> 1 outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]

<ipython-input-25-75ac97c08368> in <listcomp>(.0)
----> 1 outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]

IndexError: list index out of range

Aha - the same error. We can figure out which line now though.

In [31]:
outer(majors[:len(majors)//2])
Out[31]:
[['Anthropology', 2, 100],
 ['Archaeology', 12, 83],
 ['Art', 23, 91],
 ['Art History', 7, 71],
 ['Biology', 59, 59],
 ['Business Adminstration', 31, 41],
 ['Chemistry', 42, 59],
 ['Civic Communication and Media', 28, 71],
 ['Classical Studies', 4, 75],
 ['Computer Science', 41, 29],
 ['Data Science', 41, 41],
 ['Economics', 49, 32],
 ['English', 48, 77],
 ['Environmental Science', 57, 77],
 ['Exercise Science', 49, 57],
 ['Film Studies', 7, 57],
 ['Global Cultural Studies', 25, 68],
 ['History', 33, 45]]
In [32]:
outer(majors[len(majors)//2:])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-32-04fd6932ddc5> in <cell line: 1>()
----> 1 outer(majors[len(majors)//2:])

<ipython-input-25-75ac97c08368> in <lambda>(majors)
----> 1 outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]

<ipython-input-25-75ac97c08368> in <listcomp>(.0)
----> 1 outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]

IndexError: list index out of range

It's in the second half.

In [34]:
majors[len(majors)//2:]
Out[34]:
[['Humanities', 1, 1, 0, 2],
 ['International Studies', 5, 9, 0, 14],
 ['Japanese', 0, 0, 0, 0],
 ['Japanese Studies', 9, 12, 0, 21],
 ['Mathematics', 14, 7, 0, 21],
 ['Music', 4, 9, 0, 13],
 ['Philosophy', 10, 9, 0, 19],
 ['Physics', 19, 4, 0, 23],
 ['Politics', 20, 40, 0, 60],
 ['Psychology', 16, 58, 0, 74],
 ['Public Health', 5, 28, 0, 33],
 ['Religious Studies', 1, 1, 0, 2],
 ['Sociology', 3, 22, 0, 25],
 ['Spanish', 6, 8, 0, 14],
 ['Special', 0, 1, 0, 1],
 ['Statistics', 0, 0, 0, 0],
 ['Sustianability and Society', 0, 0, 0, 0],
 ['Symbol Criticism/Rhet Analysis'],
 ['Theatre', 2, 15, 0, 17],
 ["Women's and Gender Studies", 0, 4, 0, 4]]

Now I see it. Do you? It's right there at the end.

In [35]:
majors[-3:]
Out[35]:
[['Symbol Criticism/Rhet Analysis'],
 ['Theatre', 2, 15, 0, 17],
 ["Women's and Gender Studies", 0, 4, 0, 4]]

One of the majors has no integers! I don't know. But it doesn't matter. We simply update "outer" to ignore anything of the incorrect length.

In [63]:
# was outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if major[4] != 0]
# we make sure that there are three things in line - not just one!
outer = lambda majors : [[major[0], major[4], int(major[2]/major[4] * 100)] for major in majors if len(major) == 5 and major[4] != 0]
In [66]:
outer(majors)
Out[66]:
[['Anthropology', 2, 100],
 ['Archaeology', 12, 83],
 ['Art', 23, 91],
 ['Art History', 7, 71],
 ['Biology', 59, 59],
 ['Business Adminstration', 31, 41],
 ['Chemistry', 42, 59],
 ['Civic Communication and Media', 28, 71],
 ['Classical Studies', 4, 75],
 ['Computer Science', 41, 29],
 ['Data Science', 41, 41],
 ['Economics', 49, 32],
 ['English', 48, 77],
 ['Environmental Science', 57, 77],
 ['Exercise Science', 49, 57],
 ['Film Studies', 7, 57],
 ['Global Cultural Studies', 25, 68],
 ['History', 33, 45],
 ['Humanities', 2, 50],
 ['International Studies', 14, 64],
 ['Japanese Studies', 21, 57],
 ['Mathematics', 21, 33],
 ['Music', 13, 69],
 ['Philosophy', 19, 47],
 ['Physics', 23, 17],
 ['Politics', 60, 66],
 ['Psychology', 74, 78],
 ['Public Health', 33, 84],
 ['Religious Studies', 2, 50],
 ['Sociology', 25, 88],
 ['Spanish', 14, 57],
 ['Special', 1, 100],
 ['Theatre', 17, 88],
 ["Women's and Gender Studies", 4, 100]]

Optionally recompose the one-line.

In [67]:
majors = [[major[0], major[4], int(major[2]/major[4] * 100)] for major in [[line.split('\t')[1].strip()] +[int(val.strip()) if val.strip() else 0 for val in line.split('\t')[11:15]] for line in factbook.split('\n')] if len(major) == 5 and major[4] != 0] # if i ever write one of these in two lines instead of one i get fired
In [68]:
majors
Out[68]:
[['Anthropology', 2, 100],
 ['Archaeology', 12, 83],
 ['Art', 23, 91],
 ['Art History', 7, 71],
 ['Biology', 59, 59],
 ['Business Adminstration', 31, 41],
 ['Chemistry', 42, 59],
 ['Civic Communication and Media', 28, 71],
 ['Classical Studies', 4, 75],
 ['Computer Science', 41, 29],
 ['Data Science', 41, 41],
 ['Economics', 49, 32],
 ['English', 48, 77],
 ['Environmental Science', 57, 77],
 ['Exercise Science', 49, 57],
 ['Film Studies', 7, 57],
 ['Global Cultural Studies', 25, 68],
 ['History', 33, 45],
 ['Humanities', 2, 50],
 ['International Studies', 14, 64],
 ['Japanese Studies', 21, 57],
 ['Mathematics', 21, 33],
 ['Music', 13, 69],
 ['Philosophy', 19, 47],
 ['Physics', 23, 17],
 ['Politics', 60, 66],
 ['Psychology', 74, 78],
 ['Public Health', 33, 84],
 ['Religious Studies', 2, 50],
 ['Sociology', 25, 88],
 ['Spanish', 14, 57],
 ['Special', 1, 100],
 ['Theatre', 17, 88],
 ["Women's and Gender Studies", 4, 100]]
In [69]:
import pandas as pd
pd.DataFrame(majors)
Out[69]:
0 1 2
0 Anthropology 2 100
1 Archaeology 12 83
2 Art 23 91
3 Art History 7 71
4 Biology 59 59
5 Business Adminstration 31 41
6 Chemistry 42 59
7 Civic Communication and Media 28 71
8 Classical Studies 4 75
9 Computer Science 41 29
10 Data Science 41 41
11 Economics 49 32
12 English 48 77
13 Environmental Science 57 77
14 Exercise Science 49 57
15 Film Studies 7 57
16 Global Cultural Studies 25 68
17 History 33 45
18 Humanities 2 50
19 International Studies 14 64
20 Japanese Studies 21 57
21 Mathematics 21 33
22 Music 13 69
23 Philosophy 19 47
24 Physics 23 17
25 Politics 60 66
26 Psychology 74 78
27 Public Health 33 84
28 Religious Studies 2 50
29 Sociology 25 88
30 Spanish 14 57
31 Special 1 100
32 Theatre 17 88
33 Women's and Gender Studies 4 100
In [78]:
import plotly.express as px
px.bar(pd.DataFrame(majors),x=0,y=1,color=2)