Graphs

Week 0xC I

Prof. Calvin

Crypto

Announcements

  • Welcome to variously CS 276/CS 540
    • Graphs, to introduce
      • DAGs
      • Trees, to introduce,
      • Hash or Merkle trees
  • Action Items:
    • heap_t after this, but really after “Merkle”

Today

  • Graphs
    • DAGs
    • Trees

Trees

\(n\)-ary Trees

  • A tree is a type of graph.
  • A graph is a pair of
    • A set of “nodes” or “vertices” (or “vertexes”)
    • A set of “edges” which are pairs of nodes
      • They may be ordered pairs or not.

Graphs

  • There are many ways to understand graphs, but I actually think graph theory is quite accessible.
  • I think it’s also easier when using a running example.
  • I will also use a running example, Amtrak 🚄
    • Or perhaps… a training example?
    • I am training… to run?????
    • ???????????

Amtrak 🚄

Graph: Definition

  • A graph is an ordered pair.
    • That is: two things, in a fixed order.

    • An ordered pair can be thought of as a sequence of length 2

      a = "one thing"
      b = "another thing"
      ordered_pair = [a,b]

Not Sets

  • A graph is not a set:
>>> [a,b][0]
'one thing'
>>> {a,b}[0]
<stdin>:1: SyntaxWarning: 'set' object is not subscriptable; perhaps you missed a comma?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

Aside: Pairs

  • Take a number plane (click):

y=2

x=-2 x=2

y=2

Aside: Pairs

  • Denote ordered pair (.5,1.5) in red.

y=2

x=-2 x=2

y=2

Pair: Definition

  • We construct pairs from sets.
    • An ordered pair is a set of cardinality two. (2 elements)
    • One element of the set is a set of cardinality one. (1 element)
    • The other element of the set is a set of cardinality two (2 elements)
    • The element of the set of cardinality one is one of the two elements of the set of cardinality two.

Pair: Definition

  • The element of both sets is regarded as the first element.
  • This doesn’t work in Python since Python sets can’t contain other sets:
>>> [a,b] # ordered pair
['one thing', 'another thing']
>>> {{a},{a,b}} # ordered pair with sets
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>
  • But this is an implementation hurdle, and not a logical one.

Pair: Definition

Ordered Pair

An ordered pair is a set of two elements, a set containing the head element of the pair, and a set containing both elements of the pair.

(a, b) := p : { {a} , {a, b} }

“Tail”

  • The tail is the sole element of the union that is not an element of the intersection, with the caveat that we do not consider the case in which the head and tail differ
    • I denote this ‘Tail(p)’

\[ \begin{align*} \text{Tail}(p) &:= x : x \in \bigcup p \setminus \bigcap p \\ \text{Tail}(p) &:= x : \exists Y_1, Y_2 \in p : x \in Y_1 \land x \notin Y_2 \end{align*} \]

Graph: Definition

  • A graph is an ordered pair \(G\)
  • We denote the two elements of the pair as:
    • \(V\) for vertices
    • \(E\) for edges \[ G = (V, E) := \{ \{V\}, \{V,E\}\} \]

In Amtrak

  • \(V\), the vertices or nodes, stations or stops or cities,
    • like Portland and Seattle, or
    • like Kings St Station and Central Station
  • \(E\), the edges, are the connections to adjacent stations
    • like Portland and Oregon City, or Portland and Vancouver, WA.

Graph: Definition

  • An Amtrak is an ordered pair
  • We denote the two elements of the pair as:
    • Stations for train stations used by Amtrak passengers, and
    • Trains which Amtrak passengers ride between stations.

Amtrak = (Stations, Trains)

Nodes, Vertices, or \(V\)

  • Let us consider the “Amtrak Cascades”
  • The set of vertices is the set of stations:
ALBANY EVERETT PORTLAND TUKWILA EUGENE
BELLINGHAM KELSO/LONGVIEW SALEM VANCOUVER BC OREGON CITY
CENTRALIA MOUNT VERNON SEATTLE VANCOUVER WA TACOMA
EDMONDS OLYMPIA/LACEY STANWOOD

Nodes, Vertices, or \(V\)

  • Restrict our example to the PDX<->SEA 6x daily trains:
CENTRALIA OLYMPIA/LACEY SEATTLE TUKWILA
KELSO/LONGVIEW PORTLAND TACOMA VANCOUVER WA

Edges or \(E\)

  • \(E\) is a set of elements termed ‘edges’, ‘links’, or ‘lines’
    • The edges are pairs of vertices
    • In an undirected graph, edges are unordered pairs (sets of cardinality two)
    • In an directed graph, edges are ordered pairs (not subsets then)
    • Amtrak is undirected, trains are directed.

Directionality

  • The internet is directed.
    • We may use ‘curl’ to download files from a url, but going the other way (creating files at a url) is highly nontrivial.
  • Pointers are directed.
    • Recall, *p refers to the value at location p.
    • **p does not refer to to p, rather, we would use & to go in the reverse direction.

Route

  • Order north-to-south:

    SEATTLE
    TUKWILA
    TACOMA
    OLYMPIA/LACEY
    CENTRALIA
    KELSO/LONGVIEW
    VANCOUVER WA
    PORTLAND
  • Centralia and Keslo/Longview is an edge.

  • The others, like Seattle and Tacoma, are not edges.

Edges or \(E\)

  • Our edges are pairs of adjacent stations
  • There are 8 stations so there are 7 edges
(SEATTLE, TUKWILA) (CENTRALIA, KELSO)
(TUKWILA, TACOMA) (KELSO, VANCOUVER)
(TACOMA, OLYMPIA) (VANCOUVER, PORTLAND)
(OLYMPIA, CENTRALIA)

Notation

  • I use tuple notion here, with parens, but it would be equally proper to use set notatation for an undirected graph.
  • Also I maintained geographic ordering (as a convenience) but it would be more proper to have no particular order since E is a set.

Graph: Definition

  • We can express G using only sets over elements of stations (this is basically JSON):

    G = {
      V = {
        SEATTLE,
        TUKWILA,
        TACOMA,
        OLYMPIA/LACEY,
        CENTRALIA,
        KELSO/LONGVIEW,
        VANCOUVER WA
        PORTLAND,
      }, 
      {
        V,
        E = {
                  {
                      TUKWILA,
                      TACOMA
                  },
                  {
                      TACOMA,
                      OLYMPIA/LACEY
                  },
                  {
                      OLYMPIA/LACEY,
                      CENTRALIA
                  },
                  {
                      CENTRALIA,
                      KELSO/LONGVIEW
                  },
                  {
                      KELSO/LONGVIEW,
                      VANCOUVER WA
                  },
                  {
                      VANCOUVER WA,
                      PORTLAND
                  }
              }
        }
    }

Graph: Exercise

  • Amtrak recently added lines, including the Hiawatha, with service from Milwaukee to the world’s greatest city, Chicago.

    From the grandeur of Grant Park’s Buckingham Fountain to iconic museums and skyscrapers, see for yourself why Chicago was once dubbed “Paris on the Prairie.” Engage in retail therapy on the Magnificent Mile or root for the home team within the friendly confines of famed Wrigley Field.

Graph: Exercise

  • As an exercise, construct the graph of the Hiawatha route using the JSON-ish notation I used for Cascades.
  • The route is described here and contains five stations.
  • You may use the three letter codes like “MKE” and “CHI” as a notational convenience if you would like.
  • As a bonus: Write valid json. Here’s a tester.

Today

  • ✓ Graphs
    • DAGs
    • Trees

DAGs

  • A DAG is a “directed acyclic graph”.
  • We will define rigorously.

Tred-G

Cycles

  • E is a set of pairs of elements of V.
  • We can think of edges in a graph as a homogenous binary relation.
    • Homogenous: From a set (the set of vertices) to itself.
    • Binary: Over two elements (just like edges)
    • Relation: A set (of pairs)

Closures

  • “Transitive closures” are defined over homogenous binary relations, so we can take a “transitive closure” over edges.

Cartesian product

  • A relation \(R\) over a set \(S\) is a set of pairs of elements of \(S\)
    • Or: a subset of the set of all pairs of elements

Use \(\times\)

  • We denote all pairs using a “Cartesian product”:

  • \(S \times S\) in LaTeX

S \times S
  • S × S in HTML
<em>S</em> &times; <em>S</em>

Colors

  • For example, over the set of traffic light colors \(C\): \[ C = \{ G, Y, R \} \]
  • The Cartesian product is all pairs:

\[ C \times C = \{ (G, Y), (G, R), (Y, R), (Y, G),(R, G), (R, Y) \} \]

  • Order matters.

Relation

  • A relation R over a set S is a set of pairs of elements of S
    • Or: a subset of the set of all pairs of elements in is transitive
  • We begin with the Cartesian product:

\[ C \times C = \{ (G, Y), (G, R), (Y, R), (Y, G),(R, G), (R, Y) \} \]

  • Define traffic light \(L \subset C \times C\)

Properties

  • Construct \(L\), the traffic light relation
    • No red without prior yellow \[ (Y,X) \in L \implies X = R \]
    • All colors must appear in both positions \[ \forall X \in C, \exists Y, Z \in C : \{ (X,Y), (Z,X) \} \subset L \]

This gives

  • The relation then is \[ L = \{ (G, Y), (Y, R), (R, G)\} \]
  • We not this is a subset of \(C \times C\)

Notation

  • Variety of ways to express this.
    • Function application
      • L(G) = Y
    • Infix notation
      • gLy

Matrix

  • Starting state on top, ending state on side
\(G\) \(Y\) \(R\)
\(G\) x
\(Y\) x
\(R\) x
  • Astute blah blah its 9 bits

Rainbow

  • Vs. traffic light, we can take “wavelength” where colors are sorted by something physics something optics help I don’t know this stuff.

Double-alaskan-rainbow

Rainbow

  • We note red is higher than yellow which is higher than green.
    • Green is higher than nothing or \(\varnothing\)
\(G\) \(Y\) \(R\)
\(G\) x
\(Y\) x
\(R\)

Rainbow

  • We note that Rainbow \(B\) (\(R\) is taken) is also a binary relation.

  • We will explore its properties. \[ B \subset C \times C = \{ (R,Y), (Y,G) \} \]

  • We additionally note that while \(R\) is immediately above \(Y\), it is also above \(G\)

Transitivitity

  • Transitivity is over relation \(R\) as follows.

\[ ((a, b) \in R \land (b, c) \in R) \implies (a, c) \in R \]

  • For us, that would be, \[ ((R, Y) \in B \land (Y,G) \in B) \implies (R, G) \in B \]
  • In the rainbow case, this is true, \(R\) appears above \(G\) in the rainbow because \(R\) is above \(Y\) and \(Y\) is above \(G\).

Closure

  • Our initial relation was not transitive.
\(G\) \(Y\) \(R\)
\(G\) x
\(Y\) x
\(R\)
  • As in the Amtrak case, we considered only adjacent elements as a convenience, but that doesn’t tell us if e.g. we can get from Seattle to Eugene.

Closure

  • To do that, we need to to preserve transitivity - and we do so using the notion of transitive closure
  • A closure is an operation over sets, and in the case of binary relations, which are sets of pairs, is an operation over the relation itself, or the sets of pairs themselves (which are equivalent).

Closure

  • We denote the transitive closure of some relation \(R\) as \(R^+\)
  • First, the transitive closure R⁺ of some relation R must contain all elements of R.

\[ R^+ \supseteq R \]

  • We recall we can use set operations like superset on \(R\) as it is a set of pairs, in this case ordered.

R⁺R

Superset review

  • That means that for all pairs (of colors, for example) in the binary relation \(R\), each one of them is also in \(R^+\)

\[ \forall p \in R, p \in R^+ \]

\[ \forall x,y \in S, (x,y) \in R \subset S \times S \implies (x,y) \in R^+ \]

Closure

  • The transitive closure \(R^+\) of some relation \(R\) must be transitive.
  • Recall:

\[ ((a, b) \in R \land (b, c) \in R) \implies (a, c) \in R \]

Discussion

  • Note: since this is a “for all”, it means as we are required to add new elements to maintain transitivity, transitivity must apply to those new elements.
  • So, if we add SeattleAmtrakTacoma because we have SeattleAmtrakTukwila and TukwilaAmtrakTacoma
  • We will then have to add SeattleAmtrakOlympia because Amtrak also contains TukwilaAmtrakOlympia

Color Example

  • If in rainbow relation \(B\) we find \(R\) is higher in the rainbow than \(Y\) and \(Y\) is higher in the rainbow than \(G\) then \[ \{ (R,Y), (Y,G) \} \subseteq B \]
  • The transitive closure adds at least one additional element. \[ \{ (R,Y), (Y,G), (R,G) \} \subseteq B^+ \]

Closure

  • R⁺ must be the smallest possible relation for that is a transitive superset of R.
    • Cartesian Product is always transitive and a superset, but not always the smallest satisfying set.

A note on graphs

  • By the way those binary relations are graph edges between graph nodes (the elements of the original set).
  • And by the way those edges are pointers.
  • Okay keep going.

Cycles

  • We use transitive closure to define cycles
    • We consider a graph \(G\)

\[ G = (V, E) := \{ \{V\}, \{V,E\}\} \]

  • We note that \(E\) is a homogenous binary relation over \(V\)
  • Take \(E^+\) the transitive closure of \(E\)

Cycles

  • \(G\) contains a cycle (has the property of being cyclic) if \(E^+\) contains an edge from a node to itself.

\(\text{Cyclic}(G = (V, E)) := \exists v \in V: \{v, v\} \in E^+\)

Note

  • Cycles are defined to be non-trivial, which means they don’t contain “loops”, so an edge from a node to itself.
    • We don’t think about this with Amtrak because we don’t take a train from Portland to Portland

Other Terms

  • Circuits, walks, paths, and trails are also defined in graph theory and are related.
  • Direct paths and undirected paths are, of course, distinct but intuitively so.
    • Cascades 503, a train which runs Seattle to Portland, has a Seattle to Tukwila edge but not a Tukwila to Seattle edge.
    • The Cascades route, which runs service between Vancouver, BC and Eugene, has both edges.

DAGs

  • A graph \(G = (V,E)\) is a directed, acyclic graph if:
    • It’s edges are directed, that is, \(E\) is a set of ordered pairs, and
    • The transitive closure of it’s edges \(E^+\) contains no identity pairs.

Tred-G

DAGs

  • Cascades 503 is a directed acyclic graph.
    • It has edges from all relative northern stations to all relative southern stations for all stations between Seattle and Portland inclusive.
  • Amtrak Cascades is not a DAG
    • It is not directed - SLM->PDX or PDX->SLM
    • It contains cycles - PDX->SLM->PDX

DAGs

  • Color wavelength is a directed acyclic graph.

finite_automata R R Y Y R->Y G G R->G Y->G

DAGs

  • A traffic light is not a directed graph but not an acyclic graph.

finite_automata Y Y R R Y->R G G R->G G->Y

DAGs

  • Friendship, via mutual enthusiastic consent, is a neither directed nor, necessarily, acyclic.
  • Facebook is not a DAG.

G n Thelma m Louise n--m

CS Major DAG

Tred-G

Today

  • ✓ Graphs
    • ✓ DAGs
    • Trees

Trees

  • A tree is DAG where every node has at most one incoming edge.
    • \(\exists\) undirected trees but we do not consider them.
  • A graph \(G = (V, E)\) is a tree if \[ \forall v \in V : |\{(v',v) \in E\}| \leq 1 \]
  • For all vertexes, the cardinality of the set of incoming edges is less than or equal to one.

Decision Trees

Manual decision tree

Decision Trees

DecisionCalcs

River Systems

Columbiarivermap

Data Structure

A rooted, non-binary, tree

Why trees?

  • If trying to store data, really only need one unique path to the data.
  • If trying to assure integrity of data with hashing, really only need to hash once.

Sample Tree

  • By convention, given a directed edge \((v,v')\), vertex \(v\) is denoted the parent and vertex \(v'\) is denoted the child.
struct tree_struct {
    void *data;
    size_t num_children;
    struct tree_struct *children;
}

Binary Tree

  • In computing, we often use binary trees, which are equivalently expressive and have other nice properties.
  • Commonly, the two children are referred to as “right” and “left”
struct binary_tree_struct {
    void *data;
    struct binary_tree_struct *left;
    struct binary_tree_struct *rite;
}

By The Way

  • It would have been fairly straightforward to implement list_t using trees.
  • Just why bother…
    • Minimally, can use just the right or left side and you have a linked list.

BST

  • In 100-level CS classes, students often implement a binary search tree.
  • These trees store lower values on lesser/left side
  • It basically implements quicksort in structures.

Merkle

  • In this class, we will do instead a Merkle tree or hash tree.
  • Merkle trees are unsorted but they are verified, which is what we want to do with our transactions.
  • In a block, we then store the root, the ultimate ancestor of all nodes, and store the transactions elsewhere.
  • This verifies the entire transaction tree.

Merkle

Graphical representation of the Merkle Tree. Illustration by David Göthberg

Tree Terms

  • Connected
  • Root
  • Leaf
  • Parent
  • Child
  • Height/depth
  • Size

Connected

  • Trees are connected
    • That is, if we take the transitive closure of edges
    • There is an edge from one node to every other node.
    • The transitive closure of the edges is the set of paths.
  • Connectedness is usually defined over undirected graphs.

Root

  • The root is the node which has a path to every other edge.
  • It is the “ancestor” of all nodes.
  • Often in computing, we treat a descriptor or pointer to the root as a descriptor or pointer to the graph.
  • A Bitcoin block contains a “Merkle root” of a transaction tree.

Leaf

  • Not every node has outgoing edges.
  • We refer to nodes without such edges as leafs (leaves?)

Diagram

  • 2 is the root
  • {2, 10, 5, 11, 4} are lea[fs/ves]

A rooted, non-binary, tree

Parent/Child

finite_automata Parent Parent Child Child Parent->Child

Height/Depth

  • The height or depth of a tree is the maximum number of edges from some root to a leaf.
  • For this tree it is 3
len(((5,3),(3,2),(2,1)))

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 1 1 2->1 6 6 7->6 8 8 7->8

Height

  • The height of a node is what the height of the subtree of which it is a root would be.
  • The node labelled 3 is of height 2.

finite_automata 3 3 2 2 3->2 4 4 3->4 1 1 2->1 5 5 5->3 7 7 5->7 6 6 7->6 8 8 7->8

Depth

  • The depth of a node is what the height of the subtree of which it is the lowest leaf would be.
  • The node labelled 4 is of depth 2.

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 1 1 2->1 6 6 7->6 8 8 7->8

Size

  • The size is generally taken to be the number of nodes.
    • It is separately, useful, sometimes, to count only lea[fs/ves].

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8 1 1 2->1

Today

  • ✓ Graphs
    • ✓ DAGs
    • ✓ Trees