Merkle

Week 0xC II

Prof. Calvin

Crypto

Announcements

  • Welcome to variously CS 276/CS 540
    • Trees, ish
      • Heaps
      • Merkle Trees
  • Action Items:
    • Time for heap_t’s of fun

Today

  • Trees, ish
    • Heaps
    • Merkle Trees
  • Aside: Tx’s

Trees

\(n\)-ary Trees

  • A tree is a type of graph.
  • A graph is a pair of
    • A set of “nodes” or “vertices” (or “vertexes”)
    • A set of “edges” which are pairs of nodes
      • They may be ordered pairs or not.

Blockchain

  • Blockchain is a 1-ary tree
    • Usually called a link(ed) list in computing
    • Sometimes called a sequence, depending on which mathematical object.

finite_automata 0 gen esi sbl ock 1 prev Tx's time once 0->1 2 prev Tx's time once 1->2 ... ... 2->...

list_t

  • A list_t is a 1-ary tree

finite_automata 0 data next 1 data next 0->1 2 data null 1->2

mocha 2-tree

finite_automata 0 hot? 11 small? 0->11 y 12 small? 0->12 n 21 togo? 11->21 y 22 togo? 11->22 n 23 togo? 12->23 y 24 togo? 12->24 n

Common Case

  • Usually in computing, I use trees to store “stuff”
    • In a SQL database, I used a b-tree to store… data, I guess
    • In a binary search tree, I sort elements of sets with total orders, like integers or words
    • In OS page table, I store memory locations.

Our Case

  • We will store digests in nodes and transactions in leavefes.
    • So, 256 bits
  • Leaves a transaction “record”

Transaction review

  • Recall:
    • A bitcoin transaction and a bitcoin coin are identical
    • A coin is a chain of ownership
    • Own a coin by being the most recent owner.

Transaction review

  • Recall:
    • A transaction contains:
      • A sender
      • A receiver
      • An amount.
    • A sender/receiver is not a person par se but a public key

Transaction

  • Something like this:
tx.h
#define KEY_SIZE 4096 / 8 * sizeof(an_int)

typedef uint64_t an_int;

struct tx_struct {
    an_int send_n[KEY_SIZE]; /* 4096 bit sender modulus */
    an_int recv_n[KEY_SIZE]; /* 4096 bit recver modulus */
    uint64_t amount;         /* $\exists$ ~2.1 quadrillion atomics, ~2^50 */
}

In Practice

  • 64 + 64 + 1 == 129
  • Can’t RSA things bigger than \(n\), so must RSA in chunks
  • 129 / 3 == 129 // 3 == 43
  • Each 43 word block would be encrypted into a 64 word block.
tx.h
struct signed_tx_struct {
    an_int send_n[KEY_SIZE];    /* needed to decrypt */
    an_int chunks[3][KEY_SIZE]; /* 3 chunks of up to size 64 */
    uint32_t digest[8];           /* sha256 */
}

So basically

  • Make a tx_struct
  • Cast it to a uint64_t array
    • RSA encrypt with private key each third into chunks
    • SHA hash into digest
  • Load thirds into signed_tx_struct
  • Prefix with public key.

How do ensure?

  • We want to have confidence in these transactions.
  • Easy - hash them, and save the hashes.
  • But wait - that’s a lot of hashes to hold onto…
  • And it is unreasonable to store potentially thousands of hashes in each block!

Hash tree

finite_automata 0 Merkle Root 11 SHA digest 0->11 12 SHA digest 0->12 21 SHA digest 11->21 22 SHA digest 11->22 23 SHA digest 12->23 24 SHA digest 12->24 31 Tx0 21->31 32 Tx1 22->32 33 Tx2 23->33 34 Tx3 24->34

In practice

  1. Place on transaction in linear data structure, like a stack or array list
  2. Hash each transaction, store the results (!!! non-trivial)
  3. For each pair of hashes, compute the hash of the concatenation of the two hashes and store the results
  4. Repeat until there is a single hash, the “Merkle root”

Tree Terms

  • Complete

  • Definition: A tree in which every level, except possibly the deepest, is entirely filled. At depth n, the height of the tree, all nodes are as far left as possible.

Tree Terms

  • Perfect

  • Definition: A k-ary tree with all leaf nodes at same depth. All internal nodes have degree k.

  • Note: Authors differ in their definitions. This is called “complete” by [CLR90, page 95].

  • I take CLR as ground truth (more often 2nd+ edition as “CLRS”)

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8 1 1 2->1

Incomplete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 6 6 7->6 8 8 7->8

Takeaway

  • In a complete tree, we can always place all leafves in a linear data structure, like an array.
  • In a complete binary (2-ary) tree, there will be (roughly) the same number of non-leaf and leaf nodes

Naive

  • We may be tempted to do this
struct hash_tree_struct {
    uint32_t digest[8];
    struct hash_tree_struct *left;
    struct hash_tree_struct *rite;
}
  • This could in fact work, mostly, but leads to some problems.
    • How do you tell if a pointer points to a leaf or not?

Today

  • ✓ Trees, ish
    • Heaps
    • Merkle Trees
  • ✓ Aside: Tx’s

Heap

Heaps

  • To motivate our technique for Merkle trees, we are going to make a digression to the heap abstract data type, also called a priority queue.
  • Vs. other graph representations, heaps can be implemented by store nodes only and deriving edges from a e.g. node’s index in an array.

Why “Priority”

  • A heap is a nonlinear (but linearizable) data structure which sorts* data… kinda.
  • A heap always has some minimal or maximal value, within the the collection, in its initial position, or root.

Make one

Empty Heap

  • Heaps are usually based on binary trees so we’ll make an example.
  • Start with an empty heap it looks like this:
  • There’s nothing here

Initial Value

  • Insert 128 as an initial value
  • 128 is the maximium, and also the root/initial value

finite_automata 128 128

Insert 64

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the first “child” of 128

finite_automata empty 128 128 128->empty

Insert 64

  • Place 64 here
  • 64 is \(\leq\) 128 so this is “A-OK”
    • “A-OK” is short for “Ada, Oklahoma” in honor of Ada Lovelace, the first computer scientist
    • (This isn’t true)

finite_automata 128 128 064 064 128->064

Insert 192

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the second “child” of 128

finite_automata 128 128 064 064 128->064 empty 128->empty

Insert 192

  • Place 192 here.
  • Big “uh oh”

finite_automata 128 128 064 064 128->064 192 192 128->192

Heapify

  • We do an operation often called “heapify”
  • In a loop
    • Swap new value with its parent
    • Terminate when (1) parent is greater or (2) no parent.
  • So swap 192 and 128

finite_automata 192 192 064 064 192->064 128 128 192->128

Insert 96

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the first “child” of 064

finite_automata 192 192 064 064 192->064 128 128 192->128 empty 064->empty

Insert 96

  • Place 96 here.

finite_automata 192 192 064 064 192->064 128 128 192->128 096 096 064->096

Heapify

  • 96 \(\gt\) 64 so swap
  • 96 \(\leq\) 192 so terminate

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064

Aside: Structs

  • It is possible to implement a heap like so:
struct heap_struct {
    uint8_t data;
    struct heap_struct *left;
    struct heap_struct *rite;
}
  • In this case, storing these 8 bit values would require structs of size 8 + 64 + 64 = 136 bits.
  • An overhead of of (136-8)/8 = 16x (or worse as 8 bits will take a 64 bit “slot”)

Aside: Arrays

  • Instead, realize we always Insert new elements to the… next available position.
  • This is a stack push, usually implement on top of a list or array.
typedef uint8_t *heap;
  • Overhead of zero.

Use arrays

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064

  • In practice

finite_automata heap 192 096 128 064

Insert 160

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064 empty 096->empty

  • In practice

finite_automata heap 192 096 128 064

Insert 160

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064 160 160 096->160

  • In practice

finite_automata heap 192 096 128 064 160

Heapify

  • In theory

finite_automata 160 160 064 064 160->064 096 096 160->096 192 192 192->160 128 128 192->128

  • In practice

finite_automata heap 192 160 128 064 096 heap:old->heap:new

On Indices

  • Consider what happens here with respect to indices
  • 160 is Inserted as the element at zero-index location 4

finite_automata heap 0 192 1 096 2 128 3 064 4 160

On Indices

  • After swap, 160 is at swaps at zero-index location 1
    • No particularly clear pattern here, to me

finite_automata heap 0 192 1 160 2 128 3 064 4 096

1-Index

  • 3 also would’ve swapped to 1
    • Floor/int div by 2
    • Wait, 4 -> 1 is an off-by-one int div
  • Index by one (1), not zero (0).

finite_automata heap 1 192 2 160 3 128 4 064 5 096

Insert 224

  1. Next index is 6.

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6

Insert 224

  1. Next index is 6
  2. Insert 224 there

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6 224

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6 224 heap:six->heap:three

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:six->heap:three

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:three->heap:one

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:three->heap:one

Heapified

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2
  6. Swap

finite_automata heap 1 224 2 160 3 192 4 064 5 096 6 128

Heapified

finite_automata 1 1 224 2 2 160 1->2 3 3 192 1->3 4 4 064 2->4 5 5 096 2->5 6 6 128 3->6

finite_automata heap 1 224 2 160 3 192 4 064 5 096 6 128

Today

  • ✓ Trees, ish
    • ✓ Heaps
    • Merkle Trees
  • ✓ Aside: Tx’s

Merkle

  • To create a Merkle tree, all non-leaf nodes are hashes and all leaf nodes are what the data type is.
  • Store the data type in array of length \(n\)
  • The tree requires \(n-1\) non-leaf nodes

Example

  • Suppose these are your transactions
    • Stored in an array (or a list_t!)

finite_automata heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4

Example

  • In a complete binary tree, the last row is a power of 2 in size.
    • We have 6 entries, so we will need part of a row of size 8 and part of a row of size 4

finite_automata heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4

Example

  • Fill out the tree to get up to the 4-row.

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7

finite_automata heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4

Example

  • Add additional leaves until there’s enough…

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9

finite_automata heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4

Example

  • More…

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9 A 5->A B 5->B

finite_automata heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4

Example

  1. Relate transactions to leaves.

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9 A 5->A B 5->B heap A B 1 B C 2 C D 3 F A 7 G C 2 A G 4 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Hash Tx’s

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 h(G,C,2) 3->6 7 h(A,G,4) 3->7 8 h(A,B,1) 4->8 9 h(B,C,2) 4->9 A h(C,D,3) 5->A B h(F,A,7) 5->B heap A B 1 B C 2 C D 3 F A 7 G C 2 A G 4 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Hash Hashes

finite_automata 1 0x1 h(*0x2,*0x3) 2 0x2 h(*0x4,*0x5) 1->2 3 0x3 h(*0x6,*0x7) 1->3 4 0x4 h(*0xA,*0xB) 2->4 5 0x5 h(*0x8,*0x9) 2->5 6 0x6 h(G,C,2) 3->6 7 0x7 h(A,G,4) 3->7 8 0x8 h(A,B,1) 4->8 9 0x9 h(B,C,2) 4->9 A 0xA h(C,D,3) 5->A B 0xB h(F,A,7) 5->B heap A B 1 B C 2 C D 3 F A 7 G C 2 A G 4 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

block_chain 0 h(prev) h(root) time nonce 1 0x1 h(*0x2,*0x3) 0:here->1 2 0x2 h(*0x4,*0x5) 1->2 3 0x3 h(*0x6,*0x7) 1->3 4 0x4 h(*0xA,*0xB) 2->4 5 0x5 h(*0x8,*0x9) 2->5 6 0x6 h(G,C,2) 3->6 7 0x7 h(A,G,4) 3->7 8 0x8 h(A,B,1) 4->8 9 0x9 h(B,C,2) 4->9 A 0xA h(C,D,3) 5->A B 0xB h(F,A,7) 5->B heap A B 1 B C 2 C D 3 F A 7 G C 2 A G 4 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Today

  • ✓ Trees, ish
    • ✓ Heaps
    • ✓ Merkle Trees
  • ✓ Aside: Tx’s

block_chain 1 h(prev) h(root) time nonce 0 h(prev) h(root) time nonce 1->0:there 2 h(prev) h(root) time nonce 0->2:there merkle 0x1 h(*0x2,*0x3) 0x2 h(*0x4,*0x5) 0x3 h(*0x6,*0x7) 0x4 h(*0xA,*0xB) 0x5 h(*0x8,*0x9) 0x6 h(G,C,2) 0x7 h(A,G,4) 0x8 h(A,B,1) 0x9 h(B,C,2) 0xA h(C,D,3) 0xB h(F,A,7) 0->merkle:1 heap send recv amnt A B 1 B C 2 C D 3 F A 7 G C 2 A G 4 merkle:7->heap:f merkle:6->heap:e merkle:B->heap:d merkle:A->heap:c merkle:9->heap:b merkle:8->heap:a

tx.h
#define KEY_SIZE 4096 / 8 * sizeof(an_int)

typedef uint64_t an_int;
typedef struct hash_struct{uint32_t digest[8];} hash_t;

struct tx_struct {
    an_int send_n[KEY_SIZE]; /* 4096 bit sender modulus */
    an_int recv_n[KEY_SIZE]; /* 4096 bit recver modulus */
    uint64_t amount;     /* $\exists$ ~2.1 quadrillion atomics, ~2^50 */
}

typedef struct signed_tx_struct {
    an_int send_n[KEY_SIZE];    /* needed to decrypt */
    an_int chunks[3][KEY_SIZE]; /* 3 chunks of up to size 64 */
    hash_t digest;              /* sha256 */
} signed_tx_truct signtx;

struct hash_tree_struct {
    size_t  num_tx;
    hash_t *merkle; /* malloc to 2 * numx_tx * sizeof(hash_t) */
    signtx *txtree; /* malloc to     numx_tx * sizeof(signtx) */
}