malloc

Week 0x9 II

Prof. Calvin

Crypto

Announcements

  • Welcome to variously CS 276/CS 540
    • Passes of storing structured data
    • Antipatterns no more
  • Action Items:
    • BigRSA due this week.
    • list_t possible after this

Data Clump

  • C contains a language-mandatory data clump anti-pattern
  • This is a data clump - two values that only make sense together.
int main(int argc, char **argv)

Today

  • malloc
    • Dynamically sized C
  • free
    • Unmalloc

Review

  • Python has no array (NumPy does)

    • Historically lists \(\neq\) arrays
    • Python lists are closer to being array-lists - an implementation of a list abstract data type using an array data structure

Arrays are:

  • Fixed length (replace only, no add/remove)
int arr[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } ;
arr = { 1, 2, 3 } ; // compile error 
arr[15] = -1 ; // runtime error - "stack smashing"`
  • Typed, mostly (still just bits, but of certain size)
char *str = "hihi" ; 
arr[3] = str ; // compile warning - integer from pointer w/o cast

Arrays are:

  • Just “bits” storing a memory address - no known type or size.
arr[sizeof(arr) -1] = -1 ; // *can* be a segmentation fault
  • Arrays exist at fixed location in physical memory.
int *old = arr, *new ; 
new = arr ; 
old == new ; // this is "true"

Review

  • We use char * in a function argument to account for arrays of any size.
    • What does this mean *physically* within the physical computing device?
  • We use “char arr[\(n\)]” in the main function so we know we have enough space.
    • What does this mean *physically* within the physical computing device?

Takeaways

Pointers Arrays
Fixed size, like 8 Any specified size
Change with = = triggers error
Names some bits Provides/names bits
Can describe any array One specific array

malloc

Malloc

#include <stdlib.h>
void *malloc(size_t size);
  • void * is new, that is how we refer to something but we don’t know to what.
    • Could be a string, could be a vector of strings (argv) could the message schedule array in SHA.

Malloc

  • returns a void *
    • This gives the location of some bits
    • Those bits can be used however
    • The argument size is the number of bytes
    char *ptr = (char *)malloc(sizeof(char) * 6) ;
  • Once we have the void *, we can use a cast to change it to some other star.
    • Voilà, something a lot like an array, but of software defined size
      • That is, can perform a calculation how much space you need, then dget it.
      • “Here is some memory” vs “This memory contains characters”.

Malloc

  • Treat this memory region the same way we treat a character array.
    • Handwave the null terminator for now.
    char *str = "hello"; 
    char buf[6]; 
    char *ptr = (char *)malloc(sizeof(char) * 6) ;
    size_t i; 
    for (i = 0; src[i]; i++) { 
      buf[i] = str[i]; 
      ptr[i] = str[i]; 
    } 
    printf("%s %s %s\n", str, buf, ptr) ; // "hello hello hello"

size_t

  • Imagine mallocing all of memory in a single call.
    • malloc(0xFF...) will either crash or return a void * of zero
  • Imagine mallocing all of memory one byte at a time
    • This would return 0xFF… void *, the largest of which would be 0xFF…
  • void * and size_t are the same size.

void *

#include <stdlib.h> /* for size_t */
#include <assert.h> // for assert

int main() {
    assert(sizeof(void *) == sizeof(size_t)) ; /* pass */
    assert(sizeof(char) == 1) ;                /* pass */
    assert(1 == 0) ;                           /* fail */
    return 0;
}
  • Read more on assert here (or don’t).

From hence

  • But where does the memory come from?
  • So far, “the stack” - akin to the data structure of the same name
    • Operating Systems concept
  • For malloc, “the heap” - again akin
  • Stack and heap exist in different physical regions of system memory
    • So stack memory is near stack memory but distant from heap memory.

Stack & Heap

char arr0[256], arr1[256], arr2[256]; 
char *ptr0 = malloc(256), *ptr1 = malloc(256), *ptr2 = malloc(256); 
printf("%p\n%p\n%p\n%p\n%p\n%p\n", arr0, arr1, arr2, ptr0, ptr1, ptr2) ;
  • How far are all these things apart from each other:
  • Hmmm they all end in zero?

Stack & Heap

  • In C, bits are in the stack, where we declare variables, or heap, another special magic place.
  • Stack memory in explicit sizes fixed when code is compiled.
  • We have only used stack memory so far so we have to fix memory size when we write the code.

Stack & Heap

  • In C, bits are in the stack, where we declare variables, or heap, another special magic place.
  • A scratchpad space that GCC configures programs to request and the OS allows use of (up to some limit).
  • The “magic” is implementation details of the compiler and the operating system.
  • The only way we will learn to interface with the heap is malloc.

Stack & Heap

Stack Heap
Fixed Size Arbitrary Size
Holds Function Variables Returned by a function (malloc)
Defined when compiling by GCC Defined when running by OS using magic
Higher/larger (~0xFF...) Lower/smaller (~0x00...)

Credit

Jenny Chen Ruohao Guo
she/her she/her
Software Engineer Graduate Research Assistant
Apple Georgia Institute of Technology
B.S. Computer Science, 2021, UIUC B.S. Computer Science, 2021, UIUC

Stack

Memory layout

  • stack: function variables, functions, globals
  • heap: malloced variables
  • Other stuff for an OS/compilers class

Stack

  • Stack reserved when variables declared
    • Why C89 requires declares before code
    • Why declares require a type with a size.
  • Regarded as “growing downwards” from large address to small.

Example

  1. Allocate sizeof(int) bytes for variable a for function main
    • A is uninitialized, so the value is undefined.
    • Compilers may initialize to a default value
    • Assume they don’t.

Example

  1. Allocate a for main
  2. Allocate sizeof(int) bytes for variable b for function main
    • Store the numerical value (int)-3 in these bytes.

Example

  1. Allocate a for main
  2. Allocate b for main
  3. Allocate sizeof(int) bytes for variable c for main
    • Store the numerical value (int)12345

Function Call

  1. Allocate a for main
  2. Allocate b for main
  3. Allocate c for main
  4. Allocate sizeof(int) bytes for variable a for hello
    • Store (int)100
    • Perhaps hello.a vs main.a

Function Call

  1. Allocate a for main
  2. Allocate b for main
  3. Allocate ‘c’ for main’c for main
    • Store the numerical value (int)12345

What of return

  1. Deallocate the function’s stack.
  2. “Stack push” the return value.
  3. a already at the top (bottom) of the stack.
    • Still 100, no longer hello.a

What of return

  1. The calling function does a “stack pop”
  2. The “stack pop” is stored as d
  3. The 100 never moves.
    • That’s why we use a stack.

Return

  1. Push main.a
  2. Push main.b=3
  3. Push main.c=12345
  4. Call hello
    1. Push hello.a=100
    2. return
  5. Pop a’s value into into main.d

Stack Discussion?

Stack

Heap Example

  1. Three operations.
    • Stack-allocate sizeof(int *) bytes for main.p.
    • Heap-allocate sizeof(int) bytes
    • Store the address in p.

  • As a reference, we denote this with an arrow rather than by showing a value.

Heap Example

  • *p is the value of the bits on the heap.
    • It is an int
    • It is 4 bytes
  • p is the value of the bits on the stack.
    • It is an int * or void *
    • It is 8 bytes

Heap Example

  1. Malloc
  2. Store 0 at main.p
    • Or store 0 at the location described by main.p
    • Not a push operation!

Heap Example

  • Stack-allocate main.q
  • Heap-allocate 2 * sizeof(int) bytes
  • Make a note that they are ints
  • Store the address in main.q.

Heap Example

  1. Malloc p
  2. Store 0 at main.p
  3. Malloc q

Heap Example

  1. Malloc p
  2. Store 0 at main.p
  3. Malloc q
  4. Store 1 at main.q

Heap Example

  1. Malloc p
  2. Store 0 at main.p
  3. Malloc q
  4. Store 1 at main.q
  5. Store 2 at index 1 of the int array which begins at main.q

Heap Example

  1. Malloc p
  2. Store 0 at main.p
  3. Malloc q
  4. Store 1 at main.q
  5. Store 2 in main.q[1]
  6. Store the value of main.q (a location) in main.p

Heap Example

  1. Malloc p
  2. Store 0 at main.p
  3. Malloc q
  4. Store 1 at main.q
  5. Store 2 in main.q[1]
  6. Store the value of main.q (a location) in main.p

Today

  • malloc
    • Dynamically sized C
  • free
    • Unmalloc

Free

  1. p holds address of the heap location holding the integer value 0.
  2. q holds address of the heap location holding the integer array value 1.
    • Same location as integer array {1, 2} in this case.

Free

Free

void free(void *_Nullable ptr);
  • p is a pointer returned from malloc
    • We term this type of pointer a “*_Nullable”.
    • Not all *’s and *_Nullable’s \[ \{ p \in \text{*_Nullable}\} \subset \{ p \in * \} \]
int *arr = malloc(4 * sizeof(int));
int buf[4];
int *ptr;
ptr = buf; /* Star but not Nullable */
ptr = arr; /* Star_Nullable */

Free

  • Every malloc in your code should have a corresponding free
  • Otherwise you could run out memory (or other problems)

Memory Leak

  1. p holds a *_Nullable.
  2. q holds a *_Nullable.

Memory Leak

  1. p, q holds a *_Nullable.
  2. “old p” forgotten!

Your poor OS

  • Your poor OS is on contract to protect that 1 you left in “old p” forever!
  • This is why sometimes restarting your computer causes it work.
  • E.g. Java, Python have a “garbage collector” that frees memory for you and causes you code to run 500 times (not always an exaggeration) slower.
  • Also if you try really hard you can memory leak Python.

Instead

  1. free(p) and the bytes return to circulation.
  2. 0 persists until overwritten*

Valgrind

  • Verifying that all memory has been freed isn’t easy!
  • I recommend use of valgrind
  • I won’t teach Valgrind this term but may show it time to time.

leaky.c

  • Write a quick memory leaking program:
leaky.c
#include <stdlib.h>

void main() {
        int *p = (int *)malloc(sizeof(int));
        int *q = (int *)malloc(sizeof(int));
        p = q;
}

Valgrind

  • Compile a run within valgrind

Emph

==1331== LEAK SUMMARY:
==1331==    definitely lost: 4 bytes in 1 blocks
  • Those are these 4 bytes:
int *p = (int *)malloc(sizeof(int));

Fix it

leaky.c
#include <stdlib.h>

void main() {
        int *p = (int *)malloc(sizeof(int));
        int *q = (int *)malloc(sizeof(int));
        free(p);
        p = q;
}

“Good Enough”

  • It is possible to confuse Valgrind (and I intend to do so if we have time)
  • As a rule, if it confused Valgrind it likely contains some antipattern.
    • Up to debate with my planned example.

Free + Leak = Freak

  • We can generate a silly outcome at high probability by:

    1. Store value to heap
    2. Memory leak
    3. Check value

Freak

#include <stdio.h>
#include <stdlib.h>

void main() {
    int *p = malloc(sizeof(int)), *q, i ;
    *p = 1 ;                         
    printf("%d\n", *p) ;             
    free(p) ;
    for ( i = 0 ; i < 1000000 ; i++) {
        q = malloc(0xFF) ;
    }
    printf("%d\n", *p) ;   
}
  • Run it:

Without free

#include <stdio.h>
#include <stdlib.h>

void main() {
    int *p = malloc(sizeof(int)), *q, i ;
    *p = 1 ;                         
    printf("%d\n", *p) ;          
    for ( i = 0 ; i < 1000000 ; i++) {
        q = malloc(0xFF) ;
    }
    printf("%d\n", *p) ;   
}
  • Run it:

Without leaks

#include <stdio.h>
#include <stdlib.h>

void main() {
    int *p = malloc(sizeof(int)), *q, i ;
    *p = 1 ;                         
    printf("%d\n", *p) ;          
    free(p)
    printf("%d\n", *p) ;   
}
  • Run it:
  • 1 is unprotected but not yet overwritten.

Today

  • malloc
    • Dynamically sized C
  • free
    • Unmalloc
  • Memory-adjacent techniques?
    • If time

Overthinking

  • It’s just bits.
  • A void *, a size_t, and a character array of length 8 walk into a compiler.
    • The compiler asks “Why the long int”?
  • In running code, there is no distinction between any of these: each is simply 64 bits.
  • The compiler maintains the distinction when generating code to make writing code easier for humans.

Type Use Print code sizeof(), usually
void * a memory location %p 8
size_t size of some memory %zu or %ld 8
char buf[8] 8 values of size 1 %s 8
long,long int,int64_t \(\text{abs}(x) <= 2^{63}\) %ld 8

Casting

  • Casting avoids gcc warnings/errors:
#include <stdio.h>
#include <stdlib.h>

void main() {
        char *buf[8];
        void *p = (void *)buf;
        void *q = malloc(1);
        size_t dist = (size_t)p - (size_t)q;
        printf("q was malloc'ed %zu bytes from stack allocated p.\n", dist);
}
  • See it:

Implicit Cast

  • Can infer casts, but some draw warnings:
  • char array to void * is fine - both addresses

Documentation

  • Sometimes we can use casts to make it more clear what our code should be doing.
char *str = (char *)malloc(sizeof(char) * 8) ; // malloc returns void *`
  • I like void casts, they remind me of Python “_ =” which I use in notebooks to discard output.
(void)printf("%s\n", str) ; // printf returns an int - we don't care.

Takeaways

  • Cast the return value of malloc.
int main() { 
    char *ptr = malloc(8) ; // error-prone, ambigious 
    char *str = (char *)malloc(sizeof(char) * 8) ; // more intentional 
}
  • Much bigger deal when using types of size other than one, or of unknown size.

Pointer Arithmetic

  • Wait a minute… sizeof(int) != 1.
  • So q is must be some value other than 1 away from q[1]
  • Yet we do not address the next int in an array by saying q[1*sizeof(int)]

Overload

  • People are allowed to like things, so you are allowed to like this.
  • I don’t.
>>> x, y, s, t = 1, 2, "h", "i" 
>>> x + y 3 
>>> x + s 
Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for +: 'int' and 'str' 
>>> s + t 'hi' 
>>>
  • This is called operator overloading.
  • It’s not allowed in C.

Two many gcc’s

  • If you add strings together, gcc stops you.
void main() {
        "a" + "b";
}
  • Thanks, gcc.
leaky.c: In function ‘main’:
leaky.c:2:13: error: invalid operands to binary + (have ‘char *’ and ‘char *’)
    2 |         "a" + "b";
      |         ~~~ ^
      |         |   |
      |         |   char *
      |         char *
  • What does “binary” mean ? (Hint: MATH 251W)

Trust gcc

  • Let’s do a cast and an addition
void main() { 
    uint64_t *p = (uint64_t *) 0x100 ; 
    uint64_t x = (uint64_t) 0x001 ; 
    // With casting, %p expects "void *", so we cast to (void *) 
    printf("%p + %p = %p\n", (void *)p, (void *)x, (void *)(p + x)); 
}

Trust!

  • At least it’s consistent.
printf("%p + %p = %p \n", (void *)p, (void *)x, (void *)((char *)p + x)) ; 
printf("%p + %p = %p \n", (void *)p, (void *)x, (void *)((int *)p + x)) ; 
printf("%p + %p = %p \n", (void *)p, (void *)x, (void *)((long *)p + x)) ;`
  • All 0x108 I’m sure.
  • Get it?

Overload?

  • “operator overloading… not allowed in C.”
  • Addition and subtraction… are(?) overloaded
    • Realistically, not quite (not commutative)
  • Add a (1) location and (2) integer
    • int * + int
    • int * + long
    • char * + int
    • long * + char

On []

  • Pointer arithmetic too.
int arr[4] = { 0x10, 0x100, 0x1000, 0x10000 } ; 
printf(" arr+1 : %p\n", arr+1) ; 
printf("*(arr+1): %p\n", *(arr+1)) ; 
printf("(*arr+1): %p\n", (*arr+1)) ; 
printf(" arr[1]: %p\n", arr[1]) ;
  • See it.

Unary &

  • & is both a unary and binary operator in C, like - (minus)
int main() { 
    int x, y
    x = 1 - 2; 
    y = - 2; 
    x = x & y;
    x = &y; 
}

&

  • Unary & is inverse *
int main() { 
    int x = 0xF0, y = 0x0F, *p; // just unique vals 
    p = &y; 
    printf("*p = %x,  p = %p\n", *p,  p); 
    printf(" y = %x, &y = %p\n",  y, &y);
}
  • p = &y \(\implies\) *p = y

&

  • * is not (quite) inverse &
int main() { 
    int x = 0xF0, y = 0x0F, *p; // just unique vals 
    *p = y; 
    printf("*p = %x,  p = %p\n", *p,  p); 
    printf(" y = %x, &y = %p\n",  y, &y);
}
  • *p = y \(\not\!\!\!\implies\) p = &y

Malloc fail

  • There is no guarantee malloc worked

    • Imagine malloc(∞)
    • Rather, it is very likely to return correctly if used mindfully.
    • But you must check.
void main() { 
    int *p = (int *)malloc(sizeof(int));
    if (p == 0) { 
       fprintf(stderr, "Malloc failed.\n"); 
       exit(-1); 
    } 
}

Force fail

void main() { 
    int *p = (int *)malloc(-1);
    if (p == 0) { 
       fprintf(stderr, "Malloc failed.\n"); 
       exit(-1); 
    } 
}

Today

  • malloc
    • Dynamically sized C
  • free
    • Unmalloc
  • ✓ Memory-adjacent techniques?
    • If time