pthread

CS 271

Prof. Calvin

08 Apr 24

wDd0

Announcements

  • "no step on snek": Linkaroo
  • Right now: You should have some networking code working.
  • The sample binary and recommended fcntl implementation are bad and wrong in a way we will learn to improve on today.
  • Blah blah blah C bad whatever just write code.

Today

Fork()
Hacks
Sleep
Pthread
function pointers
arg structs
create/exit/join
wordcount

C Bad

  • Okay we can agree this is awful right.
      char buf[20];
      fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
      sleep(4);
      int numRead = read(0, buf, 4);
      if (numRead > 0) {
          buf[numRead] = '\0'; // read() does not terminate the string for us
          printf("You said: %s", buf);
      }
  • "sleep(4)" means do nothing for 4 seconds.
  • "read(0, buf, 4)" means do nothing until you can read from a buffer.
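  • But "fcntl(...)" changes what this pair of lines means.
    • With O_NONBLOCK set on stdin, read no longer waits: it returns right away with whatever is already buffered, or -1 (errno EAGAIN) if there is nothing.
    • So sleep(4) ends up being the "wait for input" step, and read just collects whatever showed up within those 4 seconds, including nothing.
    • The program always takes the full 4 seconds, even if you type right away.
    • So we have two different lines of code standing in for one idea: wait up to 4 seconds for input.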
  • C bad. C bad!
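To see what O_NONBLOCK does to read on its own, here is a minimal sketch. The helper name try_read is made up for this example; fcntl, read, and errno are the real calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: switch fd to non-blocking mode, then try one read.
// Returns the number of bytes read, 0 at end of input, or -1 with errno set.
// errno == EAGAIN (or EWOULDBLOCK) means "nothing to read right now".
ssize_t try_read(int fd, char *buf, size_t cap) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }
    return read(fd, buf, cap);
}
```

Calling try_read(0, buf, 4) right after sleep(4) should behave like the snippet on this slide: it never waits, it only reports what is already there.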

Fork()

"If there's a fork in the road, take it." -computers
  • If C is gonna do two things at the same time, it should be less sketchy and use like, code blocks.
      NAME
          fork - create a child process
      SYNOPSIS
          #include <sys/types.h>
          #include <unistd.h>
          pid_t fork(void);
      DESCRIPTION
          fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process.
  • Okay how does this thing work.

Fork()

  • Let's use it...
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      int main() {
          if(fork()) {
              printf("Tis I, the elder and more terrible process.\n") ;
          } else {
              printf("Tis I, the more youthful and novel process.\n") ;
          }
          return 0 ;
      }
  • "If there's a fork, take it"
      user@DESKTOP-THMS2PJ:~$ gcc text.c ; ./a.out
      Tis I, the elder and more terrible process.
      Tis I, the more youthful and novel process.
      user@DESKTOP-THMS2PJ:~$
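The if(fork()) trick works because fork() returns a different value in each process: the youth's pid in the elder, 0 in the youth, and -1 if no youth could be created. A minimal sketch that handles all three cases follows. The helper name run_youth is made up for this example, and waitpid is used so the elder can collect the youth's exit code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a youth that exits with `code`, and have the
// elder wait for it. Returns the youth's exit code, or -1 on failure.
int run_youth(int code) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;              // fork failed: there is no youth
    }
    if (pid == 0) {
        _exit(code);            // youth: fork() returned 0
    }
    int status = 0;             // elder: fork() returned the youth's pid
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The youth uses _exit rather than exit so that it does not flush any stdio buffers it inherited from the elder.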

Fork()

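  • Sleep and read at the same time.
      // needs <stdio.h> and <unistd.h>
      int main(void) {
          char buf[20];
          if(fork()) {
              sleep(4) ;                       // elder: just wait
          } else {
              int numRead = read(0, buf, 4);   // youth: block until input arrives
              if (numRead > 0) {
                  buf[numRead] = '\0';         // read() does not terminate the string
                  printf("You said: %s", buf);
              }
          }
          return 0;
      }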
  • This is a bit cleaner.
    • No matter what, the program exits after 4 seconds.
    • If there's input text (type and hit enter) it is returned.
    • There's codeblocks splitting up the execution clearly.
  • Basically after fork() two different programs run, elder and youth.
  • They can do things "at the same time".
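  • When elder ends, the shell prompt comes back, so it looks like they both end. (In general a child is not killed just because its parent exits.)
  • If you negate fork, then it doesn't work (elder must be the sleeper).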

Fork()

  • The elder and the youth know not how to share.
      int x = 7 ; // worlds greatest int dont @ me
      if(fork()) {
          x += 1 ;
          wait(NULL) ; // wait for youth, sys/wait.h
      } else {
          x += 2 ;
      }
      printf("x = %d\n", x) ; // both processes reach this line
  • x never gets to 10
      user@DESKTOP-THMS2PJ:~$ gcc text.c ; ./a.out
      x = 9
      x = 8
      user@DESKTOP-THMS2PJ:~$
  • Using wait() to determine precedence has pros and cons.
  • Not sharing has pros and cons.
  • There are ways to share info (like sockets of course) but do we want to do that?
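One way to share across a fork is a pipe, sketched here under the assumption that the youth only needs to send one int back. The helper name fork_and_compare is made up for this example. Each process still changes only its own copy of x; the pipe just carries the youth's copy over to the elder.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: elder adds 1 to its x, youth adds 2 to its own x and
// mails the result back through a pipe. Returns 0 on success, -1 on failure.
int fork_and_compare(int x, int *elder_x, int *youth_x) {
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                           // youth
        close(fds[0]);                        // youth only writes
        x += 2;
        ssize_t n = write(fds[1], &x, sizeof x);
        _exit(n == (ssize_t)sizeof x ? 0 : 1);
    }

    close(fds[1]);                            // elder only reads
    x += 1;
    int from_youth = 0;
    ssize_t n = read(fds[0], &from_youth, sizeof from_youth);
    close(fds[0]);
    waitpid(pid, NULL, 0);                    // wait for youth
    if (n != (ssize_t)sizeof from_youth) {
        return -1;
    }

    *elder_x = x;
    *youth_x = from_youth;
    return 0;
}
```

Starting from x = 7, the elder ends up holding 8 and the youth's 9 side by side, and x still never gets to 10.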

Fork()

  • For the worst thing you've ever seen in your life, share memory with a file.
      // assumes i, c, fp, FNAME, and SLEEP are declared earlier; nothing here checks a return value
      if (fork()) {
          for ( ; 1 ; i = (i + 1) % 26 ) {
              c = 'A' + i ;
              fp = fopen(FNAME, "w") ;
              fwrite(&c, 1, 1, fp) ;
              fclose(fp) ;
              sleep(SLEEP) ;
          }
      } else {
          for ( ; 1 ; ) {
              printf("%c\n", c) ;
              fp = fopen(FNAME, "r") ;
              fread(&c, 1, 1, fp) ;
              fclose(fp) ;
              sleep(SLEEP) ;
          }
      }
  • This is a good way to find out why you need to null check system calls...
  • But if you try it a few times it'll probably not break instantly at least.
  • Sleep before fclose to cause disasters at high probability.

Okay but how are we supposed to wait around to read something for a fixed amount of time?

Today

✓ Fork()
✓ Hacks
✓ Sleep
Pthread
function pointers
arg structs
create/exit/join
wordcount

<pthread.h>

Basically fork() was the worst so UNIX/POSIX invented pthreads
    NAME
        pthread_create - create a new thread
    SYNOPSIS
        #include <pthread.h>
        int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                           void *(*start_routine) (void *), void *arg);
        Compile and link with -pthread.
    DESCRIPTION
        The pthread_create() function starts a new thread in the calling process. The new thread starts execution by invoking start_routine(); arg is passed as the sole argument of start_routine().
  • Okay how does this thing work.

Function pointers

void *(*start_routine) (void *)
  • That is a pointer to a function that accepts a void * argument and has a void * return value.
    • Here's an example:
        void *func( void *ptr ) {
            while(!sleep(1)) {
                printf("%d\n", *(int *)ptr ) ;
            }
            return NULL ;
        }
    • Here's how we'd make a variable that describes func()
        void * (*fptr)(void *) = &func ;
  • When we make a pthread, it needs somewhere to start executing - kinda like main().
    • With fork(), execution just followed the fork() call - again, pros and cons.
  • Void * argument and return allows us to
    • Use a pointer to a value or struct to hold any arguments or return values
    • Use casts to read from the argument or return value.
        printf("%d\n", *(int *)ptr ) ;
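Here is a small sketch of the same function-pointer shape without any threads involved. The names double_it and call_through are made up for this example; the point is only that anything matching void *(*)(void *) can be stored in a variable and called through it.

```c
// Example function with the start_routine shape: doubles the int that ptr
// points to and returns the same pointer.
void *double_it(void *ptr) {
    int *p = (int *)ptr;    // cast the void * back to what it really is
    *p *= 2;
    return p;
}

// Hypothetical helper: call any function of this shape through a pointer.
void *call_through(void *(*fptr)(void *), void *arg) {
    return fptr(arg);
}
```

With int v = 21, calling call_through(&double_it, &v) leaves v at 42 and returns &v as a void *.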

pthread_create

pthread_create( &tid, NULL, &func, (void *) &val ) ;
  • Imagine we have func, which prints its argument every one second.
  • Here's how we set up a pthread to run func.
      int main(void) {
          pthread_t tid ;
          int val = 0 ;
          pthread_create( &tid, NULL, &func, (void *) &val ) ;
          while(!sleep(1)) {
              val++ ;
          }
          return 0 ;
      }
  • pthread_create has four arguments:
    • Where to store the thread id
    • Some options, which we will deal with later or never
    • The big spooky function pointer #ominous
    • The arguments as a void *, usually cast from a meaningful data structure or data type.
  • This code then keeps increasing val, and we can observe what happens...
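When a thread needs more than one value, the usual move is to pack everything into a struct and pass its address as the void * argument. A minimal sketch, assuming a made-up struct add_args and made-up helper names; pthread_join is used here only so the result is ready before anyone reads it.

```c
#include <pthread.h>

// Hypothetical argument struct: two inputs and a slot for the result.
struct add_args {
    int a;
    int b;
    int sum;
};

// Start routine: unpack the struct, do the work, store the result.
void *add_worker(void *arg) {
    struct add_args *args = (struct add_args *)arg;
    args->sum = args->a + args->b;
    return NULL;
}

// Hypothetical helper: run add_worker on its own thread and wait for it.
// Returns 0 on success, or the pthread error number on failure.
int add_on_thread(struct add_args *args) {
    pthread_t tid;
    int err = pthread_create(&tid, NULL, &add_worker, (void *)args);
    if (err != 0) {
        return err;
    }
    return pthread_join(tid, NULL);   // wait so args->sum is ready to read
}
```

As the man page says, compile and link with -pthread.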

Pthreads

  • Pretty unlikely to get numbers exactly in order, and that's okay.
      user@DESKTOP-THMS2PJ:~$ gcc test.c ; ./a.out
      0
      2
      3
      4
      4
      6
      7
      8
      9
      10
  • There are ways to synchronize this (out of scope). "man -k pthread_spin"
  • This is the cool, good, fun way to do things.
  • C good!

exit/join

  • When you create a pthread, it runs until its start routine returns or the process that created it terminates, whichever comes first.
  • Sometimes, we want to run until the last pthread is done with whatever it's doing.
  • We achieve this with pthread_join and pthread_exit.
      int pthread_join(pthread_t thread, void **retval);
      void pthread_exit(void *retval);
  • We can think of pthread_exit similar to stdlib exit() - it's a way to end the thread, rather than the program.
  • We can think of pthread_join similar to wait() - it's a way to keep the caller around until a callee finishes their job.
  • Let's see an example that I tricked ChatGPT into making. It's a little scuffed, but fun.
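Before the big example, a small hand-written sketch of pthread_exit and pthread_join working as a pair: the thread hands a pointer to pthread_exit, and the caller picks it up through the retval argument of pthread_join. The names square_worker and square_on_thread are made up for this example.

```c
#include <pthread.h>
#include <stdlib.h>

// Start routine: squares its input and hands the answer back through
// pthread_exit. The result lives on the heap so it outlives the thread.
void *square_worker(void *arg) {
    int n = *(int *)arg;
    int *result = malloc(sizeof *result);
    if (result != NULL) {
        *result = n * n;
    }
    pthread_exit(result);   // same effect as `return result;` here
}

// Hypothetical helper: returns 0 on success and stores n * n in *out.
int square_on_thread(int n, int *out) {
    pthread_t tid;
    void *retval = NULL;
    if (pthread_create(&tid, NULL, &square_worker, &n) != 0) {
        return -1;
    }
    if (pthread_join(tid, &retval) != 0) {
        return -1;
    }
    if (retval == NULL) {
        return -1;          // malloc failed inside the thread
    }
    *out = *(int *)retval;
    free(retval);
    return 0;
}
```

Passing &n is safe here only because square_on_thread does not return until the join completes, so n is still alive while the thread reads it.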

Today

✓ Fork()
✓ Hacks
✓ Sleep
✓ Pthread
✓ function pointers
✓ arg structs
✓ create/exit/join
wordcount

Word Count Problem and Pthreads Solution

  • The word count problem involves counting the number of words in a given text file. To efficiently solve this problem, we can utilize multiple threads with the pthreads library.

  • Using pthreads allows us to divide the file into smaller chunks and assign each chunk to a separate thread for processing. Each thread independently counts the words within its assigned chunk, and the individual counts are later combined to obtain the total word count.

  • This approach leverages parallel processing, enabling faster execution compared to a single-threaded solution, especially for large files. By utilizing pthreads, we can efficiently tackle the word count problem by distributing the workload across multiple threads, resulting in improved performance.

Include Libraries

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <ctype.h>
  • First, the code includes necessary libraries for pthreads, file operations, and standard input-output.
  • These libraries provide functions for thread creation, file handling, memory allocation, and printing to the console.
    #define MAX_THREADS 4
    #define BUFFER_SIZE 1024
  • Constants are defined for maximum threads and buffer size.
  • These constants are used to control the number of threads and the size of the buffer for file reading.

Define Thread Data Structure

    struct thread_data {
        char *buffer;
        int start;
        int end;
        int word_count;
    };
  • A structure is defined to hold data for each thread.
  • It includes a buffer to store file content, start and end indices for processing, and a word count.
  • Quick, how many bytes is thread_data?
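One way to check the answer instead of guessing is to ask the compiler. On a typical 64-bit Linux build the pointer is 8 bytes and each int is 4, which adds up to 20, and the compiler usually pads the struct to 24 so the pointer stays aligned in arrays; other platforms can differ. The function name print_thread_data_layout is made up for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

struct thread_data {
    char *buffer;
    int start;
    int end;
    int word_count;
};

// Print the total size and each field's offset so any padding is visible.
void print_thread_data_layout(void) {
    printf("sizeof     = %zu\n", sizeof(struct thread_data));
    printf("buffer     @ %zu\n", offsetof(struct thread_data, buffer));
    printf("start      @ %zu\n", offsetof(struct thread_data, start));
    printf("end        @ %zu\n", offsetof(struct thread_data, end));
    printf("word_count @ %zu\n", offsetof(struct thread_data, word_count));
}
```

If sizeof comes out larger than the last offset plus sizeof(int), the difference is trailing padding.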

Thread Function

    void* count_words(void *arg) {
        struct thread_data *data = (struct thread_data*)arg;
        for (int i = data->start; i < data->end; i++) {
            if (isspace(data->buffer[i])) {
                data->word_count++;
            }
        }
        return NULL;
    }
  • The thread function 'count_words' counts the number of words in a given range of the buffer.
  • It iterates through the buffer and increments the word count when encountering whitespace characters.
  • Quick, what is the type of count_words? Why?
  • ChatGPT uses "isspace" here - it's counting spaces, not words.
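To count words rather than spaces, one option is to count the places where a word starts. This is a sketch of a possible replacement for the loop body, not what ChatGPT produced; the function name count_word_starts is made up. Because it peeks at buffer[i - 1], a word that straddles two chunks is counted only by the chunk that contains its first letter.

```c
#include <ctype.h>

// Hypothetical replacement for the counting loop: count word starts in
// buffer[start, end). A word start is a non-space character that is either
// at index 0 or preceded by a whitespace character.
int count_word_starts(const char *buffer, int start, int end) {
    int count = 0;
    for (int i = start; i < end; i++) {
        int here_is_text = !isspace((unsigned char)buffer[i]);
        int before_is_gap = (i == 0) || isspace((unsigned char)buffer[i - 1]);
        if (here_is_text && before_is_gap) {
            count++;
        }
    }
    return count;
}
```

The cast to unsigned char is there because passing a negative char value to isspace is undefined behavior.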

Main Function

    int main(int argc, char *argv[]) {
        if (argc != 2) {
            printf("Usage: %s <filename>\n", argv[0]);
            return 1;
        }
        FILE *file = fopen(argv[1], "r");
        if (!file) {
            printf("Could not open file.\n");
            return 1;
        }
        // Remaining code omitted for brevity...
    }
  • The main function is the entry point of the program.
  • It takes a filename as an argument and opens the file for reading.
  • If the file cannot be opened, it prints an error message and exits.
  • If ChatGPT can null-check system calls, so can you.

Allocate Memory

    char *buffer = (char*)malloc(file_size);
    if (!buffer) {
        printf("Memory allocation failed.\n");
        fclose(file);
        return 1;
    }
  • Memory is allocated for the buffer to hold the file content.
  • If memory allocation fails, an error message is printed, and the program exits.
  • I would never just rip an entire file into a malloc, since that seems mean to the computer, but this is a computer doing it to another computer so it's okay.
  • Quick, how do you do this without reading the entire file?
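One possible answer, sketched single-threaded: read BUFFER_SIZE bytes at a time into a fixed buffer and remember whether the previous chunk ended in the middle of a word. The function name count_words_stream is made up for this example; BUFFER_SIZE is the constant from the slides.

```c
#include <ctype.h>
#include <stdio.h>

#define BUFFER_SIZE 1024

// Hypothetical helper: count words by reading BUFFER_SIZE bytes at a time.
// in_word carries over between chunks, so a word split across two reads is
// counted once. Returns -1 if the stream reports a read error.
long count_words_stream(FILE *file) {
    char buffer[BUFFER_SIZE];
    long count = 0;
    int in_word = 0;
    size_t n;

    while ((n = fread(buffer, 1, sizeof buffer, file)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (isspace((unsigned char)buffer[i])) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;    // first letter of a new word
                count++;
            }
        }
    }
    return ferror(file) ? -1 : count;
}
```

Memory use stays at BUFFER_SIZE bytes no matter how large the file is.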

Read File Content

    fread(buffer, 1, file_size, file);
    fclose(file);
  • The file content is read into the buffer using fread.
  • If the read operation fails, an error message is printed, and the program exits.
  • ChatGPT just stopped checking the return values of system calls here. Your jobs are safe.
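For comparison, here is roughly what checking the read could look like. This is a sketch with a made-up helper name, read_fully; fread returns how many items it actually read, so a short count means either an error or an early end of file.

```c
#include <stdio.h>

// Hypothetical helper: read exactly `size` bytes into buffer.
// Returns 0 on success, -1 on a short read or a read error.
int read_fully(FILE *file, char *buffer, size_t size) {
    size_t got = fread(buffer, 1, size, file);
    if (got != size) {
        if (ferror(file)) {
            fprintf(stderr, "Read failed.\n");
        } else {
            fprintf(stderr, "File ended after %zu of %zu bytes.\n", got, size);
        }
        return -1;
    }
    return 0;
}
```

The caller would still be responsible for freeing the buffer and closing the file when this returns -1.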

Create Threads

    pthread_t threads[MAX_THREADS];
    struct thread_data thread_data_array[MAX_THREADS];
    int start = 0;
    for (int i = 0; i < MAX_THREADS; i++) {
        thread_data_array[i].buffer = buffer;
        thread_data_array[i].start = start;
        thread_data_array[i].end = start + chunk_size + (i < remaining ? 1 : 0);
        thread_data_array[i].word_count = 0;
        pthread_create(&threads[i], NULL, count_words, (void*)&thread_data_array[i]);
        start = thread_data_array[i].end;
    }
  • Threads are created to process the file content in parallel.
  • Each thread is assigned a portion of the buffer to count words.
  • I think a human would write this in a way where the arithmetic is easier, but maybe not.
  • Presumably a human wrote this somewhere and it's just plagiarized.
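  • I was pretty sure count_words needed a unary & prefix there, but a function name converts to a pointer to that function on its own, so count_words and &count_words mean the same thing here. It runs fine.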
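The slides never show where chunk_size and remaining come from. A reasonable guess, and it is only an assumption, is file_size / MAX_THREADS and file_size % MAX_THREADS. Under that assumption the same arithmetic can be written as a function of the worker index, which may be the easier-to-read version. The name chunk_bounds is made up for this sketch.

```c
// Hypothetical helper: compute the half-open range [*start, *end) that
// worker i of n should process out of `total` bytes. The first
// (total % n) workers each get one extra byte.
void chunk_bounds(int total, int n, int i, int *start, int *end) {
    int chunk_size = total / n;      // assumed definition, not shown in the slides
    int remaining = total % n;       // assumed definition, not shown in the slides
    int extra_before = i < remaining ? i : remaining;
    *start = i * chunk_size + extra_before;
    *end = *start + chunk_size + (i < remaining ? 1 : 0);
}
```

For 10 bytes and 4 workers this gives the ranges [0, 3), [3, 6), [6, 8), and [8, 10), which cover the buffer with no gaps or overlap.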

Join Threads

    for (int i = 0; i < MAX_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
  • The main thread waits for all worker threads to finish using pthread_join.
  • This ensures that the total word count is accurate before printing.

Calculate Total Word Count

    int total_word_count = 0;
    for (int i = 0; i < MAX_THREADS; i++) {
        total_word_count += thread_data_array[i].word_count;
    }
  • The total word count is calculated by summing up individual thread word counts.

Print Result

    printf("Total word count: %d\n", total_word_count);
    free(buffer);
    return 0;
  • The total word count is printed to the console.
  • Memory allocated for the buffer is freed to prevent memory leaks.
  • The program terminates successfully.

Today

✓ Fork()
✓ Hacks
✓ Sleep
✓ Pthread
✓ function pointers
✓ arg structs
✓ create/exit/join
✓ wordcount