Class 8
CS 372H
12 February 2010 (by video)

Outline
-------

1. Last time

2. Threads
    --Intro
    --User-level threads
    --Kernel threads
    --Scheduling threads

---------------------------------------------------------------------------

1. Last time

    --replacement policies

    --point was:

        --optimal is known as OPT or MIN (textbook asserts but doesn't
          prove optimality...notes from last time contain a proof)

        --LRU is usually a good approximation to optimal

        --implementing LRU in hardware or at the OS/hardware interface
          is a pain

        --so implement CLOCK or NTH CHANCE ... decent approximations to
          LRU, which is in turn a good approximation to OPT

            *assuming that the past is a good predictor of the future*

    --note that caching doesn't always save the day: there may simply be
      too much demand on memory

        --see notes from last time about ways of handling this case

2. Threads

A. Introduction

    --Recall what processes were all about: a way to isolate some
      computations. give the process the illusion that it is executing
      sequentially

    --But the process abstraction isn't always enough......

        might want to have a process that takes advantage of multiple
        CPUs (why not rely on the OS to schedule different processes on
        different CPUs?)

        some computations are naturally structured as being done in
        parallel. Examples:

        --producer/consumer situations. shows up everywhere: get
          messages from the network, and each message causes the process
          to execute a query on a database.

            --could structure this as a single process that reads from
              the network and the disk at once and does everything
              together

            --potentially cleaner way to do it: one thread reads from
              the network and classifies requests. another thread
              consumes from the queues and answers the requests.

        --Web servers

            --want a pool of different threads to handle requests from
              the network

        --I/O intensive sub-tasks mixed with CPU intensive sub-tasks

        --CPU intensive sub-tasks mixed with other CPU intensive
          sub-tasks

        --counter-argument: if you're always I/O bound, avoid threads
          (and their accompanying errors) and just program in
          event-driven style. Very old debate. Threading is winning
          because event-driven code can't really take advantage of
          multiple CPUs.

    --*threads* are an abstraction that represents a sequential set of
      instructions but that executes within the address space of a
      process. a thread can see the same memory that the process's other
      threads can. more specifically, a thread is a set of registers
      (including a program counter) and a stack, but *not* its own page
      directory.

        [draw picture comparing single-threaded process to
        multi-threaded process]

    --abstraction/illusion: multiple threads are executing at once

        but we'll see that this only actually happens sometimes

    --NOTE: In class we talked about processes first, and now we are
      talking about threads, but in the labs, you will first work with
      threads (in lab T) and then implement processes in JOS, which are
      known as environments.

    classification:

                                    # address spaces
                                    one                 many
        # threads per
        addr space        one       MS-DOS              traditional Unix
                                    Palm OS

                          many      embedded systems,   VMS, Mach, NT,
                                    Pilot               Solaris, HP-UX, ...

        (Pilot was the OS on the first personal computer ever built --
        the Alto. the idea was that there was no need for protection if
        there was only one user.)

    --NOTE: lots of ways to structure computations:

        --event-driven
        --threaded
        --processes
        --different computers

        [we'll come back to this point later on.]

        threads are a very natural way to do multiple tasks that operate
        on the same memory state. (a small example follows.)
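    To make the shared-address-space point concrete, here is a minimal
    sketch using POSIX threads as a stand-in for the generic thread API
    introduced next (pthread_create/pthread_join are the real-world
    analogues of thread_create/thread_join). Both threads name the same
    global variable, which two separate processes could not do without
    explicitly setting up shared memory:

        #include <pthread.h>
        #include <stdio.h>

        /* both threads run in one address space, so the global
           'shared' names the same memory for each of them */
        static int shared = 0;

        static void *child(void *arg) {
            shared = 42;            /* write through the shared space */
            return NULL;
        }

        int main(void) {
            pthread_t t;
            pthread_create(&t, NULL, child, NULL);
            pthread_join(t, NULL);  /* wait for the child to exit */
            printf("main sees shared = %d\n", shared); /* prints 42 */
            return 0;
        }

    A second process forked from the first would get its own copy of
    'shared'; the thread sees the very same memory because it shares the
    process's page directory.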
--thread API

    tid thread_create(void (*fn)(void*), void* arg);
        --create a new thread, run fn(arg)

    void thread_exit();
        --destroy the current thread

    void thread_join(tid thread);
        --wait for the given thread to exit

    and lots of support for synchronization (which we'll see perhaps
    later today and in upcoming classes)

--example use:

    --threaded web server services clients simultaneously:

        for (;;) {
            fd = accept_client();
            thread_create(service_client, &fd);
        }

    --we will see other uses....

--So let's examine threads for a bit..... two common models:

    * user-level threads
    * kernel threads

  in both cases we will look at:

    * thread control blocks
    * dispatch/switch()
    * the level of true concurrency

B. User-level threads

    --kernel is totally ignorant of user-level threads

    --thread_create() allocates a new stack

        --do we need memory space for registers?

    --keep a queue of runnable threads

    --run-time system:

        --wraps system calls: if a call would block, switch and run a
          different thread instead

        --does scheduling:

            --thread is running
            --save thread state (to TCB)
            --Choose new thread to run
            --Load its state (from TCB)
            --new thread is running

    --when do the above steps happen? Two options:

        1. Only when a thread calls yield() or blocks on I/O

            --This is called *cooperative multithreading* or
              *non-preemptive multithreading*.

            --Upside: Makes it pretty easy to avoid errors from
              concurrency

            --Downside: Harder to program because now the threads have
              to be good about yielding, and you might have forgotten to
              yield inside a CPU-bound task.

        2. What if we wanted to make user-level threads switch
           non-deterministically?

            --deliver a periodic timer interrupt or signal to a thread
              scheduler [setitimer()]. When the scheduler gets its
              interrupt, swap out the thread.

            --makes it way more complex to program with user-level
              threads

            --in practice, systems aren't usually built this way, but
              sometimes it is what you want (e.g., if you're simulating
              some OS-like thing inside a process, and you want
              non-determinism)

    --Multi-threaded web server example:

        --Thread calls read() to get data from a remote web browser

        --"fake" user-level read call makes the read() syscall in
          non-blocking mode

        --No data? schedule another thread

        --When idle or on a timer, check which connections have new
          data, and switch() to one of them

    --How to switch threads in the cooperative context? see handout.....

        [draw picture of the two stacks]

        basic idea: switch() is called at "sane" moments, in response to
        a function call from a thread. That function is usually yield(),
        i.e., the call graph usually looks like this:

            read_wrapper()
                check whether read would block
                if read would block:
                    yield()
                        switch()

        make sure you understand what is going on and how switch()
        works..... (a minimal sketch appears at the end of this
        section.)

    --What if we are in a non-cooperative context? then a thread could
      be switched out at any moment, so its state is not neatly arranged
      on the stack, per the call graph

        but in that case, the OS would have put the thread's registers
        in a trap frame, and the run-time can yank the thread's
        registers, save them in the TCB or on the thread's regular
        stack, and then restore them later (i.e., thread switching by
        the user-level run-time looks a lot like process switching by
        the kernel).

    Notes/questions:

        --In the kernel's PCB, only one set of registers is stored.....

        --QUESTION: where are the other registers for the other threads?
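    Here is a minimal sketch of cooperative user-level threading,
    assuming the POSIX ucontext calls (getcontext/makecontext/
    swapcontext, available on Linux) as a stand-in for the handout's
    hand-written switch(); the names tcb, yield(), and worker() are
    mine, not from any class library. It also suggests an answer to the
    QUESTION above: when a thread is not running, its registers sit in
    its ucontext_t, which plays the role of the TCB.

        #include <stdio.h>
        #include <ucontext.h>

        #define NTHREADS   2
        #define STACK_SIZE (64 * 1024)

        static ucontext_t main_ctx, tcb[NTHREADS]; /* saved registers */
        static char stacks[NTHREADS][STACK_SIZE];  /* one stack each  */
        static int current;

        /* cooperative switch: save this thread's registers into its
           TCB and load the next runnable thread's registers */
        static void yield(void) {
            int prev = current;
            current = (current + 1) % NTHREADS;
            swapcontext(&tcb[prev], &tcb[current]);
        }

        static void worker(void) {
            for (int i = 0; i < 3; i++) {
                printf("thread %d, step %d\n", current, i);
                yield();            /* voluntarily give up the CPU */
            }
        }   /* returning resumes uc_link, i.e., main */

        int main(void) {
            for (int i = 0; i < NTHREADS; i++) {
                getcontext(&tcb[i]);             /* init the context */
                tcb[i].uc_stack.ss_sp = stacks[i];
                tcb[i].uc_stack.ss_size = STACK_SIZE;
                tcb[i].uc_link = &main_ctx;      /* run after return */
                makecontext(&tcb[i], worker, 0);
            }
            swapcontext(&main_ctx, &tcb[0]);     /* start thread 0 */
            return 0;
        }

    Notice that the kernel sees only one process here: all of this
    switching is invisible to it, which is exactly why a single blocking
    syscall would stall every thread (the next point).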
    Disadvantages of user-level threads:

        --Can we imagine having two user-level threads truly executing
          at once, that is, on two different processors? (Answer: no.)

        --Related question: what happens if a user-level thread executes
          a blocking system call, like read(fd, ....) to a disk?

            --answer: *all* threads block because, to the kernel, the
              process is blocked

            --This is why threading libraries typically wrap system
              calls: so that when a thread calls read(), the library
              turns it into a non-blocking version of read().

            --Unfortunately, disk calls in traditional Unix are always
              blocking, so we either need to:

                --extend the API
                --live with this
                --use elaborate hacks with memory-mapped files (e.g.,
                  files are all memory-mapped, and the run-time asks to
                  handle its own page faults, if the OS allows it)

            --What if the OS handles page faults for the process? (then
              a page fault in one thread blocks all threads)

C. Kernel threads

    --Kernel maintains TCBs

        --looks a lot like a PCB

        --[Draw picture]

    --thread_create() becomes a syscall

    --when do thread switches happen?

        --with kernel-level threads, they can happen at any point

    --basic game plan for dispatch/switch:

        --thread is running
        --switch to the kernel
        --save thread state (to TCB)
        --Choose new thread to run
        --Load its state (from TCB)
        --new thread is running

    --Can two kernel-level threads execute on two different processors?
      (Answer: yes.)

    --Disadvantages of kernel threads:

        --every thread operation (create, exit, join, synchronize, etc.)
          goes through the kernel

            --> 10x-30x slower than user-level threads

        --heavier-weight memory requirements (each thread gets a stack
          in user space *and* within the kernel. compare to user-level
          threads: each thread gets a stack in user space, and there's
          one stack within the kernel that corresponds to the process.)

    --Old debates about user-level threads vs. kernel threads. The
      "Scheduler Activations" paper, by Anderson et al. [ACM
      Transactions on Computer Systems 10, 1 (February 1992), pp.
      53-79], proposes an abstraction that is a hybrid of the two.

    --Some people think that threads, i.e., concurrent applications,
      shouldn't be used at all (because of the many bugs and difficult
      cases that come up, as we'll discuss). However, that position is
      becoming increasingly untenable, given multicore computing.

        --The fundamental reason is this: if you have a
          computation-intensive job that wants to take advantage of all
          of the hardware resources of a machine, you either need to
          (a) structure the job as different processes; or (b) use
          kernel threads. There is no other way, given mainstream OS
          abstractions, to take advantage of a machine's parallelism.
          (a) winds up being inconvenient (in order to share data, the
          processes either have to separately set up shared memory
          regions, or else pass messages). So people use (b). (a sketch
          using kernel threads follows.)
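    As a concrete instance of (b), here is a sketch assuming POSIX
    threads, which on Linux are kernel threads (one kernel TCB per
    thread, created via a syscall); the workload is made up for
    illustration. The point is that the two CPU-bound workers can truly
    run on two processors at once, which no user-level package can
    achieve:

        #include <pthread.h>
        #include <stdio.h>

        /* each pthread is backed by a kernel thread (1:1 on Linux),
           so these CPU-bound workers can run on two CPUs at once */
        static void *worker(void *arg) {
            long id = (long)arg, sum = 0;
            for (long i = 0; i < 100000000; i++)
                sum += i;                  /* CPU-intensive sub-task */
            printf("worker %ld done (sum=%ld)\n", id, sum);
            return NULL;
        }

        int main(void) {
            pthread_t t[2];
            for (long i = 0; i < 2; i++)
                pthread_create(&t[i], NULL, worker, (void *)i);
            for (int i = 0; i < 2; i++)
                pthread_join(t[i], NULL);  /* cf. thread_join(tid) */
            return 0;
        }

    Compile with gcc -pthread; on a multicore machine, timing the run
    should show the two workers overlapping rather than taking twice as
    long as one.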
D. Scheduling threads

    --Dispatcher can choose:

        --to run each thread to completion

        --to time-slice in big chunks

        --to time-slice so that each thread executes only one
          instruction at a time

    --Programs must work in all cases, for all interleavings

    --So how can you know whether your concurrent program works?
      Whether *all* interleavings work?

        1. Enumerate and test all possibilities? (Impossible.)

        2. Instead, maintain *invariants* on program state; structure
           the program carefully to maintain these invariants

    --General strategy for dealing with concurrency:

        --use *atomic actions* [meaning the action is indivisible,
          regardless of how things are interleaved] to....

        --....build higher-level abstractions....

            --example: mutexes

        --....that provide invariants we can reason about....

            --example: only one thread of control is modifying a linked
              list at once (see the sketch below)

    --This is our transition to the general topic of concurrency, which
      will occupy us for the next few classes.
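    As a preview of where we're headed, here is a sketch of that
    linked-list example using a POSIX mutex (mutexes proper come in the
    next classes; list_push() is a made-up helper). The lock turns the
    two-pointer update into one atomic action, so no interleaving can
    observe a half-linked node:

        #include <pthread.h>
        #include <stdlib.h>

        /* invariant: 'head' always points at a well-formed list */
        struct node { int val; struct node *next; };

        static struct node *head;
        static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

        void list_push(int val) {
            struct node *n = malloc(sizeof *n);
            n->val = val;
            pthread_mutex_lock(&list_lock);   /* begin atomic action */
            n->next = head;
            head = n;                         /* invariant restored  */
            pthread_mutex_unlock(&list_lock); /* end atomic action   */
        }

    Without the lock, two concurrent pushes could both read the old head
    and one insertion would be lost; with it, every interleaving
    preserves the invariant.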