Class 27
CS 372H
1 May 2012

On the board
------------

1. Last time 
2. Reflections on Trusting Trust
3. Unix security model

---------------------------------------------------------------------------

1. Last time

    --finished VMware ESX

    --stack smashing

2. Trusting trust

    --first of all, the word "trust" is a bad thing in computer security
    (this is an unfortunate linguistic fact). to "trust" something means
    to "assume it correct", which in turn means "to be in trouble if the
    assumption is false". so "removing trust" is a good thing. so is
    making things "trust*worthy*" (that is, worthy of being assumed
    correct), but it is in general hard to make any given component
    truly trustworthy.

	--you'll notice that the "trusted computing" initiatives from
	various powerful interests subvert this word. who exactly is
	being trusted, and who exactly isn't? "trusted computing"
	sounds great linguistically, but "trusted computing platforms"
	do not actually mean what they sound like

    A. background on this paper by Thompson:

	Thompson gave this lecture/paper after winning the Turing Award,
	which is considered by many to be the Nobel prize of Computer
	Science. The paper is stunning but takes patience and a few
	readings to understand. We're going to reproduce most of what
	Thompson did but will follow the ideas in an order different
	from the one in the paper. 

    B. adding a feature to a language

	What if we wanted to add a feature to Java? Say that the Java
	compiler is written in C, in a file called java.c. So we modify
	java.c, and rerun the C compiler on java.c, producing a new Java
	compiler that understands a new feature of Java.

	Now what if we wanted to add a feature to the C programming
	language? Well, for all practical purposes, the C compiler is
	also written in C, and let's assume that the entire C compiler is
	implemented in a file called "cc.c". To add a feature to the C
	programming language, we need to modify cc.c, and run the old C
	compiler on the new file. At this point, we have a new C
	compiler that understands a new feature of the language.

    C. Context

	As sometimes happens today, earlier versions of Unix were distributed with
	a full set of binaries and source for those binaries. This source included
	source for the compiler, the OS, the program 'login', etc.

	Because the system was quite small, it was common for people to make a
	change in one source file and then to recompile all of their programs. So
	program recompilation happened a lot.

    D. In this environment, how could someone as clever as Thompson add
    a bug to the login program without leaving a trace in the source
    files?

	**GOAL: no source file hints at the bug, and yet the bug
	persists across all recompilations**

	[DRAW PICTURES]

    E. How can we write a self-reproducing program in pseudocode?

	X = "Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X."
	Output 'X'. Output '='. Output quote mark. Output X. Output quote mark. Output X.

	Run that, and its output is the program itself.

	Here is a simpler version:

	    Print this followed by its quotation: "Print this followed
	    by its quotation".

	    [BTW, the GNU General Public License works like this. It's a
	    self-replicating license! the license specifies that to make
	    a copy of the code, you have to release the source **with
	    the license itself included**. the license talks about
	    itself, just as a self-replicating program must.]

	Here's a self-replicating program in Scheme:

	    ((lambda (x) `(,x ',x))
	    '(lambda (x) `(,x ',x)))
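
The same trick carries over to other languages. Here is a minimal Python
sketch (not from the lecture). QUINE holds a two-line program whose output
is its own text; run() is a hypothetical helper, introduced only for
illustration, that executes a program and captures what it prints. The
string s plays the role of X above: it is used once as data (via %r, the
"quotation") and once as the text of the code.

```python
import contextlib
import io

# Text of a two-line self-reproducing program:
#     s = 's = %r\nprint(s %% s)'
#     print(s % s)
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run(source):
    """Execute Python source text and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()
```

Calling run(QUINE) returns a string equal to QUINE, so the program prints
exactly itself.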

    F. Result:

	some well-known string in the C compiler source now compiles to
	binary that does the following:

	    <<
	    (1) if compiling "login", insert a bug
	    
	    (2) if you see the well-known string in the C compiler
	    itself, replace it with everything between << >>
	    >>
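
To make the two-part trojan concrete, here is a toy model in Python. It is
only a sketch under loud assumptions: "source" is Python text, a "binary"
is the namespace produced by exec'ing it, and the names LOGIN_SRC, CC_SRC,
honest_compile, and trojaned_compile are all hypothetical. It is not
Thompson's code. The point is that both source files stay clean while the
trojaned compiler binary survives recompilation.

```python
# Clean source for "login": only the real password is accepted.
LOGIN_SRC = '''
def login(user, password):
    return password == "correct-horse"
'''

# Clean source for the "compiler": it turns source text into a namespace.
CC_SRC = '''
def compile_program(source):
    namespace = {}
    exec(source, namespace)
    return namespace
'''


def honest_compile(source):
    """Reference compiler binary: faithfully 'compiles' by exec'ing."""
    namespace = {}
    exec(source, namespace)
    return namespace


def trojaned_compile(source):
    """Compiler *binary* carrying the trojan; no source file contains this."""
    if "def compile_program" in source:
        # (2) Compiling the compiler: ignore the clean source and emit a
        # copy of this trojaned binary instead.
        return {"compile_program": trojaned_compile}
    namespace = honest_compile(source)
    if "login" in namespace:
        # (1) Compiling login: wrap the honest check with a backdoor.
        real_login = namespace["login"]
        namespace["login"] = lambda user, password: (
            password == "backdoor" or real_login(user, password)
        )
    return namespace
```

Recompiling CC_SRC with the trojaned binary any number of times yields the
trojaned binary again, and any login built with it accepts "backdoor".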

    G. What's the moral of the story?


---------------------------------------------------------------------------

Admin notes

--short class; course evaluation forms passed out today

--final exam in a week and a half: start reviewing

--final projects due not long after: start working on them now

--review session Monday of the exam

---------------------------------------------------------------------------

3. Protection and security in Unix

    A. Intro

	--why security in the OS?
	    managing resources for different applications
	    must protect different users from one another
		file system
		memory
		processes

	--access control matrix (conceptual construct)

		       File 1       File 2       File 3 ....
	    User 1      r/w
	    User 2                    r
	    User 3                                  w
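
One way to picture the matrix in code is as a sparse map from (user, file)
pairs to sets of rights. This is a sketch of the conceptual construct only;
the lowercase names and the helper matrix_allows are made up for
illustration, and no real system stores the matrix this way.

```python
# Sparse access-control matrix: a missing cell means "no rights".
ACCESS_MATRIX = {
    ("user1", "file1"): {"r", "w"},
    ("user2", "file2"): {"r"},
    ("user3", "file3"): {"w"},
}


def matrix_allows(user, resource, right):
    """Return True if the matrix cell for (user, resource) grants right."""
    return right in ACCESS_MATRIX.get((user, resource), set())
```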

	--don't maintain matrix manually or entirely

	    --use tools such as groups or role-based access control

	    individuals		roles		    resources
		x                     r1                    a
		y                     r2                    b
		z                                           c

		    [lots of diagonal lines between but not across columns]
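
A rough sketch of the role indirection, in Python. The particular edges
below (which individual has which role, which role reaches which resource)
are invented for illustration, since the picture does not pin them down;
role_allows is likewise a hypothetical helper. The structure is what
matters: individuals connect only to roles, and roles only to resources.

```python
# individuals -> roles (hypothetical assignments)
USER_ROLES = {
    "x": {"r1"},
    "y": {"r1", "r2"},
    "z": {"r2"},
}

# roles -> (resource, right) pairs (hypothetical assignments)
ROLE_PERMS = {
    "r1": {("a", "r"), ("b", "r")},
    "r2": {("b", "w"), ("c", "r")},
}


def role_allows(user, resource, right):
    """A user may act if at least one of the user's roles grants the right."""
    return any(
        (resource, right) in ROLE_PERMS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Changing what a role can do is a single edit to ROLE_PERMS, instead of one
edit per individual in the full matrix.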
     
    B. The Unix protection model

	--designed for specific purpose: multiple users time-sharing a
	Unix system.

	here's the security model:

	    (i) UIDs and GIDs
	    (ii) access control on files, per UID and per GID
	    (iii) special user: root (UID=0) to which access control doesn't apply
	    (iv) privileged operations only root can do
	    (v) some implicit privileges
	    
	(i) process has a user ID and one or more group IDs

	(ii) access control on files
	
	    --system stores with each file

		--user who owns the file and group that file is in

		--permissions for user, anyone in the file's group, and other

		--can see this by doing "ls -l":
			rw- rw- r--  <owner>  <group>   ....  <fname>

			basic operations: read, write, execute [rwx] 

		--which permissions apply?

		    --if process's UID matches <owner>, then user permissions

		    --if process has GID matching <group>, then group permissions
		    --otherwise, 'other'.
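
The selection rule above can be sketched in Python as follows. This is a
simplified model, not kernel code: it ignores root and anything beyond the
nine rwx bits, and the function names are made up for illustration. Note
that exactly one class applies, and the first match wins.

```python
R, W, X = 4, 2, 1  # bit values within one rwx triple


def applicable_bits(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Pick the single rwx triple (0-7) that applies to this process."""
    if proc_uid == file_uid:
        return (mode >> 6) & 0o7   # user (owner) permissions
    if file_gid in proc_gids:
        return (mode >> 3) & 0o7   # group permissions
    return mode & 0o7              # 'other' permissions


def may_access(mode, file_uid, file_gid, proc_uid, proc_gids, want):
    """True if every bit in want is set in the applicable triple."""
    bits = applicable_bits(mode, file_uid, file_gid, proc_uid, proc_gids)
    return bits & want == want
```

With the "rw- rw- r--" example (mode 0o664), the owner and group members
can write, and everyone else can only read. Because the first match wins,
an owner whose user bits are empty is denied even when the group bits
would have allowed the access.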

	    --directory has permissions too

		--"read" means, roughly, "can list files in this directory"

		--"execute" means, roughly, "can use pathnames in this
		directory"

	(iii) uid 0, called root, treated specially by the kernel as administrator

	    --uid 0 has all permissions

	    --how do uid's get set?
		    setuid() call
		    uid=0 can change to any other uid
		    other uid's cannot invoke setuid(), to a first approximation
		
	    --Unix login
		runs as root
		checks username, password against /etc/shadow
		calls setuid(user), runs user's shell

	    --Here's more detail on login
	
		--Unix users typically stored in files in /etc

		    --key files include "passwd", "group", and, often,
		    "shadow" or "master.passwd"
		    
			--purpose of shadow file is to separate the material
			that needs to be user-visible (list of users and
			UIDs on the system) from the cryptographic material.
			passwords are never stored in the clear, but you
			don't want to expose "shadow" to users. they could
			then easily conduct an offline dictionary attack.
			making the shadow file non-world-readable addresses
			that.

		    --for each user, the files contain:

			--the textual username (for example, "mwalfish" or
			"root")

			--numeric user ID and group IDs

			--one-way hash of user's password:
			    (salt, H(salt, password))
			    
			    [salt makes it harder to break many passwords at
			    once: attacker who is working on the password
			    file offline cannot just compare the hashes to
			    pre-computed values; the attacker must, for each
			    entry, pump every password through the hash
			    function.]
	
			--other information, including user's full name,
			login shell, etc.
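
A small Python sketch of the (salt, H(salt, password)) entry. PBKDF2 from
hashlib is used here purely as a stand-in for H; it is an assumption, not
a claim about which hash any particular Unix uses. The helper names and
the iteration count are likewise illustrative.

```python
import hashlib
import os


def hash_password(password, salt):
    """H(salt, password): a deliberately slow one-way function (stand-in)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


def make_entry(password):
    """Create what the shadow file would store: (salt, H(salt, password))."""
    salt = os.urandom(16)  # fresh random salt per user
    return salt, hash_password(password, salt)
```

Two users who pick the same password get different salts and therefore
different stored hashes, so an attacker cannot hash a guess once and
compare it against every entry.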

		    
		    --/usr/bin/login runs as root

			--Reads username and password from terminal
	
			--Looks up username in /etc/passwd, etc.

			--Computes H(salt, typed password) and checks
			that it matches the hash in the "shadow" or
			"passwd" file

			--If matches, sets group ID and user ID
			corresponding to username

			--Execute user's shell with execve system call
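
The steps above, sketched in Python with in-memory tables standing in for
/etc/passwd and the shadow file. Everything here is hypothetical (the
account "alice", the salt, the choice of PBKDF2 for H, the function
names), and the sketch stops short of actually changing IDs or exec'ing a
shell.

```python
import hashlib
import hmac


def H(salt, password):
    """Stand-in for the system's password hash."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


# Toy versions of the user-visible and secret tables.
PASSWD = {"alice": {"uid": 1000, "gid": 1000, "shell": "/bin/sh"}}
SHADOW = {"alice": (b"\x01\x02\x03\x04", H(b"\x01\x02\x03\x04", "wonderland"))}


def authenticate(username, typed_password):
    """Return the account record on success, or None on failure."""
    account = PASSWD.get(username)
    entry = SHADOW.get(username)
    if account is None or entry is None:
        return None
    salt, stored_hash = entry
    # Constant-time comparison of H(salt, typed password) with the stored hash.
    if not hmac.compare_digest(H(salt, typed_password), stored_hash):
        return None
    # A real login, running as root, would now do the following, in order:
    #   os.setgid(account["gid"])   # group first, while still privileged
    #   os.setuid(account["uid"])   # then give up root
    #   os.execve(account["shell"], [account["shell"]], env)
    return account
```

The group ID has to be set before the user ID: once the process has given
up uid 0, it no longer has the privilege to change its group.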


	    --Unfortunately, this security model (where uid 0 can do
	    anything) leads to lots of privileged code, which in turn
	    means that bugs are particularly dangerous.
	    
		--Here's an example that uses login. Consider that on
		some systems, rlogind (remote login) runs as root.
	    
		--The "login" command takes a flag, "-f" which means
		"don't ask for password". If the flag is supplied, the
		login succeeds only if the requested username is the
		current user, or if the current user is root.

		Unfortunately, rlogind runs "login username", where
		username is whatever the remote user typed, unchecked.

		Attack: at the login screen, pass user "-froot". So
		it looks like this:

		    login: -froot
		    [no password]
		    # 

		Why this worked on that buggy system: login sees
		"-f" flag and asks, "Is the requesting user the
		same?" (Answer: yes. Both login and rlogind are
		running as root here.)
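
Here is a toy Python model of the bug, under stated assumptions: the
argument parsing is a simplification of what a real login does, and UIDS,
login, rlogind_buggy, and rlogind_checked are hypothetical names. The
"checked" variant shows one possible mitigation (refusing names that look
like options); it is not necessarily how any real system fixed it.

```python
UIDS = {"root": 0, "alice": 1000}


def login(argv, caller_uid):
    """Toy login: return the user logged in WITHOUT a password, else None."""
    skip_password = False
    username = None
    for arg in argv[1:]:
        if arg.startswith("-f"):
            skip_password = True
            if len(arg) > 2:
                username = arg[2:]  # "-froot" parses like "-f root"
        else:
            username = arg
    if skip_password and (caller_uid == 0 or UIDS.get(username) == caller_uid):
        return username
    return None  # a real login would prompt for the password here


def rlogind_buggy(remote_username):
    """Runs as root and passes the remote string straight to login."""
    return login(["login", remote_username], caller_uid=0)


def rlogind_checked(remote_username):
    """Same, but refuses usernames that would be parsed as options."""
    if remote_username.startswith("-"):
        return None
    return login(["login", remote_username], caller_uid=0)
```

In this model, rlogind_buggy("-froot") returns "root" with no password
check, while rlogind_checked("-froot") refuses the request.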

	(iv) there are certain operations that only root can do

	    Examples:
		--bind to ports less than 1024
		--change current process's user or group ID (so login
		program needs to run as root)
		--mount or unmount file systems
		--open raw sockets (needed to ping remote machines, for
		example)
		--set clock
		--halt or reboot machine

	    [Problem: you need to have all of root's permissions to do
	    *any* of these things (yes, can drop privileges, but we'll
	    see that's easier said than done). That is a *lot* of
	    privilege for any one action. That is problematic for
	    reasons we'll see.]

	(v) some implicit privileges 

	    --file descriptors, etc.

	    --fork() gives parent ability to control child, etc.

	    --can ptrace() processes at same UID (sort of)

	result: everything is a bit incoherent


    C. setuid

	--next time