Class 22
CS 202
30 April 2015

On the board
------------

1. Last time 
2. Stack smashing

---------------------------------------------------------------------------

1. Last time

    --some concepts in distributed computing
        --impossibility of the two generals problem
        --distributed algorithm for atomic multi-site commit: two-phase
        commit (2PC)

    --in presenting 2PC, we assumed that each individual worker had
    its own local transaction implementation and that something like
    a "PREPARE" message would cause the operations to be applied but not
    necessarily committed.

    --if you're coding and need atomic multisite commit:
    
        --start with 2PC. then identify the circumstances under which
        indefinite blocking can occur (and decide if it's an acceptable
        engineering risk)

        --if not, move to 3PC (and use references)

        --don't just make something up!! (analogous to concurrency
        issues...best to follow standard approaches)
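
    --sketch of the coordinator's side of 2PC, to make the two phases
    concrete. This is a toy in-process simulation, not a real
    implementation: the Worker class and its prepare/commit/abort
    methods are hypothetical stand-ins for messages to remote sites,
    and there is no logging to stable storage or failure handling.

```python
def two_phase_commit(workers):
    """Coordinator side of 2PC over in-process stand-ins for workers."""
    # Phase 1: ask every worker to PREPARE (apply tentatively, then vote).
    votes = [w.prepare() for w in workers]
    decision = "COMMIT" if all(votes) else "ABORT"
    # Phase 2: tell every worker the decision.
    for w in workers:
        if decision == "COMMIT":
            w.commit()
        else:
            w.abort()
    return decision


class Worker:
    """Hypothetical worker with its own local transaction state."""

    def __init__(self, will_vote_yes):
        self.will_vote_yes = will_vote_yes
        self.state = "INIT"

    def prepare(self):
        # Operations are applied but not yet committed.
        self.state = "PREPARED" if self.will_vote_yes else "ABORTED"
        return self.will_vote_yes

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"
```

    --the blocking case is visible in the sketch: a worker that has
    reached "PREPARED" cannot decide on its own, so if the coordinator
    stopped between the two loops, that worker would have to wait.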


2. Stack smashing

    --('buffer overflow' is one way to conduct a stack smashing attack.)


	--primitive form of linking, at exploit time!

	--relies on fork/exec separation
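
    --sketch of the overflow mechanism, as a simulation rather than a
    real exploit. The frame layout here is an assumption made for
    illustration (an 8-byte buffer immediately followed by a 4-byte
    saved return address; real frames usually hold more, such as a
    saved frame pointer), and all names are hypothetical.

```python
import struct

BUF_SIZE = 8  # hypothetical size of the local buffer


def make_frame(return_addr):
    """Model a frame: BUF_SIZE bytes of buffer, then a 4-byte saved return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", return_addr))


def unchecked_copy(frame, data):
    """Copy len(data) bytes into the frame with no check against BUF_SIZE."""
    frame[0:len(data)] = data


def saved_return_addr(frame):
    """Read back the saved return address that sits just past the buffer."""
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


frame = make_frame(0x08048000)
# Input longer than the buffer: the extra 4 bytes land on the return address.
unchecked_copy(frame, b"A" * BUF_SIZE + struct.pack("<I", 0xDEADBEEF))
```

    --after the copy, the "function" would return to an address chosen
    by whoever supplied the input, which is the essence of smashing
    the stack.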

    --demo

	[NOTE: fork/exec separation is what allows us to write tcpserve:
	after the fork() but before exec() of buggy-server, child
	rearranges its file descriptors to be the socket itself. Also,
	this sample code gives you a chance to see sockets in action.]
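
    --sketch of the fork/exec pattern described in the note, in Python
    rather than the C of the demo. Assumptions: a socketpair stands in
    for an accepted TCP connection, a one-line Python program stands in
    for buggy-server, and the descriptors being rearranged are stdin
    and stdout. serve_once and demo are hypothetical names, not the
    actual tcpserve code.

```python
import os
import socket
import sys


def serve_once(conn, argv):
    """Fork; in the child, make the socket stdin/stdout, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: this runs after fork() but before exec().
            os.dup2(conn.fileno(), 0)   # stdin now refers to the socket
            os.dup2(conn.fileno(), 1)   # stdout now refers to the socket
            os.execvp(argv[0], argv)    # does not return on success
        finally:
            os._exit(127)               # reached only if exec failed
    return pid


def demo():
    parent_end, child_end = socket.socketpair()
    # Stand-in server: read one line from stdin, echo it upper-cased.
    prog = "import sys; sys.stdout.write(sys.stdin.readline().upper())"
    pid = serve_once(child_end, [sys.executable, "-c", prog])
    child_end.close()
    parent_end.sendall(b"hello\n")
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = parent_end.recv(64)
        if not chunk:
            break
        reply += chunk
    os.waitpid(pid, 0)
    parent_end.close()
    return reply
```

    --the exec'd program never mentions sockets; it just reads stdin
    and writes stdout, and the rearranged descriptors do the rest.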

        --remote host runs server, as user Yang.

	--my laptop runs honest client

	--my laptop runs dishonest client

    --note: if this server had been running as root, we'd have been able
    to get a root shell

	--and if the user/syscall interface doesn't check its arguments
	properly, an attacker can overflow buffers behind that
	interface too

	--in practice, once you have a user account on a machine, it's
	often possible to get root access (why? because the syscall
	interface is really hard to secure, as a matter of practice.)

    --arms race:

        --defenders mark stack as non-executable (if hardware provides a
        way to do that).
        
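            response: return-to-libc (overwrite the return address so
            that it points at code already in the process, such as
            libc's system(), rather than at code on the stack)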

	    [DRAW PICTURE]
    

        --defenders create W ^ X policy (see below) so that memory
        cannot be both writable and executable.

            response: return-oriented programming (ROP)

            [DRAW PICTURE]

            smash the stack with a bunch of return addresses. each
            return address points to the needed instruction followed by
            "ret" (requires the attacker to have previously identified
            these instructions in the code, so the assumption is that
            the attacker has access to the source code or binary). not
            too hard in CISC code like on x86, where there are lots of
            sequences of code embedded in the binary, even sequences
            that the programmer didn't mean (because instructions are
            not fixed length). result: the control flow bounces around
            all of these byte sequences in memory, executing exactly
            what the attacker wanted, but not executing off of the
            stack.

	    defending against ROP is hard (though if people use only
	    safe languages, that is, languages that do bounds checking
	    and other pointer checks, such attacks will be much, much
	    harder)
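
            sketch of why unintended sequences exist in
            variable-length code. The five bytes below encode one
            intended x86 instruction (mov eax, 0xc358), but decoding
            from one byte later yields "pop eax; ret". The scanner is a
            hypothetical helper: it only lists byte runs ending in the
            ret opcode and does not disassemble them, which a real
            gadget finder would need to do.

```python
RET = 0xC3  # x86 near-return opcode


def ret_terminated_runs(code, max_len=3):
    """List (offset, bytes) for each run of up to max_len bytes ending in ret."""
    runs = []
    for end, byte in enumerate(code):
        if byte == RET:
            for length in range(1, max_len + 1):
                start = end + 1 - length
                if start >= 0:
                    runs.append((start, code[start:end + 1]))
    return runs


# b8 58 c3 00 00 is "mov eax, 0x0000c358" when decoded from offset 0.
blob = bytes.fromhex("b858c30000")
runs = ret_terminated_runs(blob)
```

            the run at offset 1 (bytes 58 c3) is a candidate the
            programmer never wrote as an instruction; an attacker
            chains the addresses of many such runs on the smashed
            stack.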

        --ROP requires access to the source or binary, so maybe we can
        just make sure that binaries don't fall into the hands of
        attackers?

            --Well, no, that doesn't work either. A recent technique,
            *Blind* Return-Oriented Programming (BROP), shows how to
            conduct attacks even when the binary isn't available and
            even on 64-bit machines. References:

                http://www.scs.stanford.edu/brop/
                http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6956567
                http://www.scs.stanford.edu/~sorbo/brop/bittau-brop.pdf


        --other attacks:

	    --overwriting function pointers

	    --smashing the heap

    --what is W ^ X? a policy under which no page is both writable and
    executable; in particular, map the stack pages as non-executable,
    if the hardware allows it. But there are some issues....
	
	--the original 386 did not allow it with page tables.
	However, the PAE page-table format (Physical Address
	Extension, which is used to help users get at >4GB of
	physical memory even if the machine is 32 bits) has room for
	an XD bit (called NX by AMD) in its entries, which means
	"don't execute code in this page", and x86 chips from
	roughly the mid-2000s onward implement it. We haven't worked
	with this bit in this class, but the architecture on modern
	32-bit x86 supports it.
	
	--Even on x86s that don't support an XD bit in their page
	tables, segmentation would help with do-not-execute (since
	the permissions in the segment descriptor can express this).
	The disadvantage here is that the compiler needs to lay out
	the code and stack to match what the segments would require.

	--The bummer with W ^ X, even when it *is* supported, is
	this: some languages not only don't need it but also are
	actively harmed by W ^ X. The core of the issue is that a
	program written in a safe language (Perl, Python, Java,
	etc.) does not need W ^ X whereas lots of C programs do.
	Meanwhile some machines *always* enforce W ^ X, even for
	programs that do not need it. Such enforcement constrains
	certain languages, namely those that need to do runtime code
	generation (for example, just-in-time compilation).
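
	--sketch of how W ^ X constrains runtime code generation,
	using a toy model of a page. Everything here is hypothetical
	(the Page class, WXViolation, jit_emit); the PROT_* values
	match those in Linux's <sys/mman.h>, but no real memory
	protection is involved. The write-then-flip sequence is one
	common workaround, not something from lecture.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


class WXViolation(Exception):
    pass


class Page:
    """Toy page whose permissions are checked against a W ^ X policy."""

    def __init__(self, prot):
        self.data = bytearray(4096)
        self.prot = 0
        self.set_prot(prot)

    def set_prot(self, prot):
        if prot & PROT_WRITE and prot & PROT_EXEC:
            raise WXViolation("page may not be writable and executable at once")
        self.prot = prot

    def write(self, offset, code):
        if not self.prot & PROT_WRITE:
            raise PermissionError("page is not writable")
        self.data[offset:offset + len(code)] = code


def jit_emit(code):
    """Runtime code generation under W ^ X: write first, then flip to executable."""
    page = Page(PROT_READ | PROT_WRITE)
    page.write(0, code)
    page.set_prot(PROT_READ | PROT_EXEC)
    return page
```

	--a code generator that wants to patch code it has already
	emitted must flip the page back to writable first, which is
	the kind of constraint the paragraph above is describing.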

    --what about the defense of address space layout randomization
    (ASLR)? This provides some help but obviously doesn't help our
    vulnerable server because our server tells the client where the
    buffer is.

        --And on 32-bit systems, the randomness can be defeated
        through brute force. There are only 20 bits of randomness
        conceivable (the VPN bits), and ASLR implementations left
        the top four alone, to avoid fragmenting VM, so at most 16
        bits are actually randomized.
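
        --the arithmetic behind the brute-force claim, as a small
        worked example. Assumptions: 32-bit virtual addresses and 4 KB
        pages (which is where the 20 VPN bits come from), and an
        attacker who tries each candidate once against a layout that
        stays fixed between attempts.

```python
ADDRESS_BITS = 32
PAGE_OFFSET_BITS = 12    # assumes 4 KB pages
UNTOUCHED_TOP_BITS = 4   # per the notes: the top four bits were left alone

vpn_bits = ADDRESS_BITS - PAGE_OFFSET_BITS      # bits that could be randomized
random_bits = vpn_bits - UNTOUCHED_TOP_BITS     # bits that actually were
candidates = 2 ** random_bits                   # possible placements
# Average number of guesses when trying candidates in turn without repeats.
expected_guesses = (candidates + 1) / 2
```

        --tens of thousands of connection attempts is well within
        reach of a script, which is why the notes call this
        brute-forceable.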

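        --On 64-bit systems, this defense can be defeated if the server
        simply re-forks children after a crash instead of restarting:
        every forked child inherits the same layout, so guesses carry
        over from one attempt to the next. See the BROP paper
        (referenced above).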

    --what about the defense of canary values near the return address?
    StackGuard (in gcc), PaX, etc.

        --This can also be defeated; see again the BROP references.
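
        --sketch of the byte-at-a-time guessing idea from the BROP
        work (called "stack reading" there), as a simulation. The
        assumptions are loud ones: make_oracle is a hypothetical
        stand-in for "send an overflow and see whether the forked
        child crashes", and the canary is assumed to stay the same
        across children because they are forked rather than
        re-exec'd.

```python
import os


def make_oracle(canary):
    """Stand-in for the forking server: True means 'child did not crash'."""
    def survives(overwrite):
        # The overflow clobbers only the first len(overwrite) canary bytes.
        return canary[:len(overwrite)] == overwrite
    return survives


def read_canary(survives, length):
    """Recover the canary one byte at a time, counting probes sent."""
    known = b""
    probes = 0
    for _ in range(length):
        for guess in range(256):
            probes += 1
            if survives(known + bytes([guess])):
                known += bytes([guess])
                break
    return known, probes


secret = os.urandom(8)
recovered, probes = read_canary(make_oracle(secret), len(secret))
```

        --guessing each byte separately costs at most 256 probes per
        byte (2048 for 8 bytes) rather than 2^64 for the whole value
        at once.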

    --Another defense: don't use C! CPUs are so fast that a language
    with bounds checking probably isn't going to pay a huge performance
    penalty relative to one without bounds checks

    --Question: can we instead confine processes and users so that when
    they're broken into, the damage is limited?