Class 22
CS 202
30 April 2015

On the board
------------

1. Last time

2. Stack smashing

---------------------------------------------------------------------------

1. Last time

    --some concepts in distributed computing

    --impossibility of the two generals problem

    --distributed algorithm for atomic multi-site commit: two-phase
    commit (2PC)

    --in presenting 2PC, we assumed that each individual worker had its
    own local transaction implementation and that something like a
    "PREPARE" message would cause the operations to be applied but not
    necessarily committed.

    --if you're coding and need atomic multi-site commit:

        --start with 2PC. then identify the circumstances under which
        indefinite blocking can occur (and decide if it's an acceptable
        engineering risk)

        --if not, move to 3PC (and use references)

        --don't just make something up!! (analogous to concurrency
        issues...best to follow standard approaches)

2. Stack smashing

    --('buffer overflow' is one way to conduct a stack smashing attack.)

    --primitive form of linking, at exploit time!

    --relies on fork/exec separation

    --demo

        [NOTE: fork/exec separation is what allows us to write tcpserve:
        after the fork() but before the exec() of buggy-server, the
        child rearranges its file descriptors to be the socket itself.
        Also, this sample code gives you a chance to see sockets in
        action.]

        --remote host runs server, as Yang.
        --my laptop runs honest client
        --my laptop runs dishonest client

    --note: if this server had been running as root, we'd have been able
    to get a root shell

    --and if the user/syscall interface doesn't check its arguments
    properly, an attacker can buffer-overflow that interface too

    --in practice, once you have a user account on a machine, it's often
    possible to get root access (why? because the syscall interface is
    really hard to secure, as a matter of practice.)

    --arms race:

        --defenders mark stack as non-executable (if hardware provides a
        way to do that).
            response: return-to-libc [DRAW PICTURE]

        --defenders create W ^ X policy (see below) so that memory
        cannot be both writable and executable.

            response: return-oriented programming (ROP) [DRAW PICTURE]

            smash the stack with a bunch of return addresses. each
            return address points to the needed instruction followed by
            "ret" (requires the attacker to have previously identified
            these instructions in the code, so the assumption is that
            the attacker has access to the source code or binary).

            not too hard in CISC code like on x86, where there are lots
            of sequences of code embedded in the binary, even sequences
            that the programmer didn't mean (because instructions are
            not fixed length).

            result: the control flow bounces around all of these byte
            sequences in memory, executing exactly what the attacker
            wanted, but not executing off of the stack.

            defending against ROP is hard (though if people use only
            safe languages, that is, languages that do bounds checking
            and other pointer checks, such attacks will be much, much
            harder)

        --ROP requires access to the source or binary, so maybe we can
        just make sure that binaries don't fall into the hands of
        attackers?

        --Well, no, that doesn't work either. A recent technique,
        *Blind* Return-Oriented Programming (BROP), shows how to conduct
        attacks even when the binary isn't available and even on 64-bit
        machines.

            References:
            http://www.scs.stanford.edu/brop/
            http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6956567
            http://www.scs.stanford.edu/~sorbo/brop/bittau-brop.pdf

    --other attacks:

        --overwriting function pointers
        --smashing the heap

    --what is W ^ X?

        memory is either writable or executable, never both. in
        particular: map the stack pages as non-executable, if the
        hardware allows it. But there are some issues....

        --the original 386 did not allow it with page tables. However,
        the extended page-table format (PAE, which is used to help users
        get at >4GB of physical memory even if the machine is 32 bits)
        has room for an XD bit in its page table entries, which means
        "don't execute code in this page"; newer x86 chips that use this
        format implement that bit.
        We haven't worked with this bit in this class, but the
        architecture on modern 32-bit x86 supports it.

        --Even on x86s that don't support extended page tables,
        segmentation would help with do-not-execute (since the
        permissions in the segment descriptor can express this). The
        disadvantage here is that the compiler needs to lay out the code
        and stack to match what the segments would require.

        --The bummer with W ^ X, even when it *is* supported, is this:
        some languages not only don't need it but also are actively
        harmed by it. The core of the issue is that a program written in
        a safe language (Perl, Python, Java, etc.) does not need W ^ X
        whereas lots of C programs do. Meanwhile some machines *always*
        enforce W ^ X, even for programs that do not need it. Such
        enforcement constrains certain languages, namely those that need
        to do runtime code generation.

    --what about the defense of address space layout randomization
    (ASLR)? This provides some help but obviously doesn't help our
    vulnerable server because our server tells the client where the
    buffer is.

        --And on 32-bit systems, the randomness can be defeated through
        brute force. There are only 20 bits of randomness conceivable
        (the VPN bits), and ASLR implementations left the top four
        alone, to avoid fragmenting VM. That leaves 16 random bits, so
        at most 2^16 = 65,536 placements to try.

        --On 64-bit systems, this defense can be defeated if the server
        simply re-forks children after a crash instead of restarting
        (each forked child inherits the same layout, so the attacker can
        keep probing it). See the BROP paper (referenced above)

    --what about the defense of canary values near the return address?
    StackGuard (in gcc), PaX, etc.

        --This can also be defeated; see again the BROP references.

    --Another defense: don't use C! CPUs are so fast that a language
    with bounds checking probably isn't going to pay a huge performance
    penalty relative to one without bounds checks

    --Question: can we instead confine processes and users so that when
    they're broken into, the damage is limited?
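
[Sketch, not from class: going back to the bounds-checking defense above, this is roughly the check that a bounds-checked language performs automatically on a buffer write, written out by hand in C. The helper name copy_checked and the 8-byte buffer in the usage note are hypothetical; they are not taken from the demo's buggy-server. Treat this as an illustration under those assumptions, not a complete fix for a real server.]

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not code from the
 * demo): copy a NUL-terminated string into a fixed-size buffer, refusing
 * input that does not fit instead of writing past the end of dst.
 *
 * Returns 0 on success, -1 if src plus its terminator exceeds dstlen.
 */
int copy_checked(char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src);

    if (dstlen == 0 || n >= dstlen)
        return -1;            /* would overflow: reject, leave dst untouched */

    memcpy(dst, src, n + 1);  /* n + 1 also copies the terminating NUL */
    return 0;
}
```

With `char buf[8]`, a call like `copy_checked(buf, sizeof buf, "hello")` returns 0, while a 16-byte run of 'A's is rejected with -1 and buf is left unchanged. An unchecked strcpy() on the same input would keep writing past buf into adjacent stack memory, which is where the saved return address may sit. The cost of the check is one length comparison per copy, which is the kind of overhead the last defense above argues is affordable.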