CSCI-UA.0480 Spring 2016 Lab 2: Buffer Overflows

Released Tuesday, February 9, 2016
Part 1 due Wednesday, February 17, 2016, 10:00 PM
Parts 2 and 3 due Wednesday, February 24, 2016, 10:00 PM

Lab 2: Buffer Overflows

Introduction

This lab is the first of a sequence that will give you practical experience with common attacks and countermeasures. To make the issues concrete, you will explore the attacks and countermeasures in the context of the zoobar web application.

These labs adapt assignments from MIT's 6.858, which extend assignments developed in Stanford's CS155. Much of the description below is borrowed from 6.858.

This lab will introduce you to buffer overflow vulnerabilities, in the context of a web server called zookws. The zookws web server runs a simple Python web application, zoobar, where users can transfer "zoobars" (credits) between one another. You will find buffer overflows in the zookws web server code, write exploits for the buffer overflows to inject code into the server, and figure out how to bypass stack canaries and non-executable stack protection (for extra credit). Later labs look at other security aspects of the zoobar and zookws infrastructure.

Each lab requires you to use a new programming language or some other piece of infrastructure. For example, in this lab you must acquaint yourself with certain aspects of the C language, x86 assembly language, and gdb. The infrastructure in these labs is "overhead" that stems from the goal of having you understand attacks and defenses in realistic situations. Furthermore, you often need to understand this new infrastructure in detail. This is because security weaknesses often show up in corner cases, so you need to understand those corner cases to craft exploits and design defenses.

These two factors (new infrastructure and details) can make the labs time-consuming. We have two pieces of advice here. First, work on the labs daily (or at least regularly) for a limited time, instead of trying to do all exercises in a single shot before the deadline. Second, try hard to understand the necessary details, instead of muddling your way through the exercises; it will make the exercises go much faster. If you get stuck on a detail, post a question on Piazza.

Some notes:

Set up and infrastructure

This lab will be performed on the virtual devbox you set up in lab 0. Explicit instructions on how to start the devbox can be found here. As before, we suggest you use ssh to connect to the virtual machine once it has been started.

The instructions below assume that your lab materials will be stored in ~/labs. You can place them elsewhere (none of our scripts explicitly depend on this absolute path); if you take that option, you will need to modify the sample commands.

To get the lab materials, do the following from the terminal of your virtual machine:

$ mkdir ~/labs
$ cd ~/labs
# For the following, replace <user> with the github username
# you submitted in lab 0.
$ git clone https://github.com/nyu-cs480-16sp/<user>-labs.git .
Cloning into '.'...
Username for 'https://github.com': <user>
Password for 'https://<user>@github.com': 
remote: Counting objects: 133, done.
remote: Total 133 (delta 0), reused 0 (delta 0), pack-reused 133
Receiving objects: 100% (133/133), 79.33 KiB | 0 bytes/s, done.
Resolving deltas: 100% (48/48), done.
Checking connectivity... done.
$ git checkout lab2
$ _

Before you proceed with this lab assignment, make sure you can compile the zookws web server:

$ cd lab2
$ make
cc zookld.c -c -o zookld.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc http.c -c -o http.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  zookld.o http.o  -lcrypto -o zookld
cc zookfs.c -c -o zookfs.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  zookfs.o http.o  -lcrypto -o zookfs
cc zookd.c -c -o zookd.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  zookd.o http.o  -lcrypto -o zookd
cp zookfs zookfs-exstack
execstack -s zookfs-exstack
cp zookd zookd-exstack
execstack -s zookd-exstack
cp zookfs zookfs-nxstack
cp zookd zookd-nxstack
cc zookfs.c -c -o zookfs-withssp.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE
cc http.c -c -o http-withssp.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE
cc -m32  zookfs-withssp.o http-withssp.o  -lcrypto -o zookfs-withssp
cc zookd.c -c -o zookd-withssp.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE
cc -m32  zookd-withssp.o http-withssp.o  -lcrypto -o zookd-withssp
cc -m32   -c -o shellcode.o shellcode.S
objcopy -S -O binary -j .text shellcode.o shellcode.bin
cc run-shellcode.c -c -o run-shellcode.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  run-shellcode.o  -lcrypto -o run-shellcode
rm shellcode.o
$ _

The zookws web server consists of the following components.

After zookld launches configured services, zookd listens on a port (8080 by default) for incoming HTTP requests and reads the first line of each request for dispatching. In this lab, zookd is configured to dispatch every request to the zookfs service, which reads the rest of the request and generates a response from the requested file. Most HTTP-related code is in http.c.

You can run the web server using one of three configurations:

The *-exstack binaries have an executable stack, which makes it easier to inject executable code, given a stack buffer overflow vulnerability. The *-nxstack binaries have a non-executable stack, which is more challenging to exploit (exploiting these binaries will be extra credit). Last, the *-withssp binaries use stack canaries (and have a non-executable stack); this adds yet another challenge, and exploiting this setup will be further extra credit.

To run the web server in a predictable fashion, so that its stack and memory layout are the same every time, you will use the clean-env.sh script. This is the same way we will run the web server during grading, so please make sure all of your exploits work for this configuration.

We have provided reference binaries of zookws in bin.tar.gz. We will use them for grading, so make sure your exploits work on them. (This might be of concern if/when you modify code during later parts of the lab.)

Make sure you can run the zookws web server and gain access to the zoobar web application from a browser running on your machine, as follows. First, start the web server, and second, use /sbin/ifconfig to get the IP address of the virtual machine:

$ ./clean-env.sh ./zookld zook-exstack.conf
$ /sbin/ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 08:00:27:19:99:1a  
          inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe19:991a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:757 errors:0 dropped:0 overruns:0 frame:0
          TX packets:592 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:65439 (65.4 KB)  TX bytes:77035 (77.0 KB)

Third, open your browser and point it to a URL that you construct from the IP address. In this example, it would be http://192.168.56.101:8080/. If something doesn't seem to be working, try to figure out what went wrong or contact the course staff before proceeding further.

Part 1: Finding Buffer Overflows

In the first part of this lab assignment, you will find buffer overflows in the provided web server.

Exercise 0. If you have not already, read Aleph One's article, Smashing the Stack for Fun and Profit, in detail to learn (a) how to identify buffer overflows (this is the important part for this week) and (b) how to exploit them (which you need for next week's part). A thorough understanding of this article will be critical in this lab.

Exercise 1. Study all of the zoobar code, and find examples of code vulnerable to memory corruption through a buffer overflow. Write down a description of each vulnerability in the answers.txt file. For each vulnerability, describe the buffer which may overflow, and how you would structure the input to the web server (i.e., the HTTP request) to overflow the buffer. Locate at least 5 different vulnerabilities.

Now, you will start developing exploits to take advantage of the buffer overflows you have found above. We have provided template Python code for an exploit in ~/labs/lab2/exploit-template.py, which issues an HTTP request. The exploit template takes two arguments, the server name and port number, so you might run it as follows to issue a request to zookws running on localhost. (The listing below runs the server and the exploit in one terminal. However, we suggest using two terminals side-by-side, and in that case, you do not need the '&' in the first command.)

# Start the server
$ ./clean-env.sh ./zookld zook-exstack.conf &
[1] 2676
# Issue the request via the 'exploit-template' script
$ ./exploit-template.py localhost 8080
HTTP request:
GET / HTTP/1.0

...
$

The word localhost is a hostname that means this host. It is effectively the "address" of the machine on which it is used.

You are free to use this template, or write your own exploit code from scratch. Note, however, that if you choose to write your own exploit, the exploit must run correctly inside the provided virtual machine.
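
If you do write your own script, the core of it is small. The following is a minimal sketch, not the contents of exploit-template.py: the function names build_request and send_request are hypothetical, and the code uses Python 3 syntax, which may differ from the Python installed on the virtual machine.

```python
import socket


def build_request(path="/"):
    """Return the raw bytes of a minimal HTTP/1.0 GET request for `path`.

    `path` is encoded as latin-1 so that every character maps to one byte.
    """
    return b"GET " + path.encode("latin-1") + b" HTTP/1.0\r\n\r\n"


def send_request(host, port, request):
    """Send raw request bytes and return everything the server sends back."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With the server running, send_request("localhost", 8080, build_request("/")) would issue the same innocuous request shown in the listing above.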

You will find gdb useful in building your exploits. As zookws forks off many processes, it can be difficult to debug the correct one. The easiest way to do this is to run the web server ahead of time with clean-env.sh and then attach gdb to an already-running process with the -p flag. To help find the right process for debugging, zookld prints out the process IDs of the child processes that it spawns. You can also find the PID of a process by using pgrep; for example, to attach to zookd-exstack, start the server and, in another shell, run

# While the server is running ...
$ gdb -p $(pgrep zookd-exstack)
...
0x4001d422 in __kernel_vsyscall ()
(gdb) break <your-breakpoint>
Breakpoint 1 at 0x1234567: file zookd.c, line 999.
(gdb) continue
Continuing.

Keep in mind that a process being debugged by gdb will not get killed even if you terminate the parent zookld process using ^C. If you are having trouble restarting the web server, check for leftover processes from the previous run, or be sure to exit gdb before restarting zookld.

When a process being debugged by gdb forks, by default gdb continues to debug the parent process and does not attach to the child. Since zookfs forks a child process to service each request, you may find it helpful to have gdb attach to the child on fork, using the command set follow-fork-mode child. We have included a .gdbinit file that automatically sets this option for you whenever you run gdb from the ~/labs/lab2 directory. You may wish to add other commands to the file or move/copy the file.

For this and subsequent exercises, you may need to encode your attack payload in different ways, depending on which vulnerability you are exploiting. In some cases, you may need to make sure that your attack payload is URL-encoded; that is, use + instead of space and %2b instead of +. Here is a URL encoding reference. You can also use quoting functions in the Python urllib module to URL encode strings. In some cases, you may need to include binary values in your payload. The Python struct module can help you do that. For example, struct.pack("<I", x) will produce a 4-byte (32-bit) little-endian binary encoding of the integer x.
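
As a small illustration of both encodings, here is a sketch in Python 3, where the quoting functions live in urllib.parse (in Python 2 they are in urllib itself). The sample string and the integer are arbitrary values chosen for the example.

```python
import struct
from urllib.parse import quote_plus, unquote_plus

# URL encoding: a space becomes '+', and a literal '+' becomes '%2B'.
text = "a b+c"
encoded = quote_plus(text)
print(encoded)  # a+b%2Bc

# Binary encoding: "<I" means little-endian unsigned 32-bit integer,
# which is the byte order x86 uses for addresses in memory.
packed = struct.pack("<I", 0xbfffd10c)
print(packed)  # b'\x0c\xd1\xff\xbf'

# Raw bytes can be URL-encoded too; each unsafe byte becomes %XX.
print(quote_plus(packed))  # %0C%D1%FF%BF
```

Note that the least significant byte of the integer comes first in the packed output.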

Exercise 2. Pick two buffer overflows from Exercise 1. One of them must overwrite a return address on the stack. The other must overwrite some other data structure that can be exploited to subvert the control flow of the program.

Write exploits that trigger them. Trigger here does not mean injecting code (yet). Your only job in this exercise is to overwrite memory (with junk), consistent with the requirements immediately above. Verify that your exploit actually corrupts memory by either checking the last few lines of dmesg | tail, using gdb, or observing that the web server crashes.

Provide the code for the exploits in files called exploit-2a.py and exploit-2b.py, and indicate in answers.txt which buffer overflow each exploit triggers.

Note that some of the vulnerabilities in Exercise 1 may be more or less easy to exploit. If you find yourself having difficulty exploiting a given vulnerability, choose a different one from Exercise 1; in the extreme case, back up, and modify (or enhance) your answer to Exercise 1.

Along the way, use gdb to set a breakpoint where you expect your exploit to be triggered. As you work, use gdb to step through the server's execution and explore the state of the process both before and after your exploit is triggered. You can use the backtrace command (bt) to print the stack frames line-by-line. By doing this, you confirm you're exploiting the vulnerability you think you are.

For exploit 2a, provide (in answers.txt) the results of two backtrace commands, each issued after the vulnerable line. One backtrace should be gathered when an innocuous request has been sent; the other backtrace should be gathered after exploit 2a has been sent. There should be a clear difference between the two traces: for the regular request, the stack should be "as you expect it", whereas for the malicious request, the stack should show that the return address has been tampered with.

For exploit 2b, use other gdb commands as appropriate (info reg, x, print, etc.) to demonstrate that an exploit has had the effect that you have predicted. You should again provide in answers.txt two "traces" (which could be memory or register dumps) after the vulnerable line: one when an innocuous request has been sent, and one when exploit 2b has been sent.


You can check whether your exploits crash the server as follows:

$ cd ~/labs/lab2
$ make check-crash

Handing in the lab (Part 1)

Here is a checklist for before you submit:

Part 2: Code injection

In this part, you will use your buffer overflow exploits to inject code into the web server. In particular, one of your exploits will be used to link a sensitive file, ~/labs/secret/grades.txt, to a public location, ~/labs/lab2/zoobar, and the other will unlink the new file to remove evidence. (This models an attacker who uses an exploit to exfiltrate a sensitive file, and then another exploit to remove the evidence.) The link function can be used to create a link between an existing file and a new file path/name. Similarly, unlink removes the link to a file, effectively deleting it (assuming no other links to the file exist). See man 2 link and man 2 unlink for more information on these system calls.

You will use the *-exstack binaries, as their stack is executable, which makes it easier to inject code. The zookws web server should be started as follows:

$ cd ~/labs/lab2
$ ./clean-env.sh ./zookld zook-exstack.conf

You will develop each exploit in two steps: write shell code that performs the desired operation (link or unlink), then embed the compiled code into an HTTP request that triggers the buffer overflow on the server.

When writing shell code, it is often easier to use assembly language rather than higher-level languages, such as C. This is because the exploit usually needs fine control over the stack layout, register values and code size; meanwhile, the C compiler generates additional function prologues and performs various optimizations, and these things make the compiled binary code unpredictable.

We have provided Aleph One's shell code for you to use in ~/labs/lab2/shellcode.S along with Makefile rules that produce ~/labs/lab2/shellcode.bin, a compiled version of the shell code. Aleph One's exploit is intended to exploit setuid-root binaries, and thus it runs a shell. You will need to modify this shell code to instead link/unlink.

To help you develop your shell code for this exercise, we have also provided a program called run-shellcode that will run your binary shell code, as if you correctly jumped to its starting point. For example, running it on Aleph One's shell code will cause the program to execve("/bin/sh"), thereby giving you another shell prompt:

$ ./run-shellcode shellcode.bin

When developing an exploit, you will have to think about what values are on the stack, so that you can modify them accordingly. For your reference, here is what the stack frame of some function foo looks like; here, foo has a local variable char buf[256]:

 STACK               +------------------+             MEMORY
                     |       ...        |                  
   |                 |  stack frame of  |               /|\
   |                 |   foo's caller   |                |
   |                 |       ...        |                |
   |                 +------------------+                |
   |                 |  return address  | (4 bytes)      |
   |                 | to foo's caller  |                |
   |                 +------------------+                |
   |    %ebp ------> |    saved %ebp    | (4 bytes)      |
   |                 +------------------+                |
   |                 |       ...        |                |
   |                 +------------------+                |
   |                 |     buf[255]     |                |
   |                 |       ...        |                |
  \|/    buf ------> |      buf[0]      |                |
                     +------------------+

Note that the stack grows down in this figure, and memory addresses are increasing up.

When you're constructing an exploit, you will often need to know the addresses of specific stack locations, or specific functions, in a particular program. The easiest way to do this is to use gdb. For example, suppose you want to know the stack address of the pn[] array in the http_serve function in zookfs-exstack, and the address of its saved %ebp register on the stack. You can obtain them using gdb as follows:

$ gdb -p $(pgrep zookfs-exstack)
...
0x40022416 in __kernel_vsyscall ()
(gdb) break http_serve
Breakpoint 1 at 0x8049415: file http.c, line 248.
(gdb) continue
Continuing.

Be sure to run gdb from the ~/labs/lab2 directory, so that it picks up the set follow-fork-mode child command from ~/labs/lab2/.gdbinit. Now you can issue an HTTP request to the web server, so that it triggers the breakpoint, and so that you can examine the stack of http_serve:

[New process 1339]
[Switching to process 1339]

Breakpoint 1, http_serve (fd=3, name=0x8051014 "/") at http.c:248
248     void (*handler)(int, const char *) = http_serve_none;
(gdb) print &pn
$1 = (char (*)[1024]) 0xbfffd10c
(gdb) info registers
eax            0x3  3
ecx            0x400bdec0 1074519744
edx            0x6c6d74 7105908
ebx            0x804a38e  134521742
esp            0xbfffd0a0 0xbfffd0a0
ebp            0xbfffd518 0xbfffd518
esi            0x0  0
edi            0x0  0
eip            0x8049415  0x8049415 
eflags         0x200286 [ PF SF IF ID ]
cs             0x73 115
ss             0x7b 123
ds             0x7b 123
es             0x7b 123
fs             0x0  0
gs             0x33 51
(gdb)

From this, you can tell that, at least for this invocation of http_serve, the pn[] buffer on the stack lives at address 0xbfffd10c, and the value of %ebp (which points at the saved %ebp on the stack) is 0xbfffd518.
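
Addresses like these are mostly useful for computing distances. As a worked example of the arithmetic, using only the two values from the gdb session above, the following sketch computes how far the saved %ebp and the return address sit above the start of pn[]. The variable names are made up for illustration, and the numbers apply only to this particular invocation; your own session may show different addresses.

```python
# Values read from the gdb session above.
PN_ADDR = 0xbfffd10c         # from `print &pn`
SAVED_EBP_ADDR = 0xbfffd518  # value of %ebp, which points at the saved %ebp

# Distance in bytes from pn[0] up to the saved %ebp.
offset_to_saved_ebp = SAVED_EBP_ADDR - PN_ADDR

# The return address sits directly above the 4-byte saved %ebp.
offset_to_return_addr = offset_to_saved_ebp + 4

print(hex(offset_to_saved_ebp), offset_to_saved_ebp)      # 0x40c 1036
print(hex(offset_to_return_addr), offset_to_return_addr)  # 0x410 1040
```

The gap is larger than the 1024 bytes of pn[] itself because other locals and any padding chosen by the compiler also live between the buffer and the saved %ebp.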

Even though the attacker that we are modeling would first mount the link exploit, followed by the unlink exploit, the exercises below ask you to develop the exploits in the opposite order. This is because unlink is likely to be a bit easier.

Exercise 3. Modify shellcode.S to unlink ~/labs/lab2/zoobar/grades.txt. Your assembly code can either invoke the SYS_unlink system call, or call the unlink() library function.

To test this exercise, you'll need to ensure that the grades.txt file exists before each run. You can do this in either of two ways:
# Manually copying the file
$ cp ~/labs/secret/grades.txt ~/labs/lab2/zoobar/
# or touching the latter location
$ touch ~/labs/lab2/zoobar/grades.txt

You can use the following to test your shellcode:

$ cd ~/labs/lab2
$ make
....
$ touch ~/labs/lab2/zoobar/grades.txt
$ ./run-shellcode shellcode.bin
# Make sure ~/labs/lab2/zoobar/grades.txt is gone...
$ ls ~/labs/lab2/zoobar/grades.txt
ls: cannot access .... No such file or directory

Save the shellcode you wrote as shellcode-3.S.


Exercise 4. Modify the shellcode to link the ~/labs/secret/grades.txt file to ~/labs/lab2/zoobar/grades.txt. Save the code you wrote as shellcode-4.S.

Recall that the link(source, target) system call creates a link at target to the file at source. It does this by creating a new directory entry for target and pointing it at the same inode as given by source.

For more information, see the references to hard linking in lab 6 of NYU's 202 course, or do man 2 link.
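
To see these semantics in action without touching the lab files, here is a small Python 3 sketch that exercises the same two system calls through os.link and os.unlink in a temporary directory. The function name and file names are hypothetical, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile


def demo_link_unlink():
    """Create a hard link, inspect it, remove it, and report what happened."""
    with tempfile.TemporaryDirectory() as d:
        source = os.path.join(d, "secret.txt")
        target = os.path.join(d, "public.txt")
        with open(source, "w") as f:
            f.write("grades\n")

        # link(source, target): a new directory entry for the same inode.
        os.link(source, target)
        same_inode = os.stat(source).st_ino == os.stat(target).st_ino
        link_count = os.stat(source).st_nlink  # two names now refer to it

        # unlink(target): removes only that name; the data stays reachable
        # through the remaining link.
        os.unlink(target)
        target_gone = not os.path.exists(target)
        source_survives = os.path.exists(source)

    return same_inode, link_count, target_gone, source_survives


print(demo_link_unlink())  # (True, 2, True, True)
```

Your shell code performs the same operations, but by invoking the system calls directly rather than through a library wrapper like this one.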

Now you will construct a malicious HTTP request that injects the compiled byte code into the web server to hijack its control flow and run the injected code. Suggestion: first focus on obtaining control of the program counter. Sketch out the stack layout that you expect the program to have at the point when you overflow the buffer, and use gdb to verify that your overflow data ends up where you expect it to. Step through the execution of the function to the return instruction to make sure you can control what address the program returns to. The next, stepi, info reg, and disassemble commands in gdb should prove helpful. Once you can reliably hijack the control flow of the program, find a suitable address that will contain the code you want to execute, and focus on placing the correct code at that address.

Exercise 5. Starting from the return address vulnerability you found in Exercise 2, construct an exploit that hijacks control flow of the web server and executes the shellcode you wrote in Exercise 3. Save the exploit in a file called exploit-5.py.

Hint: Some HTTP requests might cause the function you're exploiting to return early. If necessary, sanitize the attack payload (e.g. using urllib.quote) to make sure the control flow can reach the vulnerable point that you intended to exploit.

Exercise 6. Using the other vulnerability (that you found in Exercise 2), create another exploit to hijack the web server and execute the shellcode you wrote in Exercise 4. Save the exploit in a file called exploit-6.py.

Many modern operating systems mark the stack non-executable in an attempt to make it more difficult to exploit buffer overflows. In this part, you will explore how this protection mechanism can be circumvented. Run the web server configured with binaries that have a non-executable stack, as follows.

$ ./clean-env.sh ./zookld zook-nxstack.conf

The key observation to exploiting buffer overflows with a non-executable stack is that you still control the program counter, after a RET instruction jumps to an address that you placed on the stack. Even though you cannot jump to the address of the overflowed buffer (it will not be executable), there's usually enough code in the vulnerable server's address space to perform the operation you want.

Thus, to bypass a non-executable stack, you need to first find the code you want to execute. This is often a function in the standard library, called libc, such as execl, system, or unlink. Then, you need to arrange for the stack to look like a call to that function with the desired arguments, such as system("/bin/sh"). Finally, you need to arrange for the RET instruction to jump to the function you found in the first step. This attack is often called a return-to-libc attack. This article contains a more detailed description of this style of attack.

In the next exercise, you will need to understand the calling convention for C functions. For your reference, consider the following simple C program:

void
foo(int x, char *msg, int y)
{
  /* ... */
}

void
bar(void)
{
  int a = 3;
  foo(5, "Hello, world!", 7);
}

The stack layout when bar invokes foo, just after the program counter has switched to the beginning of foo, looks like this:

                     +------------------+
     %ebp ------>    |    saved %ebp    | (4 bytes)
                     +------------------+
                     |       ...        |
                     +------------------+
  bar's a ------>    |        3         | (4 bytes)
                     +------------------+
                     |       ...        |
                     +------------------+
                     |        7         | (4 bytes)
                     +------------------+
                     |    pointer to    | ------>  "Hello, world!", somewhere in memory
                     |      string      | (4 bytes)
                     +------------------+
                     |        5         | (4 bytes)
                     +------------------+
                     |  return address  | (4 bytes)
        %esp ------> |     into bar     |
                     +------------------+
                     |                  |

When foo starts running, the first thing it will do is save the %ebp register on the stack, and set the %ebp register to point at this saved value on the stack, so the stack frame will look like the one shown before.
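
The layout in the figure can also be written out as bytes. The following Python 3 sketch lists the 32-bit words that sit at increasing addresses, starting at %esp, at the moment foo begins executing. The helper name cdecl_entry_stack is hypothetical, and both addresses are made-up placeholders rather than values from the lab binaries.

```python
import struct


def cdecl_entry_stack(return_addr, *args):
    """Bytes found at increasing addresses starting at %esp on entry to a
    32-bit x86 C function: the return address, then the arguments in order."""
    words = [return_addr] + list(args)
    return b"".join(struct.pack("<I", w) for w in words)


def arg_offset_from_ebp(index):
    """Offset of argument `index` (0-based) from %ebp after the callee has
    pushed the old %ebp: saved %ebp at +0, return address at +4."""
    return 8 + 4 * index


RET_INTO_BAR = 0x08048456  # placeholder: address of the instruction after the call
MSG_PTR = 0x080485d0       # placeholder: address of "Hello, world!"

# Stack as foo(5, "Hello, world!", 7) sees it on entry.
layout = cdecl_entry_stack(RET_INTO_BAR, 5, MSG_PTR, 7)
print(struct.unpack("<4I", layout))
print([arg_offset_from_ebp(i) for i in range(3)])  # [8, 12, 16]
```

The first argument is the word closest to the return address, which matches the figure: the caller pushes arguments from last to first, and the call instruction pushes the return address last.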

Exercise 7 (Extra credit). Create an exploit that works even if the stack is non-executable (NX). Your exploit should target the "return address" vulnerability that you chose in Exercise 2, and it should unlink /home/httpd/grades.txt.

Although in principle you could use injected shellcode that's not located on the stack, for this exercise you should not inject any shellcode into the vulnerable process. You should use a return-to-libc (or at least a call-to-libc) attack where you vector control flow directly into code that existed before your attack. Recall that you can configure zookws to use a non-executable stack by doing:
$ ./clean-env.sh ./zookld zook-nxstack.conf
Save your exploit in a file named exploit-7.py.

Exercise 8 (Extra credit). Create an exploit that works even if stack canaries are enabled and the stack is non-executable. Your exploit can target a vulnerability of your choosing, and it should again unlink /home/httpd/grades.txt. Recall that the following runs zookws with stack canaries:
$ ./clean-env.sh ./zookld zook-withssp.conf
Save your exploit in a file named exploit-8.py.

Part 3: Fixing buffer overflows

Now that you have exploited buffer overflows, you will fix the server's code to prevent them.

Exercise 9. For each buffer overflow vulnerability you have found in Exercise 1, fix the web server's code to prevent the vulnerability in the first place. Do not rely on compile-time or runtime mechanisms such as stack canaries, removing -fno-stack-protector, baggy bounds checking, etc.

Handing in the lab (Part 2)

Here is a checklist for before you submit:

This completes the lab.

Last updated: 2016-04-15 16:24:03 -0400