Overview
Throughout this class, you will build a simple, yet scalable web
server. The goal in life of such a server is to respond to HTTP
requests. HTTP is a human-readable protocol that your browser uses to
tell a server which document you'd like to retrieve.
For instance, if you typed this in your browser:

    http://cs.nyu.edu/index.html

it would issue an HTTP request that would look like this (in a simplified form):

    GET /index.html HTTP/1.1
    Host: cs.nyu.edu
    User-Agent: Mozilla/5.0

The NYU server would respond with (again, simplified):

    HTTP/1.1 200 OK
    Date: Tue, 12 Jan 2011 15:23:51 GMT
    Content-Length: 14459
    Content-Type: text/html

    <html> ... remainder of the document ...

Finally, your browser would process the HTML. (If you're curious to
see this working, try Firebug, which is described in the tools
section.)
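To make the exchange above concrete, here is a minimal sketch of how a server might pick apart the first line of a request and assemble a simplified response. The names `RequestLine`, `ParseRequestLine`, and `BuildOkResponse` are made up for illustration; they are not the code the labs will hand out, and a real server has to handle headers, errors, and malformed input far more carefully.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the three fields of an HTTP request line.
struct RequestLine {
  std::string method;   // e.g. "GET"
  std::string path;     // e.g. "/index.html"
  std::string version;  // e.g. "HTTP/1.1"
};

// Splits a request line such as "GET /index.html HTTP/1.1" on whitespace.
// Returns false if any of the three fields is missing.
bool ParseRequestLine(const std::string& line, RequestLine* out) {
  std::istringstream in(line);
  in >> out->method >> out->path >> out->version;
  return !in.fail();
}

// Builds a simplified response: status line, two headers, a blank line,
// and then the document itself. HTTP ends each header line with CR LF.
std::string BuildOkResponse(const std::string& body) {
  std::ostringstream out;
  out << "HTTP/1.1 200 OK\r\n"
      << "Content-Length: " << body.size() << "\r\n"
      << "Content-Type: text/html\r\n"
      << "\r\n"
      << body;
  return out.str();
}
```

Note that `Content-Length` is computed from the body rather than hard-coded, since the browser relies on it to know where the document ends.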
We are interested in an HTTP server because it is arguably easy to
parallelize. Requests are mostly independent, right? Not always. Take
HTML document caching, for instance. It imposes some coordination that may
or may not scale. We'll expose and study these scalability problems.
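As a rough illustration of the coordination that caching imposes, consider the simplest possible shared cache: a map protected by a single lock. This is only a sketch under assumptions (the class name `DocumentCache` and its interface are hypothetical, and it uses C++11's `std::mutex` for brevity), not the design the labs will arrive at.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical document cache shared by all request-handling threads.
class DocumentCache {
 public:
  // Copies the cached document for `path` into `*doc`.
  // Returns false on a cache miss.
  bool Lookup(const std::string& path, std::string* doc) {
    std::lock_guard<std::mutex> guard(mutex_);
    std::map<std::string, std::string>::const_iterator it = docs_.find(path);
    if (it == docs_.end()) return false;
    *doc = it->second;
    return true;
  }

  // Stores (or overwrites) the document for `path`.
  void Insert(const std::string& path, const std::string& doc) {
    std::lock_guard<std::mutex> guard(mutex_);
    docs_[path] = doc;
  }

 private:
  std::mutex mutex_;  // every Lookup and Insert serializes on this lock
  std::map<std::string, std::string> docs_;
};
```

The code is correct, but every request that touches the cache now waits on the same lock, even when the requests are for different documents. Otherwise independent requests have become dependent on each other.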
In the process, we will also discuss the tools we use to program and
debug such parallel code. There is a growing consensus that we need
better tools. We will see why.
The practical assignments will follow the steps below.
Collaboration
Please take a look at the department's
academic integrity policy. You can show a colleague how to use a
given tool. You can discuss strategies to solve the problems with a
colleague, if you mention it in writing in your assignment hand-in.
This kind of collaboration is encouraged. However, each student is
expected to type, compile, debug, and benchmark their own code. You
are not allowed to look at a colleague's code.
Development environment and tools
We will do all our development and benchmarking on Linux.
For convenience, a development virtual
machine is available that has all the packages we need
pre-installed. (The username with admin privileges is dev, and the
password is dev.) It runs Ubuntu 9.10, a very user-friendly
distribution. You can use the VM
Player to run the VM on your laptop, for instance. If you
already have a Linux machine you'd prefer to use, please make sure to
install all the necessary packages.
Please take some time to get comfortable with the
tools. The initial labs will allow that time, but you should be
proactive. By lab 3, when the true fun starts, you will want to
focus on "things multicore," not on the underlying tools.
The tools in question are:
GCC and GDB are the
compiler and debugger we will be using. The latter in particular is
your new best friend. If you have never used it, start with this simple and
excellent
tutorial.
Git is the tool we will
use to manage our source code. Because a lab builds on the previous
one, we want to keep track of the code changes we perform. Git will
also be the tool I'll use to distribute code, when needed. Here is a
primer on
Git and a very
nice cheat
sheet.
Libraries such as the
STL, pthreads, and sockets will be used quite a bit. The course
assumes you know the first one and will introduce the others as
needed. A very accessible guide to programming with threads
is Blaise
Barney's. As for socket programming, do not worry if you haven't
seen it. Networking code is not a prerequisite and will be given to
you. I would expect you to feel a bit curious about it anyway, in
which case I'd point you to Stevens et al., "Unix Network Programming"
(from our book reserve). In more than one lab, we will be using
code almost straight from the book.
Code Profiling will be
done in almost all the labs. There are two tools that will amaze you,
I'm sure. One is
Google's Performance
Tools. It gives us a glimpse of where time is being spent inside
our running code without disturbing it much. The other tool is a
browser plug-in
called Firebug. This tool adds a
network activity monitor to your browser that allows you to profile
and inspect HTTP requests and responses.
Load Generator is a tool
that can create artificial HTTP requests that will hit our
server. We'll be using a generator
called HTTPerf. I
understand you'll be thrilled by how efficiently your server handles
these requests. But when running the generator on a shared machine,
please be mindful of others. (Or we might get kicked out of the more
powerful machines...)
Our mailing
list g22_2631_001_sp11
can be a great source of help. Don't be ashamed to use it.
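The tools list above mentions pthreads, which the course will introduce as needed. For those who want an early taste, here is a minimal sketch of creating threads, protecting shared data with a mutex, and waiting for the threads to finish. The names `SharedCounter`, `Worker`, and `RunWorkers` are hypothetical and chosen only for this example; it is not lab code.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical shared state handed to every worker thread.
struct SharedCounter {
  pthread_mutex_t mutex;
  long value;
  int increments_per_thread;
};

// Thread entry point. pthreads requires the void* (*)(void*) signature.
void* Worker(void* arg) {
  SharedCounter* counter = static_cast<SharedCounter*>(arg);
  for (int i = 0; i < counter->increments_per_thread; ++i) {
    pthread_mutex_lock(&counter->mutex);    // enter critical section
    ++counter->value;
    pthread_mutex_unlock(&counter->mutex);  // leave critical section
  }
  return NULL;
}

// Starts up to num_threads workers, waits for all that started, and
// returns the final counter value.
long RunWorkers(int num_threads, int increments_per_thread) {
  SharedCounter counter;
  pthread_mutex_init(&counter.mutex, NULL);
  counter.value = 0;
  counter.increments_per_thread = increments_per_thread;

  std::vector<pthread_t> threads(num_threads);
  int started = 0;
  for (; started < num_threads; ++started) {
    // pthread_create returns nonzero on failure; stop launching if so.
    if (pthread_create(&threads[started], NULL, Worker, &counter) != 0) break;
  }
  for (int i = 0; i < started; ++i) {
    pthread_join(threads[i], NULL);  // wait for worker i to finish
  }
  pthread_mutex_destroy(&counter.mutex);
  return counter.value;
}
```

With GCC, a file containing this code would typically be compiled with the `-pthread` flag. Try removing the lock and unlock calls and running with several threads under GDB: the final count is usually no longer what you expect, which is a good first exercise in debugging parallel code.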