CSCI-UA.0480 Spring 2016 Lab 1: Network servers

Released Wednesday, January 27, 2016
Parts 0 and 1 due Wednesday, February 3, 2016, 10:00 PM
Parts 2 and 3 due Wednesday, February 10, 2016, 10:00 PM

Lab 1: Network servers

Introduction

In this lab, you will first improve (or reinforce) your shell-using skills; these skills will help you be productive in this class. In the heart of the lab, you will learn the basics of writing application-level networking code. The end product will be a Web server, written in C, that will respond to Web browsers. Along the way, you will gain practice with the C programming language; this too will be useful for some of the future labs.

You will electronically hand in code and answers (the mechanics are described at the end of the lab; sneak preview).

Answers are required to the numbered Questions throughout the document and go in a file called answers.txt. Your answers must be in this file in text; a blank answers.txt will receive no credit.

Please note the following expectation:

Getting Started

You will do this lab on the CIMS machines. However, this lab uses standard interfaces and tools; as a consequence, you ought to be able to do the lab on OS X or even cygwin on Windows (though we have not tested this), or inside our virtual devbox. However, the instructions will be relative to the CIMS machines, and we will grade on those.

This lab assumes intermediate-level C abilities. If you're rusty, take a bit of time to work through the following exercises from last spring's CS202 course:

You will run your network servers on either linserv1.cims.nyu.edu or linserv2.cims.nyu.edu. Do not run your server on one of the other cims machines; the ports to those other machines are blocked, which will prevent external clients from connecting to them. To gain access to linserv1 and linserv2 machines, you first connect through access.cims.nyu.edu, as follows:

$ ssh cims
#  then do one of ...
$ ssh -X linserv1.cims.nyu.edu
#  or
$ ssh -X linserv2.cims.nyu.edu
Or, in one line:
$ ssh [-A] -X -tt access.cims.nyu.edu ssh -tt linserv1.cims.nyu.edu

To get the lab materials, do the following. If you have configured your system to forward X11, two boxes will pop open asking for your github username and password.

The instructions below assume that your lab materials are stored in ~/cs480-008. If you place them elsewhere, remember to modify the instructions.

$ mkdir ~/cs480-008
$ cd ~/cs480-008
# For the following, replace <username> with the github username
# you submitted in lab0.
$ git clone https://github.com/nyu-cs480-16sp/<username>-labs.git .
Cloning into '.'...
warning: You appear to have cloned an empty repository.
$ _

If you have not configured X11 forwarding, the cloning process might instead look something like:

$ git clone https://github.com/nyu-cs480-16sp/<username>-labs.git .
Cloning into '.'...
error: unable to read askpass response from '/usr/libexec/openssh/gnome-ssh-askpass'
Username for 'https://github.com': <username>
error: unable to read askpass response from '/usr/libexec/openssh/gnome-ssh-askpass'
Password for 'https://<username>@github.com':
warning: You appear to have cloned an empty repository.
$ _

At this point, you should have the source for this lab. You can find it at ~/cs480-008/lab1 by doing:

$ cd ~/cs480-008/lab1
$ ls
client.c Makefile ...

The rest of this page will assume you are working from within this directory.

Part 0: Review of man pages and shell

Don't forget the "Working environment" section of the setup page! You can open multiple windows at any time to make your coding environment more zen.

Before coding anything, we'll quickly review the shell. As you may have learned in CS202, the shell is a program that runs commands given to the computer (often typed by the user into a terminal). You gain access to the shell either by sshing to the computer or by opening up a terminal window directly at the machine. (Tutorials on ssh are described in the setup page.)

When you ssh to a machine, you have immediate access to the shell. Otherwise, you can open a terminal. (On the RHEL machines in the computer labs, you can open a terminal by going to the "Applications" menu, then finding "Konsole" in the "System Tools" submenu. Many Linux machines also bind the keyboard shortcut [Ctrl]+[Alt]+t to the "open terminal" command.)

While the shell is a powerful tool, it is opaque on first glance. The man (short for manual) pages can be a crucial resource. Man pages provide documentation on system call APIs (read(), write(), etc.), installed programs (ls, xargs, etc.), as well as general concepts.

To use the man pages to learn about my_program, you would type

$ man my_program
at the terminal; this brings up a program that displays manual pages. You can use the arrow keys to navigate line-by-line or use the f and b keys for page-wise navigation. Pressing h at any time brings up the help menu, which summarizes all commands available. (You may note that the help page is actually a summary of less commands. This is because the man pages behave in a similar way to the less program, which can be used to read text files.)

Unfortunately, the man pages are not perfect. Sometimes, they can be poorly written, hard to read, or even incomplete. If you find yourself confused after having made a good effort, please ask us for help.

Despite their faults, the man pages are a surprisingly good source of documentation of the system call API. Over the course of this class, we expect that you will repeatedly refer to the man pages. (The man pages can be used to learn more about themselves as well! Type man man, and avoid confusion.)

For system calls, you will generally need to tell man what section of the manual pages you are interested in (otherwise you may not get the documentation you're looking for). For example:

$ man write     # this isn't the documentation you're looking for
$ man 2 write   # good
In general, when reading documentation on system calls, you want the second section of the manual pages, which you read as above: man 2 [syscall]

The following exercises have been constructed around the commands that we found useful while doing the early labs in this course. It is not a comprehensive list. Along the way, you may find this reference helpful.

For each of the following exercises, include your answers in the answers.txt file.

Question 1. What do the following commands do? (Hint: go back here and search for '2>'.)
$ ./myprog > a.file
$ ./myprog 2> a.file

To send both stdout and stderr to the same file, one could write

$ ./myprog > a.file 2>&1
and bash has a convenient shorthand for this
$ ./myprog &> a.file

Question 2. Suppose that I want to both print stdout to the screen and also save it to a file named "output.file". How should I go about modifying the commands from Question 1? Hint: man tee

Question 3. Briefly, what does the following command do? (You can think of /dev/null as an always-empty file — no matter how much you write to it, it remains empty.)
$ ( stat [/path/to/directory] &> /dev/null && echo "Success" ) || echo "Failure"

Download some old source code for openssl from https://www.openssl.org/source/old/1.0.0/openssl-1.0.0.tar.gz. You can do this using the command line tool wget by doing

$ mkdir shellpractice
$ cd shellpractice
$ wget <url>
to download whatever is at <url> to the current directory. You can untar the file and enter the newly-created directory by doing:
$ tar -xf openssl-1.0.0.tar.gz && cd openssl-1.0.0
Notice that you can list the contents of openssl-1.0.0 with their sizes by typing
$ du -sh *
and further that you can list the contents by size in sorted order by stringing two commands together
$ du -s * | sort -nr 
The composite command above tells the shell to "pipe" the output of du into the sort command.


Exercise

The command grep (the name comes from the ed command g/re/p, "globally search for a regular expression and print") can be used to search for regular expressions in input text or in file contents.

Using grep, find the file which has a line containing both "AES_cfb128_encrypt" and "ION".

Hint: man grep shows that doing

$ grep -r "regex" .
will search all files rooted at . for the regular expression "regex".


Question 4. What is the rough effect of running the following command on the file you found in the exercise immediately above? Don't worry about getting it exactly right. Credit will be awarded leniently.
$ sort [that file] | awk '{ print $2 }' | rev | head -n1019 | tail -n20

Some notes:

Part 1: Echo server + client

When two devices communicate over TCP/IP, one of them plays the role of "server" and the other plays the role of "client". For example, when you browse the Web, your browser is the client, and the machine that you're trying to reach (say the one hosting www.google.com) is the server.

In this section, we'll build a simple client and a simple server. The client will allow you to send a message to a remote host. The server will be an "echo server": it will accept connections and immediately echo the messages that it receives. After all is said and done, you should be able to open two terminals to run your two programs (client and server), and the behavior will be as follows:

# Terminal 1 -- runs the echo server (this is on linserv1 or linserv2)
$ ifconfig  # Find the server's IP address: the "inet addr" field below.
eth0      Link encap:Ethernet  HWaddr 52:54:00:92:93:CF  
          inet addr:128.122.49.76  Bcast:128.122.49.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
....          

$ ./server 13142
Server running. Waiting for connection...
> Connection! Received "HELO". Echoing.
> Connection! Received "Network programming is fun!". Echoing.
> Connection! Received "Is there an echo in here?". Echoing.

# Terminal 2 -- runs the client (this is on any CIMS machine or your own computer)
$ ./client 128.122.49.76 13142 # we got the IP address from Terminal 1
Type something to send: HELO
Server said: HELO
$ ./client 128.122.49.76 13142
Type something to send: Network programming is fun!
Server said: Network programming is fun!
$ ./client 128.122.49.76 13142
Type something to send: Is there an echo in here?
Server said: Is there an echo in here?
$ _

Writing a client

At a high level, the client should do the following:

  1. Turn the server's name (for example, www.foobar.com) to an address (for example, IP address 69.89.31.56).
  2. Connect to the server
  3. Send a message (from the user) to the server
  4. Receive a message from the server and print to the terminal

To achieve this behavior, first you'll set up a connection. The following system calls will be needed along the way.

int socket(int domain, int type, int protocol);
Calls to socket(...) return a file descriptor or -1 in the event of failure. You'll need to pass three arguments: domain, type, and protocol, as follows: (Type man 7 ip for more information on the type and protocol fields for the AF_INET domain.)
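
As a minimal sketch of such a call, assuming the usual choice for TCP over IPv4 (AF_INET, SOCK_STREAM, and protocol 0), a socket could be created as follows. The helper name open_tcp_socket is hypothetical and is not part of the lab skeleton.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: create an IPv4, stream-oriented (TCP) socket.
 * Returns the new file descriptor, or -1 on failure. */
int open_tcp_socket(void)
{
    /* AF_INET = IPv4, SOCK_STREAM = reliable byte stream,
     * 0 = let the OS pick the default protocol (TCP here). */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        perror("socket");
    return fd;
}
```
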
struct hostent *gethostbyname(const char *name);

This call invokes the local Domain Name System (DNS) resolver. In other words, it maps from a name like www.foo.com to an IP address. Programmatically, calls to gethostbyname(...) return a pointer to a struct hostent or NULL in the event of failure. The argument is a char array containing the hostname of a remote host.

(Type man 3 gethostbyname for more information on this call.)

The definition of struct hostent is:

struct hostent {
  char    *h_name;        /* official name of host */
  char    **h_aliases;    /* alias list */
  int     h_addrtype;     /* host address type */
  int     h_length;       /* length of address */
  char    **h_addr_list;  /* list of addresses */
};

This structure contains all of the information needed to "identify" the server you wish to connect to. This is an intermediate data structure: you use it to fill out a struct sockaddr_in variable which is then in turn used to create a network connection. The definition of struct sockaddr_in is:

struct sockaddr_in  {
  unsigned short int sin_family; /* Address family */
  unsigned short int sin_port;   /* Port number */
  struct in_addr sin_addr;       /* IP address */
  unsigned char sin_zero[...];   /* Pad to size of 'struct sockaddr' */
};

# and also

struct in_addr {
  unsigned int s_addr; 
};

After setting the memory assigned to the struct to zero, you need to fill in the sin_family, sin_addr.s_addr, and sin_port elements. We have provided pseudocode below:

/* pseudocode */
struct sockaddr_in server_addr;
struct hostent *server = /* what goes here? */

memset all of server_addr to 0 /* hint: use sizeof */

set server_addr.sin_family to be AF_INET

memcpy FROM server->h_addr_list[0] TO server_addr.sin_addr.s_addr

server_addr.sin_port gets the user-specified port *in network order*

In the last step, notice that sin_port must be in network order. (Network order, for IP protocols, is big-endian; on x86 machines, like the ones we are using, host order is little-endian. For a discussion of endianness issues, see here.) As is standard in network programming, we encapsulate the task of transforming from the host's representation to the network's representation in a function:

uint16_t htons(uint16_t hostshort);

whose only argument is a port number, represented as a uint16_t. (This call stands for "host to network, short". If the code were running on a big-endian machine, this function would do nothing at all. But you would still want to use the function so that your code would compile and run correctly on a little-endian machine.)
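
To make the byte ordering concrete, here is a small sketch that exposes the bytes of htons(port) as they sit in memory. The helper name port_to_wire_bytes is made up for this illustration.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: copy the in-memory bytes of htons(port) into
 * out[0..1]. In network order the most significant byte comes first,
 * regardless of the host's own endianness. */
void port_to_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);
    memcpy(out, &net, sizeof(net));
}
```

For port 13142, which is 0x3356 in hexadecimal, out[0] is 0x33 and out[1] is 0x56 on both little-endian and big-endian hosts. The companion function ntohs() converts in the other direction.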

Exercise 1. In client.c, add code to the section under the comments marked "EXERCISE 1". Your code should create a socket and completely set up the server and server_addr data structures, following the instructions and pseudocode above. Don't forget to use htons when setting the port number. Make sure that the skeleton client continues to compile: type make client.
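
One possible shape for the address setup is sketched below. It is only an illustration of the pseudocode: the function name fill_server_addr and its return convention are assumptions, and the skeleton in client.c may organize this differently.

```c
#include <stdint.h>
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: resolve host and fill *out for a TCP
 * connection to (host, port).
 * Returns 0 on success, -1 if the name cannot be resolved. */
int fill_server_addr(const char *host, uint16_t port, struct sockaddr_in *out)
{
    struct hostent *server = gethostbyname(host);
    if (server == NULL || server->h_addrtype != AF_INET)
        return -1;

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    /* h_addr_list[0] is already in network byte order. */
    memcpy(&out->sin_addr.s_addr, server->h_addr_list[0],
           sizeof(out->sin_addr.s_addr));
    out->sin_port = htons(port);   /* host order -> network order */
    return 0;
}
```

When host is already a dotted-quad string such as "127.0.0.1", gethostbyname() performs no lookup and simply converts the string, which is convenient for local testing.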

At this point, you're ready to actually make the connection. You will do so by using the connect(...) system call, given below:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The system call takes three arguments: a socket file descriptor, sockfd (you created this with your call to socket(...)); a pointer to a sockaddr struct, addr (you just set this up; cast your struct sockaddr_in * to struct sockaddr *); and addrlen, the size of the struct that addr points to (hint: sizeof(...)). The call returns 0 on success and -1 on failure.
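
A hedged sketch of the call, assuming the address has already been filled in as a struct sockaddr_in, might look like this. The helper name connect_to_addr is hypothetical.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: open a TCP socket and connect it to *addr.
 * Returns the connected descriptor, or -1 on failure. */
int connect_to_addr(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    /* connect() takes the generic struct sockaddr *, so cast;
     * the length is that of the struct actually passed in. */
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
        perror("connect");
        close(fd);   /* do not leak the descriptor on failure */
        return -1;
    }
    return fd;
}
```
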

Exercise 2. Add code to the section marked "EXERCISE 2". Connect to the server that was given by the command line arguments char * host and short port. Make sure the client continues to compile.

Exercise 3. Add code to the section under the comments marked "EXERCISE 3". First, your code should use send() to send the user-inputted message (char * buf) to the connection you opened. Then, it should call recv() to receive the server's response. Last, it should call write() to print the response to the screen.

N.B.: recv() does not guarantee to return, in a single call, all of the data that the sender sent. This is because the underlying implementation of TCP might not have all of the data, or the data might have been broken into multiple packets, etc. Therefore, when using recv() on a stream-oriented socket (as is the case here), one must call recv() in a loop, breaking only when it returns a non-positive value.
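
The loop described above can be sketched as follows. The name recv_until_closed and the fixed-capacity buffer are assumptions made for the example; your client may manage its buffer differently.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read from fd until the peer closes the
 * connection, an error occurs, or buf is full. Returns the number
 * of bytes stored, or -1 if recv() failed before any data arrived. */
ssize_t recv_until_closed(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = recv(fd, buf + total, cap - total, 0);
        if (n < 0)
            return total > 0 ? (ssize_t)total : -1;
        if (n == 0)          /* peer closed its side: no more data */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Note that recv() returns 0 only once the other side has closed (or shut down) its end of the connection, so a loop of this form waits for that event.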

You can test your client against an echo server that we are running by typing the following. Every time you type something into your client and hit RETURN, your client should print out exactly what you sent. We will keep the echo server running for the duration of the lab, so feel free to test against it at any time.

$ make client
$ ./client fox.geekny.com 8675
Type something to send: HELLO COURSE STAFF
Server said: HELLO COURSE STAFF

Writing an echo server

Now that you have a working client, let's implement an echo server like the one running on fox.geekny.com port 8675.

At a high level, the echo server should do the following:

  1. Tell the OS it's ready to accept connections on a given port
  2. Accept a connection from a client
  3. Read data sent by that client
  4. Write the received data back to the client
  5. GOTO (2)

The first thing that the server needs to do is to create a socket. This time, multiple sockets will be in play: one will be permanently listening for new connections, and the others will be "spawned" by each accepted connection. At the start, you need to set up only the listening socket. This socket is called listen_fd in the code.

The following function will be useful:

uint32_t htonl(uint32_t hostlong);

This is just like htons(...) except that it ensures a number of type uint32_t is in network order (as opposed to uint16_t).

Once again, you'll need to set up a sockaddr_in structure, but this time the structure plays a different role: it expresses to the operating system where (in terms of the local IP address – a machine may have several – and the local port) the server wants to receive traffic. If the server is willing to accept traffic on all interfaces, it sets the address field in the structure to INADDR_ANY:

struct sockaddr_in server_addr;

...

server_addr.sin_addr.s_addr = htonl(INADDR_ANY);

int bind(int socket, const struct sockaddr *address, socklen_t address_len);

bind() takes a socket (you will have created this with a call to socket(...)), a struct sockaddr * (which you will need to fill out, being mindful of network vs host order), and a length (just as you encountered in connect(...)). Upon successfully associating the socket with a specified port, the function returns 0. Otherwise, -1 is returned.

int listen(int socket, int backlog);

This system call sets the passed-in socket to a mode where it can accept connections. The backlog variable indicates how many waiting connections to "hold" before rejecting clients' requests. On success, the function returns 0, otherwise it returns -1.
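
Putting socket(), bind(), and listen() together, a listening socket could be set up roughly as below. This is a sketch under assumptions: the helper name open_listener and its parameters are invented here, and server.c may structure the same steps differently.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a socket listening on all local
 * interfaces at the given port.
 * Returns the listening descriptor, or -1 on failure. */
int open_listener(uint16_t port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */
    addr.sin_port = htons(port);               /* network order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");     /* e.g. the port is already in use */
        close(fd);
        return -1;
    }
    if (listen(fd, backlog) < 0) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```
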

Exercise 4. In server.c add code to the section under the comments marked "EXERCISE 4". In particular, create the listening socket, bind it to an (address, port) pair, and set it to listen. Make sure that the skeleton server continues to compile: type make server.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

This call accepts connections from sockfd and fills out client information in the other arguments. It returns a socket corresponding to the client's connection on success, otherwise it returns -1.

Exercise 5. Add code to the section under "EXERCISE 5". Here you'll want to set up the main loop. Because our server should be set up to handle connections from clients until the end of time, we want to make sure it doesn't exit after handling one request. So where should your call to accept() go? Add a corresponding call to close(); the argument should be the file descriptor returned by accept(). Make sure that the server continues to compile.

Exercise 6. Add code to the section under "EXERCISE 6". It's time to implement the echoing behavior. Echoing involves two steps: reading and writing. First, recv() from your newly created socket, then send() back what you've read. If you want, your server can print out the messages it receives to stdout. As always, be sure to do error checking!

Hint: As with your code for the client, you should call recv() until it returns a non-positive value (this loop is different from the main loop mentioned above).

By this point, you have written a compatible network client and network server. Test your programs by running your echo server on linserv1 or linserv2, and connecting to it with a client running somewhere else, as in the example invocation at the beginning of this section.

Now hand in your work, by following the instructions below.

Hand in

For this and all future labs, submitting is as easy as committing and pushing your changes up to github.

Here's a checklist before you submit:

This completes parts 0 and 1.

Part 2: Simple web server

In this next part of the lab, you'll be adding functionality to the echo server you've written to turn it into a fully functioning web server. At the end of this section, you should have a piece of code that (1) knows what "HTTP" is and knows how to interact using it and (2) can serve files in the web/ directory upon request. (This is, of course, in addition to the normal server tasks you implemented in the first part.)

Anatomy of an HTTP request

HTTP (Hypertext Transfer Protocol) is the basis for communication over the World Wide Web. We'll walk through the key details (a tutorial is here).

HTTP is built for the client-server model. The client is usually a Web browser; the server is known as a Web server. Two things happen in an HTTP exchange: (1) the client makes a request to the server for a resource ("please send me an HTML file"), and (2) the server replies to the request ("here is your HTML file. have fun rendering it."). The request and the response each have a specific format, which is part of the HTTP specification. Our server must understand this format, and so should we! (This format is not to be confused with the format of Web pages; HTTP doesn't care too much about what is in Web pages.)

For a bit of exploration, we can use the tool Netcat (invoked with nc). In essence, Netcat allows you to read and write to network connections.

Log on to the linserv machines on the CIMS network and choose a port number >10000 and <49000. For these examples, we will assume you have chosen port 10001. It is possible that the port you choose will already be in use—perhaps by another student in this class. You'll know if this happens from an error message returned by the OS in bind(). If it happens, choose another port.

# Figure out the machine's IP address
$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 52:54:00:92:93:CF  
          inet addr:128.122.49.76  Bcast:128.122.49.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20274640 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2650701 errors:0 dropped:0 overruns:0 carrier:0
....
# Set up netcat to listen on the local machine on the port of your choosing
$ nc -l 128.122.49.76 10001

The above tells netcat to listen for connections on port 10001 and print any text it receives to the screen. Thus, we can point our browser "at it" to see a live version of an HTTP request. Do this by typing the IP address of the machine followed by :10001 into your browser's address bar. For the above example, I would type: 128.122.49.76:10001. (Note! If no port is specified, your browser connects to port 80. Port 80 is the well-known port—this is a technical term—for HTTP.) In your netcat window, the contents of the request should appear.

....
$ nc -l 128.122.49.76 10001

GET / HTTP/1.1
Host: 128.122.49.76:10001
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.2,image/webp,*/*;q=0.9
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/530.11 (KHTML, like Gecko) Chrome/45.0.306.11 Safari/535.13
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8


For our purposes, the first line sent by the browser is the most interesting and we will only concern ourselves with implementing the behavior associated with it. (Real web servers have to implement all of RFC 2616, or, rather, its successors.)

The first line has the form [METHOD] [URI] HTTP/[VERSION]. For this particular instance, we see METHOD = GET, URI = /, and VERSION = 1.1. Our web server will only support the GET method and we'll assume that all our requests are HTTP/1.1. The URI (Uniform Resource Identifier) is a string that identifies the resource that the client is requesting. In this example, this string will be the "path" part of the URL (after the hostname), but a URI can also be a full URL.

# protocol first, in red
# hostname second, in blue
# uri last, in purple

https://www.google.com/webhp?ion=1&espv=2&ie=UTF-8#q=uniform+resource+identifier

The URI is often broken into two parts, separated by a '?'. Everything following the ? is the query string. We won't worry about it for this lab. Usually, the string before the ? indicates a resource location relative to some document root.
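
Although the lab does not require handling query strings, the split at '?' is easy to picture in code. The following is an illustrative sketch only; the helper name split_query_string is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split uri in place at the first '?'.
 * Afterwards uri holds only the path; the return value points at
 * the query string, or is NULL if the URI had no '?'. */
char *split_query_string(char *uri)
{
    char *q = strchr(uri, '?');
    if (q == NULL)
        return NULL;
    *q = '\0';       /* terminate the path part */
    return q + 1;    /* query string starts after the '?' */
}
```
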

For our lab, the document root is at web/ within the lab1 directory. (Within that directory, you'll see four files.) This means that when the URI is /p1.html, your server will be expected to serve the file at web/p1.html, and so on.
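
The mapping from URI to file path can be sketched as a simple string concatenation. The helper name uri_to_path is hypothetical, and the sketch assumes the server is started from within the lab1 directory so that the relative path web/ resolves correctly.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: map a request URI such as "/p1.html" to a
 * path under the web/ document root.
 * Returns 0 on success, -1 if out is too small. */
int uri_to_path(const char *uri, char *out, size_t cap)
{
    /* uri begins with '/', so "web" + uri gives "web/p1.html". */
    int n = snprintf(out, cap, "web%s", uri);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

This sketch does nothing about URIs containing "..", which a production server would have to reject or normalize.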

Now that we have an idea of what a request looks like, let's have a look at the response. Use netcat to connect to a web server on the internet (www.example.com) to see the response that it gives.

# This time use netcat to open a connection with a web server, and type
# an HTTP request:
$ nc www.example.com 80
GET /HTTP/1.1

HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 349
Connection: close
Date: Wed, 20 Jan 2016 10:19:00 GMT
Server: ECSF (lga/138B)

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>400 - Bad Request</title>
  </head>
  <body>
    <h1>400 - Bad Request</h1>
  </body>
</html>

Look at what happened above carefully — there is an (intentional) typo in the request. In the first line, there should be a space character between the first forward-slash and the letter 'H'. Upon receiving a malformed request, the server replies with status code 400 Bad Request. We'll keep this in mind for later. Let me send the full HTTP request (swapping out 128.122.49.76:10001 for www.example.com in the Host: field).

$ nc www.example.com 80
GET / HTTP/1.1
Host: www.example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.2,image/webp,*/*;q=0.9
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/530.11 (KHTML, like Gecko) Chrome/45.0.306.11 Safari/535.13
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: max-age=604800
Content-Type: text/html
Date: Wed, 20 Jan 2016 10:26:46 GMT
Expires: Wed, 27 Jan 2016 10:26:46 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (lga/13A4)
Vary: Accept-Encoding
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 606

  ;  R   TA  0  W r i]   S V1k  Z  $ 6  
 q     @+     l I I  s PzUe    Bf  '  + >   + OF
  I4h    ^@^    A  p @ M   u   j         * < 
|    P   P  -  6 O  $} Jl)  _, 4 y U rQazw r   t 
  s    3   z _      2 Mel   5     %  t     R   
  t3    :  | Q   ]    V-z  |  Y3* 
  ....

While the response succeeded (as indicated by the "200 OK"), we can't make sense of it! The response has been encoded using gzip ... look at the second line of the response. Re-issue the request but this time remove the Accept-Encoding: line.

# Send a modified request
$ nc www.example.com 80
GET / HTTP/1.1
Host: www.example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.2,image/webp,*/*;q=0.9
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/530.11 (KHTML, like Gecko) Chrome/45.0.306.11 Safari/535.13
Accept-Language: en-US,en;q=0.8

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html
Date: Wed, 20 Jan 2016 10:31:45 GMT
Expires: Wed, 27 Jan 2016 10:31:45 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (lga/13A2)
Vary: Accept-Encoding
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 1270

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 50px;
        background-color: #fff;
        border-radius: 1em;
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        body {
            background-color: #fff;
        }
        div {
            width: auto;
            margin: 0 auto;
            border-radius: 0;
            padding: 1em;
        }
    }
    </style>
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is established to be used for illustrative examples in documents. You may use this
    domain in examples without prior coordination or asking for permission.</p>
    <p><a href="http://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>

Much better! This time, we can see and read the entirety of the response. Of course, this response is intended for your browser, not you as a human. To see how your browser renders this content, point it to www.example.com.

Note a subtlety: the browser requested the URI "/". Web servers often interpret URIs ending in / as requests for directory index pages. When asked for www.example.com, which has URI=/, the server returns the equivalent of www.example.com/index.html. Most modern servers offer much configurability in this regard (.html, .php, building a live directory-listing page, ...) but merely implementing the index.html feature is sufficient for our purposes.

Your web server need not give as complex a response as example.com does. Instead, it will respond with only a one-line header (and the file that it is delivering).

You'll work through refitting the echo-server in the following sections:

  1. The server should attempt to parse each new request it receives.
    Overall, requests are terminated by a double return, corresponding to the characters \r\n\r\n. The difference between \r and \n goes back to the typewriter era. When pressing enter on a physical machine:
    • the carriage returned to the left side of the paper (carriage return or \r)
    • the paper was advanced to the next line (new line or \n).
    The HTTP standard specifically identifies \r\n as a new line, so a "double return" means \r\n\r\n, as above.
  2. If the parsing fails, it should reply with the 400 error code we saw before.
  3. On a successful parse, it should check to see if the requested file exists.
  4. Next, it should send the response header.
  5. Last, it should send the file (if applicable).
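
For step 1, checking whether the terminating \r\n\r\n has arrived can be sketched in a few lines. The helper name request_is_complete is hypothetical, and the sketch assumes the received bytes have been NUL-terminated so that string functions can be used.

```c
#include <string.h>

/* Hypothetical helper: return 1 if the NUL-terminated buffer already
 * contains the blank line (\r\n\r\n) that ends an HTTP request,
 * else 0. */
int request_is_complete(const char *buf)
{
    return strstr(buf, "\r\n\r\n") != NULL;
}
```
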

Parsing the request

Recall the format of an HTTP request:

[METHOD] [REQUEST-URI] HTTP/[VERSION]
Key1: Value1
Key2: Value2
....
Exercise 7. Fill out parse_request(...). For now, we're only interested in the first line. We are expecting to receive specifically a GET method request to a valid page. Your task is to fill out the function. Make sure to read the comments. For now, you may assume that you have been passed all the data you will receive via the request argument. (You will ensure this later in Exercise 10.)

Your code should do exactly the following:

Remember the note above about index pages. We will expect your server to interpret any URI ending with a forward-slash / as ending with /index.html.
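
The index-page rule could be implemented along these lines. This is a sketch, not the required structure of parse_request(...); the helper name add_index_if_directory and the in-place buffer convention are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if uri ends in '/', append "index.html" in
 * place. cap is the total size of the uri buffer.
 * Returns 0 on success, -1 if the buffer is too small. */
int add_index_if_directory(char *uri, size_t cap)
{
    static const char index_name[] = "index.html";
    size_t len = strlen(uri);

    if (len == 0 || uri[len - 1] != '/')
        return 0;                         /* nothing to do */
    if (len + sizeof(index_name) > cap)   /* sizeof counts the NUL */
        return -1;
    memcpy(uri + len, index_name, sizeof(index_name));
    return 0;
}
```
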

Hint: You may find the functions strsep(...) and strcmp(...) useful here. Try man 3 strsep and man 3 strcmp.
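
To show how strsep(...) and strcmp(...) work together, here is a hedged sketch that splits a request line into its three fields. The function name split_request_line and its out-parameters are invented for this example; the real parse_request(...) has its own signature, described in the code comments.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to declare strsep() */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the first line of an HTTP request, in
 * place, into method, URI, and version. line must be writable and
 * hold only the request line (no trailing \r\n).
 * Returns 0 on success, -1 if the line is malformed. */
int split_request_line(char *line, char **method, char **uri, char **version)
{
    char *rest = line;

    *method  = strsep(&rest, " ");  /* overwrites the space with NUL */
    *uri     = strsep(&rest, " ");
    *version = rest;                /* whatever follows the second space */

    if (*uri == NULL || *version == NULL)
        return -1;                  /* fewer than three fields */
    if (strcmp(*method, "GET") != 0)
        return -1;                  /* only GET is supported */
    if (strncmp(*version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

With this approach, the malformed line "GET /HTTP/1.1" from the earlier netcat experiment yields only two fields and is rejected.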

'Stat'ing resource information

Now that you know the URI, the next thing you need to do is stat() the requested file to check that it exists. stat() is a system call. You can get a sense for what it does by invoking the stat program from the shell (this program is a wrapper around the stat() and lstat() system calls):

$ stat yourcool.file
  File: 'yourcool.file'
  Size: 2016            Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d      Inode: 9679        Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/   acsusr)   Gid: ( 1000/   acsusr)
Access: 2016-01-21 19:53:02.149851999 -0000
Modify: 2016-01-21 14:51:52.205851999 -0000
Change: 2016-01-21 14:51:52.205851999 -0000
 Birth: -
$ 
For documentation, see:
$ man 2 stat  # documentation on the stat() system call
$ man stat    # documentation on the stat program
$ man 1 stat  # long form of the preceding command
Exercise 8. Add your own code to stat the file you found in parse_request's uri. We did not specify a particular place for you to do this; add the logic to a sensible place in the code.

You will need to ensure that your server ultimately replies with an appropriate HTTP status code. The main case to handle is that if the requested file does not exist, the reply code should be 404 (if the requested file cannot be read due to permissions, you would want to reply with code 403, but we will not be strict in testing for this case). You may want to arrange for the appropriate reply codes in this exercise, or you may want to return to this requirement after looking over the next few exercises.
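
One possible way to map the result of stat() onto a status code is sketched below. The helper name status_for_path is hypothetical, and the decision to treat anything other than a regular file as missing is an assumption of this sketch, not a requirement of the lab.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: maps the outcome of stat() to an HTTP status code. */
int status_for_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        if (errno == EACCES)
            return 403;   /* a directory on the path is not searchable */
        return 404;       /* ENOENT, ENOTDIR, and so on: treat as missing */
    }
    if (!S_ISREG(st.st_mode))
        return 404;       /* assumption: only regular files are served */
    return 200;
}
```

A successful stat() does not prove that the file is readable by the server; a permission problem on the file itself typically shows up later, when open() fails with EACCES.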

Hints:

Building/sending the response header

When a web server receives a request (however ill-formed), it is supposed to reply. You will implement this behavior with the function send_response_header(...). Recall the format of an HTTP response header:

HTTP/[VER] [CODE] [TEXT]
Key1: Value1
Key2: Value2
Exercise 9. Fill out the scaffolding for send_response_header(...). For this lab, you need only send the first line of the canonical response and can ignore the key-value pairs.
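
A minimal sketch of building the first line of the response follows. The helper name format_status_line, the choice of HTTP/1.0 as the version, and the set of reason phrases are assumptions made for illustration; the scaffolding for send_response_header(...) may expect something different. Because no key-value pairs are sent, the status line is followed directly by the blank line that ends the header.

```c
#include <stdio.h>

/* Hypothetical helper: writes "HTTP/1.0 <code> <text>\r\n\r\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_status_line(char *buf, size_t len, int code)
{
    const char *text;
    switch (code) {
    case 200: text = "OK"; break;
    case 400: text = "Bad Request"; break;
    case 403: text = "Forbidden"; break;
    case 404: text = "Not Found"; break;
    default:  text = "Internal Server Error"; code = 500; break;
    }
    /* The second \r\n is the empty line that terminates the header. */
    int n = snprintf(buf, len, "HTTP/1.0 %d %s\r\n\r\n", code, text);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting buffer could then be written to the client socket with send(), checking the return value in case fewer bytes were sent than requested.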

Notes:



Exercise 10. This is a good time to step back and test things. Add parse_request(...) and send_response_header(...) into your main program loop and see how things work. Be careful when invoking the parse_request(...) function as it expects its uri argument to be a buffer large enough to store the entire uri ... you may need to dynamically allocate memory.

In keeping with the contract that was codified in Exercise 7, you should be sure your code passes the full request/headers to the parse_request function. From before, recall that after calling recv(), we cannot be sure we have actually received the full message (request+headers) from the client. As noted above, you can be sure that the client's request will end in a double return "\r\n\r\n". Modify your original code to call recv() in a loop until you can be sure you have the full request: that it has ended appropriately. Only then should you call your parse_request function.
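
The loop described above might look roughly like the following sketch. The helper name recv_request and the fixed-capacity buffer are assumptions for illustration; your code may instead grow the buffer dynamically. For simplicity the sketch looks for "\r\n\r\n" anywhere in the data received so far, which is equivalent to checking the end for a GET request with no body, and it does not retry when recv() is interrupted by a signal.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads from sock into buf until the request contains
 * "\r\n\r\n". Returns the number of bytes read, or -1 on error, on early
 * close by the peer, or if the buffer fills up first. */
ssize_t recv_request(int sock, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    while (used < cap - 1) {
        ssize_t n = recv(sock, buf + used, cap - 1 - used, 0);
        if (n <= 0)
            return -1;            /* error, or peer closed before finishing */
        used += (size_t)n;
        buf[used] = '\0';         /* keep buf a valid C string for strstr */
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)used;
    }
    return -1;                    /* request is larger than the buffer */
}
```

On success, buf holds a NUL-terminated request and can be handed to the parsing code.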

At this point, your server should be nearly functional. One thing remains: serving files.

Transferring the file

At this point, we already know if the file exists and can be read. If so, all that remains is to send it. For this task, you can use the sendfile(...) function. Read its man entry with man 2 sendfile. Note that sendfile(...) is more efficient than a call to read(...) followed by a call to write(...) because it can bypass userspace entirely. Regardless of this distinction, you can think of sendfile(...) as doing exactly that: a combination of read and write operations in quick succession.

Exercise 11. Fill out the code for send_file(...) and add it into the main loop just before you close the client socket. Make sure your control structures are set up correctly, e.g., you're not trying to send a file that doesn't exist. You will probably need to invoke the open() syscall. As usual: man 2 open.

Once you've sent the file, the connection is over and you can close the client socket.
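
As a rough sketch of the open-then-sendfile sequence, consider the helper below. The name send_whole_file is hypothetical and is not the lab's send_file(...); the code assumes Linux, where sendfile() is declared in <sys/sendfile.h>. Because sendfile() may transfer fewer bytes than requested, the sketch calls it in a loop.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): sends the whole file at path to
 * out_fd. Returns 0 on success, -1 on failure. */
int send_whole_file(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd == -1)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) == -1) {   /* fstat gives us the file's size */
        close(in_fd);
        return -1;
    }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* sendfile() advances offset by the number of bytes it sent. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0) {
            close(in_fd);
            return -1;
        }
    }
    close(in_fd);
    return 0;
}
```

In the server, out_fd would be the client socket, and the call would be made only after the response header has been sent.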

To test your Web server, have some fun with it! Begin by running your server:

# assumes you're running on linserv2 (linserv1 also works)
$ cd ~/cs480-008/lab1            # go to the lab directory
$ make server                    # compile server if you have not
.... 
$ ./server 10001                 # start up server listening on port 10001
Then point the browser on your laptop, desktop, or phone at the server. In the example above, you could type http://linserv2.cims.nyu.edu:10001/index.html into the address bar on a smart phone (or you can click on the link just given, but unless your server is running on port 10001, you won't be connecting to that server...you might end up connecting to a classmate's server!). If that doesn't work (or even if it does), use the netcat tool to generate an HTTP request (or malformed request, or what-have-you), as you did for example.com; inspect the response.

Also, you can "manage your content": try placing some new HTML files in the web/ directory (or modifying the ones that are already there). Then point your browser to them. If your Web server works, your browser will fetch and then display those files.

Take it a step further: exchange information with your classmates about which port your server is running on. Point your laptop (or phone!) to their servers, and vice-versa.

This completes Part 2.

Part 3: Multi-threaded web server (Extra credit)

Consider the following hypothetical: You've just founded a disruptive startup, and you want to serve the company's website using your new Web server. In two months' time, your web site will be garnering 500 views per second (this is less than one-tenth the number of tweets sent per second). Each view requires the server to fetch a file from disk. But a random access to the disk takes approximately 15 milliseconds, so your Web server can handle only about 66 requests per second, on average. What we need to do is arrange for multiple requests to the disk to be in flight at once (and also to overlap the network processing and the disk processing). For this purpose, we will make the server multithreaded.
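
The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the second function makes the idealized assumption that overlapping requests do not slow one another down; under that assumption, the number of requests that must be in flight equals the arrival rate multiplied by the time per request.

```c
/* Requests per second that a single thread can serve if every request
 * costs one disk access of disk_ms milliseconds. */
double serial_throughput(double disk_ms)
{
    return 1000.0 / disk_ms;
}

/* Smallest whole number of requests that must be in flight at once to
 * sustain target_rps, assuming perfectly overlapped disk accesses. */
int min_in_flight(double target_rps, double disk_ms)
{
    double needed = target_rps * disk_ms / 1000.0;
    int whole = (int)needed;
    return (needed > whole) ? whole + 1 : whole;
}
```

With a 15 ms access time, serial_throughput(15.0) is 1000/15, or about 66.7 requests per second, and min_in_flight(500.0, 15.0) is 8, since 500 x 0.015 = 7.5.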

The approach we'll be using in this lab is to spawn a new thread (up to a given maximum, MAX_THREAD_COUNT) each time the server receives a request. This way all of the actual processing can be done in its own thread and the primary server need only listen for and accept new requests.

To simplify your task, we have supplied a simple thread package (written by Mike Dahlin) on top of the standard POSIX thread library (known as pthreads). The idea is to shield you from irrelevant details—this way, you use the standard package but you also focus on the project at hand. You are not required to use the wrapper, however: you may instead use pthreads if you so choose. The code for the simple thread package (which we will refer to hereafter as sthreads) is in sthread.c and sthread.h.

As you code, follow these standards, written by Mike Dahlin.

Because it is impossible to determine the correctness of a multithreaded program via testing, grading on this part (it is extra credit) will primarily be based on reading your code. If your code is difficult to understand, then you will not receive as many points, even if the program seems to work.

We recommend creating a commit before attempting to implement multi-threading. Even better, create a new branch for your repository and implement multi-threading there.

Exercise 12 (Extra credit). Fill out the code for handle_client(). Use handle_client() as the start routine for the new threads.

Exercise 13 (Extra credit). Put the pieces together by making calls to sthread_create() and rewriting the main loop as appropriate. Make sure not to spawn more than MAX_THREAD_COUNT threads at once!
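
One way to enforce the thread limit is sketched below using plain pthreads rather than the sthreads wrapper, whose exact signatures are not reproduced here. All names (acquire_slot, release_slot, spawn_worker, worker_stub) and the value chosen for MAX_THREAD_COUNT are hypothetical. The idea is a counter protected by a mutex, with a condition variable on which the main loop waits whenever all slots are taken.

```c
#include <pthread.h>

#define MAX_THREAD_COUNT 4   /* assumed value; the lab defines its own */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;
static int active = 0;       /* number of claimed slots; guarded by lock */

/* Blocks until fewer than MAX_THREAD_COUNT slots are in use, then claims one. */
void acquire_slot(void)
{
    pthread_mutex_lock(&lock);
    while (active >= MAX_THREAD_COUNT)
        pthread_cond_wait(&slot_free, &lock);
    active++;
    pthread_mutex_unlock(&lock);
}

/* Gives a slot back; a worker calls this as its last action. */
void release_slot(void)
{
    pthread_mutex_lock(&lock);
    active--;
    pthread_cond_signal(&slot_free);
    pthread_mutex_unlock(&lock);
}

/* Stand-in for a real start routine: real code would serve one client here. */
void *worker_stub(void *arg)
{
    (void)arg;
    release_slot();
    return NULL;
}

/* Claims a slot, then starts a detached thread running fn(arg).
 * fn must call release_slot() before returning. Returns 0 on success. */
int spawn_worker(void *(*fn)(void *), void *arg)
{
    pthread_t tid;
    acquire_slot();
    if (pthread_create(&tid, NULL, fn, arg) != 0) {
        release_slot();      /* creation failed, so free the slot again */
        return -1;
    }
    pthread_detach(tid);     /* nobody joins the worker */
    return 0;
}
```

In a server, the main loop would call accept() and then spawn_worker() with the real start routine and the client socket as its argument; the call blocks while MAX_THREAD_COUNT workers are busy.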

Handing in the lab

Here's a checklist before you submit:

This completes the lab.

Last updated: 2016-04-15 16:24:03 -0400