CS372H Spring 2011 Homework 10 Solutions

Problem 1:

You are designing a program to automate the publication of stock data in the Greedy Stock Exchange system. To simplify the design, the structure of the application is such that the stock data file will be shared in read-only mode among the processes that are used by brokers, while the exchange manager will run a different process that can update the stock data. All processes will run on the MegaGiga mainframe system running the Solaris operating system. Using the system calls mmap(), mprotect(), and any other call you deem necessary, explain how the processes can share the stock data file as required. Show how the system calls will be used and explain the arguments that will be used to establish the solution.

Solution:
mmap() maps files (or other objects, such as devices) into a process's virtual address space. It is widely used when linking shared libraries, since the library's pages do not have to be copied into each process's private memory. mmap can also map a file on disk into the virtual address space of several processes. When one of the processes first accesses the file, its pages are loaded into physical memory; they are not loaded a second time when other processes need them, since those processes reach the same pages through their own virtual memory mappings.

mprotect controls access permissions to a section of memory.

Here is the detailed information about these functions copied from the man pages.

SYNOPSIS

#include <sys/mman.h>
void * mmap(void * start , size_t length, int prot , int flags , int fd , off_t offset );
int munmap(void * start , size_t length);

DESCRIPTION

The mmap function asks to map length bytes starting at offset offset from the file (or other object) specified by fd into memory, preferably at address start . This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap.

The prot argument describes the desired memory protection. It has bits:

PROT_EXEC: Pages may be executed.
PROT_READ: Pages may be read.
PROT_WRITE: Pages may be written.
PROT_NONE: Pages may not be accessed.

The flags parameter specifies the type of the mapped object, mapping options, and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has bits:

MAP_FIXED: Do not select a different address than the one specified. If the specified address cannot be used, mmap will fail. If MAP_FIXED is specified, start must be a multiple of the pagesize. Use of this option is discouraged.
MAP_SHARED: Share this mapping with all other processes that map this object.
MAP_PRIVATE: Create a private copy-on-write mapping.

You must specify exactly one of MAP_SHARED and MAP_PRIVATE.

The above three flags are described in POSIX.1b (formerly POSIX.4). Linux also knows about MAP_DENYWRITE, MAP_EXECUTABLE and MAP_ANON(YMOUS).

The munmap system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references.

RETURN VALUE

On success, mmap returns a pointer to the mapped area. On error, MAP_FAILED (-1) is returned, and errno is set appropriately. On success, munmap returns 0, on failure -1, and errno is set (probably to EINVAL).

SYNOPSIS

#include <sys/mman.h>
int mprotect(const void *addr, size_t len, int prot);

DESCRIPTION

mprotect controls how a section of memory may be accessed. If an access is disallowed by the protection given it, the program receives a SIGSEGV.

prot is a bitwise-or of the following values:

PROT_NONE: The memory cannot be accessed at all.
PROT_READ: The memory can be read.
PROT_WRITE: The memory can be written to.
PROT_EXEC: The memory can contain executing code.

The new protection replaces any existing protection. For example, if the memory had previously been marked PROT_READ, and mprotect is then called with PROT_WRITE, it will no longer be readable.

RETURN VALUE

On success, mprotect returns zero. On error, -1 is returned, and errno is set appropriately.

As the man page descriptions show, to use mmap and mprotect we need a file descriptor and the length of the file section we want to map, starting from the offset. For our purposes, the offset into the file is 0 and the length of the section is the size of the whole file. To get the file descriptor we can use the open() system call, and to find the length of the file we can use fstat(). The man pages give detailed descriptions of these system calls; here is a brief summary:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char * pathname , int flags );
open attempts to open a file and return a file descriptor (a small, non-negative integer for use in read , write , etc.)
flags is one of O_RDONLY , O_WRONLY or O_RDWR which request opening the file read-only, write-only or read/write, respectively.

#include <sys/stat.h>
#include <unistd.h>
int fstat(int filedes , struct stat * buf );
stats the file pointed to by filedes and fills in buf

struct stat
{
    dev_t     st_dev;           /* device */
    ino_t     st_ino;           /* inode */
    mode_t    st_mode;          /* protection */
    nlink_t   st_nlink;         /* number of hard links */
    uid_t     st_uid;           /* user ID of owner */
    gid_t     st_gid;           /* group ID of owner */
    dev_t     st_rdev;          /* device type (if inode device) */
    off_t     st_size;          /* total size, in bytes */
    unsigned  long st_blksize;  /* blocksize for filesystem IO */
    unsigned  long st_blocks;   /* number of blocks allocated */
    time_t    st_atime;         /* time of last access */
    time_t    st_mtime;         /* time of last modification */
    time_t    st_ctime;         /* time of last change */
};

Now we can write pseudocode for the brokers and the manager, assuming that the stock data file is Stock.data.

void Broker(void) {
    // open the file in read-only mode and get the file descriptor
    int fd = open("Stock.data", O_RDONLY);
    // get the information about the file (we need its size, st_size)
    struct stat this_file;
    fstat(fd, &this_file);
    // now map the whole file, read-only and shared, into the process' virtual memory space
    void* addr = mmap(NULL, this_file.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return;
    /* DO SOMETHING */
    munmap(addr, this_file.st_size);
    close(fd);
    return;
}

void Manager(void) {
    // open the file in read/write mode and get the file descriptor
    int fd = open("Stock.data", O_RDWR);
    // get the information about the file (we need its size, st_size)
    struct stat this_file;
    fstat(fd, &this_file);
    // now map the whole file, readable, writable and shared, into the process' virtual memory space
    void* addr = mmap(NULL, this_file.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return;
    /* DO SOMETHING */
    /* WRITE */
    /* DO SOMETHING */

    munmap(addr, this_file.st_size);
    close(fd);
    return;
}
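The pseudocode above never calls mprotect(), although the problem mentions it. The following is a minimal C sketch, not a definitive solution, of one way the pieces could fit together. The helper names map_stock_file and set_manager_writable are hypothetical. The sketch assumes the manager keeps its mapping read-only except while it is applying an update, which is one plausible use of mprotect() here rather than something the original solution states.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file at `path` with MAP_SHARED.
 * writable == 0: broker view  (O_RDONLY, PROT_READ).
 * writable != 0: manager view (O_RDWR,   PROT_READ | PROT_WRITE).
 * Returns the mapping, or NULL on failure; the file size is stored in *len. */
void *map_stock_file(const char *path, int writable, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, writable ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat sb;
    if (fstat(fd, &sb) < 0 || sb.st_size == 0) {  /* empty files cannot be mapped */
        close(fd);
        return NULL;
    }

    int prot = writable ? (PROT_READ | PROT_WRITE) : PROT_READ;
    void *addr = mmap(NULL, (size_t)sb.st_size, prot, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    *len = (size_t)sb.st_size;
    return addr;
}

/* Hypothetical helper for the manager: drop or restore write permission on
 * its own mapping. Returns 0 on success and -1 on error, like mprotect(). */
int set_manager_writable(void *addr, size_t len, int writable)
{
    return mprotect(addr, len, writable ? (PROT_READ | PROT_WRITE) : PROT_READ);
}
```

A broker would call map_stock_file("Stock.data", 0, &len) and only read through the returned pointer. The manager would pass 1, call set_manager_writable(addr, len, 0) while idle, and re-enable writing just before an update. Because both sides use MAP_SHARED, stores made by the manager become visible through the brokers' mappings.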

Problem 2:

In some operating systems, IO from/to disk is done directly to/from a buffer in the user program's memory. The user program does a system call specifying the address and length of the buffer (the length must be a multiple of the disk record size).

The disk controller needs a physical memory address, not a virtual address. Ben Bitdiddle proposes that when the user does a write system call, the operating system should check that the user's virtual address is valid, translate it into a physical address, and pass that address and the length (also checked for validity) to the disk hardware.

This won't quite work. In no more than two sentences, what did Ben forget?

Solution:
Need solution

Problem 3:

Suppose a server workload consists of network clients sending 128-byte requests to a server, which reads a random 50KB chunk from the server's file system and transmits that 50KB to the client. The server's file system is able to cache all metadata, so that each read consists of a single 50KB sequential read from a random location on disk. The server may have multiple disks and multiple network interfaces.

Each disk rotates at 10000 RPM, and an average random seek takes 5 ms. There are on average 300 sectors per track and each sector is 512 bytes. (In actuality, the number of sectors per track will vary, but we'll ignore that. We'll also assume that each request is entirely contained in one track and that each starts at a random sector location on the track.)

To access disk, the CPU overhead is 30 microseconds to set up a disk access. The disk DMAs data directly to memory, so there is no CPU per-byte cost for disk accesses.

Each network interface has a bandwidth of 100 Mbits/s (that's Mbits not MBytes!) and there is a 4 millisecond one-way network latency between a client and the server. The network interface is full-duplex: it can send and receive at the same time at full bandwidth. The CPU has an overhead of 100 microseconds to send or receive a network packet. Additionally, there is a CPU overhead of .01 microseconds per byte sent.

How many requests per second can each disk satisfy?
Solution:
A 10,000 RPM disk takes 6ms per rotation, and the average rotational delay is 1/2 rotation, so the average rotational delay is 3ms.
50KB is 100 sectors of 512 bytes, which is 1/3 of a 300-sector track, so reading the data takes 1/3 rotation (the BW term is 2ms). So an average disk request takes
5ms seek + 3ms rotation + 2ms BW = 10ms

So, a disk can support up to 100 requests per second.

(Note a slight simplifying assumption: we ignore the effect that a track buffer might have. A track buffer would sometimes overlap some of the rotation time and some of the BW time by letting the read begin at the "end" of the 50KB chunk, wait for the beginning part to come around, and be done when the beginning part has been read...)
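The disk arithmetic above can be checked with a short C sketch. The function names are hypothetical, and the model is the same simplified one used in the solution: average seek plus half a rotation plus the fraction of a rotation needed to pass over the requested sectors, with no track buffer.

```c
#include <assert.h>

/* Average service time, in milliseconds, for one random read.
 * Model (as in the solution): seek + half a rotation + transfer time,
 * where the request lies entirely within one track. */
double disk_service_ms(double rpm, double seek_ms, double sectors_per_track,
                       double sector_bytes, double request_bytes)
{
    assert(rpm > 0 && sectors_per_track > 0 && sector_bytes > 0);

    double rotation_ms  = 60.0 * 1000.0 / rpm;           /* one full rotation */
    double rot_delay_ms = rotation_ms / 2.0;              /* average wait      */
    double sectors      = request_bytes / sector_bytes;   /* sectors to read   */
    double transfer_ms  = rotation_ms * sectors / sectors_per_track;

    return seek_ms + rot_delay_ms + transfer_ms;
}

/* Requests per second that one disk can sustain at that service time. */
double disk_requests_per_second(double service_ms)
{
    assert(service_ms > 0);
    return 1000.0 / service_ms;
}
```

Calling disk_service_ms(10000, 5, 300, 512, 50 * 1024) reproduces the 10ms figure, and disk_requests_per_second on that result gives 100.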

How many requests per second can each network interface satisfy?
Solution:
Since the interface is full duplex, it is limited by the send rate for the large reply packets; the small incoming requests do not affect this. 50x2^10 bytes/request * 8 bits/byte * 1 second/100x10^6 bits = 4.096ms/request, or roughly 4ms/request. So, a network interface can support up to about 250 requests per second (244 without rounding).
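As a quick check of the network figure, here is a small C sketch with hypothetical function names. It assumes, as the solution does, that only the outgoing 50KB replies limit a full-duplex interface. Without rounding the transmit time to 4ms, the rate comes out slightly below 250.

```c
#include <assert.h>

/* Time, in milliseconds, to transmit `bytes` over a link of
 * `link_bits_per_sec` (wire time only; latency and CPU cost excluded). */
double nic_transmit_ms(double bytes, double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0);
    return bytes * 8.0 / link_bits_per_sec * 1000.0;
}

/* Replies per second that one interface can send back to back. */
double nic_requests_per_second(double reply_bytes, double link_bits_per_sec)
{
    return 1000.0 / nic_transmit_ms(reply_bytes, link_bits_per_sec);
}
```

For a 50x2^10 byte reply on a 100 Mbit/s link, nic_transmit_ms gives 4.096 ms and nic_requests_per_second gives about 244.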

How many requests per second can the CPU satisfy (assuming the system has a sufficient number of disks and network interfaces?)
Solution:
The CPU overhead is (100 + .01 * 128)us to receive the request + 30 us to set up the disk access + (100 + .01 * 50x2^10)us to send the reply = 743 us/request. So, the CPU can handle 1345 requests per second.
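The CPU cost per request can be recomputed the same way. The sketch below uses hypothetical function names and follows the solution's accounting: a fixed per-packet overhead plus a per-byte cost for both the request and the reply, plus the disk setup overhead.

```c
#include <assert.h>

/* CPU time, in microseconds, spent on one request:
 * receive the request, set up the disk access, send the reply. */
double cpu_us_per_request(double packet_overhead_us, double per_byte_us,
                          double request_bytes, double reply_bytes,
                          double disk_setup_us)
{
    assert(packet_overhead_us >= 0 && per_byte_us >= 0 && disk_setup_us >= 0);

    double receive_us = packet_overhead_us + per_byte_us * request_bytes;
    double send_us    = packet_overhead_us + per_byte_us * reply_bytes;
    return receive_us + disk_setup_us + send_us;
}

/* Requests per second that the CPU can handle at that cost. */
double cpu_requests_per_second(double us_per_request)
{
    assert(us_per_request > 0);
    return 1e6 / us_per_request;
}
```

With the problem's numbers, cpu_us_per_request(100, 0.01, 128, 50 * 1024, 30) is 743.28 microseconds, which corresponds to roughly 1345 requests per second.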

What is the latency from when a client begins to send the request until it receives and processes the last byte of the reply (ignore any queuing delays)?
Solution:

*L_nw: how to pipeline? Here, I assume the (unrealistic) best case scenario: the CPU starts processing the packet when the first byte arrives and finishes processing the last byte just as the last byte arrives. Another reasonable assumption would be to not start processing the packet until it has entirely arrived. In reality, a large NW send (like 50KB) would be broken into smaller packets, each of which would be processed once it has entirely arrived to get pipelining, but each would also pay an overhead.
Each arrow shows a dependency in time (the tail must happen before the head).
Total time on critical path is
o_nw + b_nw(128) + L_nw + o_disk + seek + rot + bw + o_nw + b_nw(50x2^10) + L_nw
= 100x10^-6 + 128 bytes * 1 second/100x10^6 bits * 8 bits/byte + 4x10^-3 + 30x10^-6 + 5x10^-3 + 3x10^-3 + 2x10^-3 + 100x10^-6 + 50x2^10 bytes * 1 second/100x10^6 bits * 8 bits/byte + 4x10^-3
= 22.336ms
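Summing every term of the symbolic critical-path expression, including the seek, rotation, and disk transfer times and both one-way network latencies, gives about 22.3 ms with the numbers from the problem. The C sketch below simply adds those terms; the function names are hypothetical, and it assumes the same best-case pipelining as the note above, so the per-byte CPU cost does not appear on the critical path.

```c
#include <assert.h>

/* Wire time, in milliseconds, for `bytes` on a link of `bits_per_sec`. */
double wire_ms(double bytes, double bits_per_sec)
{
    assert(bits_per_sec > 0);
    return bytes * 8.0 / bits_per_sec * 1000.0;
}

/* End-to-end latency in milliseconds for one request, following
 * o_nw + b_nw(128) + L_nw + o_disk + seek + rot + bw
 *      + o_nw + b_nw(50x2^10) + L_nw. */
double request_latency_ms(void)
{
    const double link   = 100e6;   /* 100 Mbit/s interface           */
    const double o_nw   = 0.100;   /* per-packet CPU overhead, in ms */
    const double L_nw   = 4.0;     /* one-way network latency, in ms */
    const double o_disk = 0.030;   /* disk setup overhead, in ms     */
    const double seek   = 5.0;     /* average seek, in ms            */
    const double rot    = 3.0;     /* average rotational delay, ms   */
    const double bw     = 2.0;     /* disk transfer of 50KB, in ms   */

    double request_path = o_nw + wire_ms(128.0, link) + L_nw;
    double disk_path    = o_disk + seek + rot + bw;
    double reply_path   = o_nw + wire_ms(50.0 * 1024.0, link) + L_nw;

    return request_path + disk_path + reply_path;
}
```

The three partial sums are about 4.110 ms for the request, 10.030 ms for the disk, and 8.196 ms for the reply.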

Problem 4:

Sun's network file system (NFS) protocol provides reliability via:

at-most-once semantics
at-least-once semantics
two-phase commit
transactions

Which is the best network on which to implement a remote-memory read that sends a 100-byte packet from machine A to machine B and then sends an 8000-byte packet from machine B to machine A?

A network with 200 microsecond overhead, 10 Mbyte/s bandwidth, 20 microsecond latency
A network with 20 microsecond overhead, 10 Mbyte/s bandwidth, 200 microsecond latency
A network with 20 microsecond overhead, 1 Mbyte/s bandwidth, 2 microsecond latency
A network with 2 microsecond overhead, 1 Mbyte/s bandwidth, 20 microsecond latency

Solution:
Needs solution.

Problem 5:

True or false. A virtual memory system that uses paging is vulnerable to external fragmentation.
Solution:
False.

Problem 6:

True or false. Shortest positioning time first (SPTF), in which a disk scheduler selects as the next block to fetch the block for which the (rotational time + seek time) is minimal, is the optimal disk scheduling algorithm with respect to response time.

Solution:
False.