CS202 Spring 2015 Lab 7: Journaling and transactions

Released Friday, April 24, 2015
Due Wednesday, May 6, 2015, 9:00 PM

Lab 7: File system journaling and transactions

Introduction

In this lab, you will add journaling (logging) and crash recovery to a simple file system.

Some notes:

Getting Started

We recommend doing this lab inside the development platform that you set up in the previous labs (the virtual devbox or the CIMS machines).

From within this platform, change to your lab directory, use Git to commit changes you've made since handing in Lab 6 (if any), obtain the latest version of the course repository, and then create a local branch called lab7 based on our origin/lab7 (which is our lab7 branch). The commands are as follows, but you may not need all of them (for example, if git status indicates that there are no changes to commit):

$ cd ~/cs202
$ git commit -am 'my solution to lab6'
Created commit 254dac5: my solution to lab6
 3 files changed, 31 insertions(+), 6 deletions(-)
$ git pull
Already up-to-date.
$ git checkout -b lab7 origin/lab7
Branch lab7 set up to track remote branch refs/remotes/origin/lab7.
Switched to a new branch "lab7"
$ cd lab7

You are now in the lab7 directory. Your next step depends on whether you are using the CIMS machines or the devbox:

CIMS Users

You have to use one of the following machines:
crackle1.cims.nyu.edu 
crackle2.cims.nyu.edu 
crackle3.cims.nyu.edu 
crackle4.cims.nyu.edu 
crackle5.cims.nyu.edu 
Note that this requirement supersedes the usual list of machines.

Next, install required library files:

$ pwd     # make sure that you are in lab7 directory
/home/your-netid/cs202/lab7
$ ./install-lab7-for-cims.sh
You only need this on CIMS machines, press <ENTER> to continue, or <Ctrl>-C to cancel
<ENTER>
Copying fuse library....
Installation successful

Devbox Users

Install the necessary library and binary using apt: (default password for using sudo is labs)
$ sudo apt-get update # update the apt catalog
.....
$ sudo apt-get install fuse libfuse-dev
.....

The code provided this lab contains a filesystem that is a simplified version of the one that you completed in lab 6 (see below for the differences). Your job is to add journaling (logging) and crash recovery capabilities. Code is provided in the following files:

FUSE

The file system for this lab is implemented as a FUSE driver, similar to that of lab 6. A basic overview of FUSE is available in lab 6.

Our File System

The file system for this lab is similar to lab 6's file system, with a few small changes to make the system easier to work with:

Journaling

The file system presents an interface by which users can commit transactions. Users can modify the file system by creating, modifying, and deleting files, and then send a commit message to save their changes. Commit messages are sent via an ioctl (I/O control) system call: more information about the ioctl interface is presented in the next section.

From the file system user's perspective, creating and comitting two new files, file1 and file2, will look like this:

$ [send commit ioctl] # end of most recent transaction

...

$ touch mnt/file1
$ touch mnt/file2
$ [send commit ioctl]

# at this point, file1 and file2 are both committed in a single transaction

Our file system supports this transaction interface with journaling: all operations that modify the file system state are written to a log on the disk before the file system modifies on-disk inodes and blocks. When the file system receives a commit message, a commit entry is written to the log.

The file system makes a guarantee to the user that committed transactions will survive file system crashes, and uncommitted transactions will not. When the commit ioctl returns, the file system makes a guarantee to the user that the commit operation (and all other operations in the transaction) have been persisted to the disk log. (By persisted, we mean that the sector that contains the given entry is actually on disk.)

Our use of logging is simplified, compared to a production logging system (and also compared to the protocols presented in lecture). In particular, our file system will not include checkpoints: this means that the log will continue to grow as the file system is used. When the file system is restarted, the entire on-disk structure (inodes, bitmaps, blocks...) is erased and re-constructed from the log, by replaying committed transactions. You should be able to see from this picture why non-committed transactions do not appear after a crash: they are not replayed from the log.

Implementation

You will implement some components of the FUSE driver (and hence the file system): adding support for logging, crash recovery, and a transaction abstraction. Because a lot of the implementation is provided, it is important that you familiarize yourself with the provided code and the various file system interfaces.

If you haven't already, read OSTEP 42 to gain background on transactions and journaling.

Like in lab 6, the main file system code that we've provided for you resides in fsdriver.c. This file contains all the FUSE callbacks to handle I/O syscalls, as well as the main function. Once the path to the disk image and the path to the mount point (testfs.img and mnt respectively in the supplied example) have been read from the command line arguments, our FUSE driver maps the specified disk image into memory using the map_disk_image function (defined in disk_map.c), which itself initializes some file system metadata. Then, fsdriver.c calls fuse_main, which handles kernel dispatches to our registered callbacks. These callbacks are similar to the callbacks in lab 6: you will modify callbacks that write to the file system.

The FUSE driver code we provide maps the disk image into virtual memory using the mmap syscall: see lab 6's description of mmap for more information. You can think of this as caching the disk in memory. To write modified, cached contents back to the disk, one uses the command msync. Indeed, you will use msync at various points in this lab because you will need to ensure that particular disk sectors that are modified in memory (namely, those containing certain log entries) have actually "made it" to the disk. This is the underlying mechanism that gives the file system user the assurance that operations that claim to be committed are in fact fully logged.

As mentioned in the previous section, user-level processes send requests to commit to the file system using an ioctl system call. In general, ioctl is a mechanism for sending driver-defined request codes to a driver (the idea is that both the sender and the device know what a given ioctl request number means, but the kernel does not interpret them). The supplied program send_ioctl can be used to send ioctls to your driver:

$ ./send_ioctl mnt/.txn [request code]
Request codes are defined in crashmod.h. The following request codes are provided:

So, for example, the following command sends a commit message to your FUSE driver:

$ ./send_ioctl mnt/.txn 4000 # When this program returns, users
                             # of the file system have a guarantee
                             # that their changes have been committed.

It is only after the commit ioctl returns that the file system user has a guarantee that the transaction has taken effect. If there is a crash before the ioctl is called, the transaction has not taken effect. If there is a crash during the commit ioctl, then it depends on whether the commit point was reached. (What is the commit point?)

For this lab, we are assuming the file system has a single client: you do not need to worry about multiple transactions occurring at the same time. The potential source of non-atomic behavior is if the file system crashes.

Creating a Simple Transaction Interface

Simple log data structures and functions to manipulate them reside in log.h and log.c: your job is to integrate them with the file system.

Before starting any work on the driver, run ./chmod-walk in the Lab 7 directory. This will set up permissions in the directories leading up to your lab directory correctly so that you will be able to run the driver successfully. (If you do not run this script, FUSE will be unable to mount or unmount file systems in your lab directory.)

The file system log in this lab is an array of log entries. Each log entry has an opcode. These opcodes are defined in log.h, e.g. commit entries have an opcode of 19. Almost every user update to the file system will generate a new log entry. Each log entry stores all the necessary information to reapply the effects of the corresponding user update to the file system.

Now, take a look at the union type log_args_t defined in log.h. (A union, in C, is a C datatype that can represent multiple types of structs.) This union stores all of the information associated with a file system update (write, truncate, etc.). User accesses will generate struct log_entrys; each log_entry comprises a log_args_t plus some log-related metadata.

Exercise 1. In log.c, implement log_tx_add and log_tx_done.

log_tx_add takes an opcode, a pointer to a log_args_t and a timestamp; this function should append a new entry to the log. The struct log structure is basically an array of log entries plus some bookkeeping data fields. Then nentries field of struct log denotes the number of entries in the current log; the txn_id field stores the current transaction id. Note that you can use nentries to find the latest inserted entry: it is log->entries[log->nentries-1]. Therefore, you should insert the new entry at nentries and advance nentries by 1 when you are done. New log entries should also store information about their current transaction, specifically the transaction number txn_id, which can be found at log->txn_id.

log_tx_done is called when a user process sends an "ioctl commit" command to the file system. Note that log_tx_done should not return until the transaction is durable in the log: that way, once "ioctl commit" returns, the user has a guarantee that the transaction has "taken," in particular that it will be seen from this point forward, regardless of whether there are crashes and restarts. (See the description of IOCTL_COMMIT earlier.)

But log_tx_done must arrange for something else as well. To provide crash atomicity, log_tx_done needs to bring the disk from one valid state to another; no intermediate state should be visible to observers, even if there is a crash. You will arrange for this by doing the following in log_tx_done. First, add a COMMIT log entry at the end of the log. Then make a call to msync to ensure that the COMMIT log entry, as well as all other log entries in the same transaction, are actually on the disk. (Use man msync to learn its usage; you want to use the MS_SYNC flag.) Note that at this point, this transaction is still not committed; why not? Finally, increment nentries and txn_id, and persist the incremented variables on disk by calling msync again (and using the MS_SYNC flag again). We can persist the two variables at the same time because they will be placed on the same disk sector, and recall from your reading and lecture that writes of single disk sectors happen atomically, and that the disk hardware enforces this. Once this second call to msync returns, the transaction is committed.

Make sure you understand what the commit point is here, why that is the commit point, and why it's the case that the system reaches the commit point atomically. Solutions that do not provide crash atomicity will not be considered correct. (In class, the commit point was the writing of the COMMIT log entry; we could have done the same thing here, but it would have required a different way of finding the end of the log during replay, versus the approach that you will take in Exercise 2.)

It's important to make sure that your log works correctly before proceeding to the next exercises.

During the course of testing your FUSE driver, various combinations of operations may cause your FUSE driver to stop functioning or enter a non-clean state. It may be helpful to search Piazza for information about specific error messages.

More generally, to get your system to a clean starting state, you can do:
$ fusermount -u mnt      # unmount the driver
$ test/makeimage.bash    # make a clean testfs.img
$ rm -rf mnt             # remove the mounting directory and its residents
$ mkdir mnt              # recreate the mounting directory
or you can invoke the following script, which is equivalent to the lines above:
$ ./startclean.bash

Exercise 2. In log.c, implement log_replay. This function is called when the driver is (re)started: it should replay all committed transactions -- and only committed transactions -- from the beginning of the log to the end of the log (log->nentries-1).

There are several ways to do this. One way is is to use two passes. In the first pass, record the transaction numbers of transactions that have commit entries. In the second pass, call log_entry_install on all transaction entries that are part of a committed transaction. log_entry_install takes a single log entry and modifies the file system: you will fill in this function in a later exercise. You can also implement log_replay in a single pass. To avoid undue code complexity, it may be helpful to model the replay logic as a FSM (finite state machine).

Use make grade to test your work so far. You should be able to pass the "replay" test.

Logging File System Operations

We have provided a full implementation of file system functions in fsdriver.c. In order for the file system to support transactions, all file system operations that modify the disk must have corresponding entries in the log.

(The following commands modify the disk: write, link, unlink, rename, mknod, and truncate.)

Exercise 3. Change fs_mknod in fsdriver.c to use transactions: currently, it writes changes directly to the disk using fs_mknod_install. First, add a log entry for the operation using log_tx_add: pass in the opcode OP_MKNOD and a pointer to a populated log_args_t. Then, call fs_mknod_install with the appropriate arguments. In this exercise and the next, you will find some or all of the following library functions useful: strcpy, memcpy, and time (as usual, use the man pages to learn the usage of these library calls; for example, man 2 time).

Also implement the OP_MKNOD case branch in log_entry_install (defined in log.c) by calling fs_mknod_install with the appropriate arguments.

At this point, your driver should pass the test test_touch_crash.bash; equivalently, you should pass the "touch" test when doing make grade. The test checks the following:

Exercise 4.

Using the same pattern as in Exercise 3, modify the following functions to use transactions: fs_write, fs_link, fs_unlink, fs_rename, and fs_truncate. See the prior exercise for standard C library calls that may be useful.

It will be easiest to do this by changing one function at a time. You can use the scripts provided in test/ and make grade to test your solution.
  1. After modifying fs_write, you should be able to pass the "echo", "time", "big_txn" and "many_txn" tests.
  2. After modifying fs_truncate as well as fs_write, you should be able to pass the "truncate" test.
  3. After modifying fs_rename, you should be able to pass the "mv" test.
  4. After modifying fs_link, you should be able to pass the "link" test.
  5. After modifying fs_link and fs_unlink, you should be able to pass the "unlink" test.

Honors supplement. Honors students (section 001) should complete the following exercise; students in section 002 are free (even encouraged) to do the exercise, but there will be no credit given.

Exercise 5 (Honors). The logging protocol saves both committed and uncommitted transactions to the log: partially committed transactions that can never commit waste space. Modify log_replay so it cleans up uncommitted transactions and compresses the log before returning. On returning, the log should only contain contigous, committed transactions. If the driver crashes during the middle of this clean-up, it shouldn't cause log corruption. You are free to add necessary fields to log and/or log_entry.

Questions

Answer the following questions in answers.txt.

  1. How long approximately did it take you to do this lab?
  2. Do you feel like you gained an understanding of transactions and logging in the lab? Please suggest improvements.

Handing in the lab

If you are working in a pair, only one of the team members should submit.

If you have not done so yet, commit your code.

$ git status  # see what files you changed
$ git diff    # examine your changes
$ git commit -am "My solutions to lab7"
Then, follow these steps:

After a successful submission, you should receive email confirmation (at your NYU email address).

If you submit multiple times, we will take the latest submission and count slack hours accordingly.

This completes the lab.


Last updated: Wed May 06 17:51:31 -0400 2015 [validate xhtml]