Class 16
CS 202
2 April 2015

On the board
------------

1. Last time
2. File systems: indexed files
3. File systems: directories

---------------------------------------------------------------------------

1. Last time

    --disk 

    --file systems intro: files, implementing files

    --today, continue from there

    Outline:

        A. [last time] intro
        B. [last time] files
        C. implementing files
            1. [last time] contiguous
            2. [last time] linked files
            3. [last time] FAT
            4. [today] indexed files
        D. [today] Directories
        E. [next time] FS performance: FFS case study

2. File systems : implementing files : indexed files

    Remember the goal:

			inode
		offset ------>  disk block address


    [DRAW CLASSIC INODE]

	--classic Unix file system 

	--inode contains:

	    permissions
	    times for file access, file modification, and inode-change
	    link count (# directories containing file)
	    ptr 1  --> data block
	    ptr 2  --> data block
	    ptr 3  --> data block
	    .....
	    ptr 11  --> indirect block 
				  ptr --> 
				  ptr --> 
				  ptr --> 
				  ptr -->
				  ptr -->
	    ptr 12 --> indirect block
	    ptr 13 --> double indirect block
	    ptr 14 --> triple indirect block


    This is just a tree.

    Question: why is this tree intentionally imbalanced?

        (Answer: optimize for short files. each level of this tree
        requires a disk seek...)
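
    A minimal sketch of the offset --> block mapping, to make the
    "imbalanced tree" concrete. The parameters are assumptions, not
    from the notes: 4 KB blocks, 4-byte block pointers, and a
    simplified layout of 10 direct pointers followed by one single-,
    one double-, and one triple-indirect pointer. The function name
    is hypothetical.

```python
# Assumed parameters (illustrative only, not from the lecture):
BLOCK_SIZE = 4096                          # bytes per disk block
PTR_SIZE = 4                               # bytes per block pointer
NUM_DIRECT = 10                            # direct pointers in the inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE    # 1024 pointers per index block


def indirection_level(offset):
    """Return how many index blocks must be read to find the data block
    that holds byte `offset` (0 means a direct pointer in the inode)."""
    block = offset // BLOCK_SIZE           # logical block number in the file
    if block < NUM_DIRECT:
        return 0
    block -= NUM_DIRECT
    span = PTRS_PER_BLOCK                  # data blocks reachable at level 1
    for level in (1, 2, 3):
        if block < span:
            return level
        block -= span
        span *= PTRS_PER_BLOCK             # each level multiplies the reach
    raise ValueError("offset beyond maximum file size")
```

    Under these assumptions, the first 40 KB of any file is reachable
    with no extra index-block reads, which is the "optimize for short
    files" point above.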

    Pluses/minuses:

    +: Simple, easy to build, fast access to small files

    +: Maximum file length can be enormous, with
       multiple levels of indirection 

    -: worst case # of disk accesses pretty bad (one extra access per
       level of indirection)

    -: worst case overhead pretty bad (e.g., an 11-block file needs an
       entire indirect block just to hold one pointer)

    -: Because you allocate blocks by taking them off unordered
       freelist, metadata and data get strewn across disk
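
    The "enormous maximum length" and "worst case overhead" points can
    be checked with a little arithmetic. This is a sketch under assumed
    parameters (4 KB blocks, 4-byte pointers, 10 direct pointers, one
    pointer each for single/double/triple indirection); the function
    names are hypothetical.

```python
# Assumed parameters (illustrative only, not from the lecture):
BLOCK_SIZE = 4096
PTR_SIZE = 4
NUM_DIRECT = 10
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE    # 1024


def max_file_blocks():
    """Largest file, in data blocks, that this inode layout can address."""
    n = PTRS_PER_BLOCK
    return NUM_DIRECT + n + n ** 2 + n ** 3


def index_blocks_needed(data_blocks):
    """Count the indirect (index) blocks needed for a file of this size."""
    n = PTRS_PER_BLOCK
    remaining = max(0, data_blocks - NUM_DIRECT)
    total = 0
    for level in (1, 2, 3):
        if remaining == 0:
            break
        covered = min(remaining, n ** level)
        # At depth k of this subtree, one index block covers n**k data
        # blocks, so we need ceil(covered / n**k) of them.
        for k in range(1, level + 1):
            total += -(-covered // n ** k)  # ceiling division
        remaining -= covered
    if remaining:
        raise ValueError("file too large for this inode layout")
    return total
```

    With these numbers the maximum file is about 4 TB, while an
    11-block file pays one full index block for a single pointer.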


    Notes about inodes:

    --stored in a fixed-size array

    --Size of array fixed when disk is initialized; can't be changed

    --Multiple inodes in a disk block

    --The array lives in a known location: originally at one side of
    the disk, now in pieces across the disk (helps keep metadata
    close to data)

    --The index of an inode in the inode array is called an
    ***i-number***

    --Internally, the OS refers to files by i-number

    --When a file is opened, the inode is brought into memory

    --Written back to disk if modified, either when the file is closed
    or after some time elapses
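
    Because the inode array is fixed-size and lives at a known
    location, turning an i-number into a disk location is plain
    arithmetic. A minimal sketch, assuming a single contiguous array
    (the original "one side of disk" layout); the inode size, block
    size, and starting block below are made-up values.

```python
# Assumed parameters (illustrative only, not from the lecture):
BLOCK_SIZE = 4096
INODE_SIZE = 128
INODE_TABLE_START = 8                        # hypothetical first block of the array
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE  # 32 inodes share one disk block


def locate_inode(inumber):
    """Return (disk block, byte offset within that block) for an i-number."""
    block = INODE_TABLE_START + inumber // INODES_PER_BLOCK
    offset = (inumber % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset
```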

3. File systems : directories

    --Problem: "Spend all day generating data, come back the next
    morning, want to use it."  F. Corbato, on why files/dirs were
    invented.

    --Approach 0: Have users remember where on disk their files are

	--like remembering your social security or bank account #
  
	--yuck. (people want human-friendly names.)

    --So use directories to map names to file blocks, somehow
	
	--But what is in directory?

    --A short history of directories

	--Approach 1: Single directory for entire system

	    --Put directory at known location on disk

	    --Directory contains <name,inumber> pairs

	    --If one user uses a name, no one else can

	    --Many ancient personal computers work this way

	--Approach 2: Single directory for each user

	    --Still clumsy, and "ls" on 10,000 files is a real pain
	    --(But some oldtimers still work this way)

	--Approach 3: Hierarchical name spaces. 

	    --Allow directory to map names to files ***or other dirs***

	    --File system forms a tree (or graph, if links allowed)

	    --Large name spaces tend to be hierarchical

		--examples: IP addresses (will come up in networking
		unit), domain names, scoping in programming languages,
		etc.

		--more generally, the concept of hierarchy is everywhere
		in computer systems

    --Hierarchical Unix

	--used since CTSS (1960s), and Unix picked it up and used it
	nicely

	--structure like:
		            "/"
	     bin  cdrom    dev       sbin           tmp
			        awk chmod ....

	--directories stored on disk just like regular files

	    --here's the data in a directory file; this data is in the
	    *data blocks* of the directory:

	      [<name, inode#>]
	       <bin, 1021>
	       <dev, 1001>
	       <sbin, 2011>
	       ....

	    --i-node for directory contains a special flag bit

		--only special users can write directory files

	--key point: i-number might reference another directory

	    --this neatly turns the FS into a hierarchical tree, with
	    almost no work

	--another nice thing about this: if you speed up file
	operations, you also speed up directory operations, because
	directories are just like files

	--bootstrapping: where do you start looking?

	    --root dir always inode #2 (0 and 1 reserved)

	    --and, voila, we have a namespace!
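
	Name lookup can be sketched as a walk that starts at inode #2
	and follows <name, inode#> pairs one path component at a time.
	This is a toy in-memory model, not a real file system: the
	INODES table and resolve() are hypothetical, the i-number 3000
	is made up, and "." and ".." are left out for brevity.

```python
ROOT_INUMBER = 2

# Toy inode table: i-number -> (is_directory, contents).
# A directory's contents are its <name, inode#> pairs.
INODES = {
    2: (True, {"bin": 1021, "dev": 1001, "sbin": 2011}),
    1021: (True, {"awk": 3000}),     # 3000 is a made-up i-number
    1001: (True, {}),
    2011: (True, {}),
    3000: (False, b"file data"),
}


def resolve(path):
    """Return the i-number that an absolute path names."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    inumber = ROOT_INUMBER
    for name in path.split("/"):
        if name == "":
            continue                 # skip empty pieces from "/" or "//"
        is_dir, contents = INODES[inumber]
        if not is_dir:
            raise NotADirectoryError(path)
        if name not in contents:
            raise FileNotFoundError(path)
        inumber = contents[name]     # may itself be another directory
    return inumber
```

	Each loop iteration corresponds to reading one directory's
	inode and data blocks, so deep paths cost more disk accesses.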


	--special names: "/", ".", ".."

	--given those names, we need only two operations to navigate the
	entire name space:

	    --"cd name": (change context to directory "name")
	    --"ls": (list all names in current directory)


	--example:

	    [DRAW PICTURE]


	--links:

	    --hard link: multiple dir entries point to same inode; inode
	    contains refcount

		"ln a b": creates a synonym ("b") for file ("a")

		--how do we avoid cycles in the graph? (answer: can't
		hard link to directories)

	    --soft link: synonym for a *name*

		"ln -s /d/a b": 

		--creates a new inode, not just a new directory entry

		--new inode has "sym link" bit set

		--contents of that new file:

		    "/d/a"
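
	The difference between the two kinds of link can be observed
	from user space. A minimal sketch using Python's os module on
	a POSIX system whose file system supports both link types
	(an assumption); link_demo() and the file names are
	hypothetical.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a soft link; report what we see."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        hard = os.path.join(d, "b")
        soft = os.path.join(d, "c")
        with open(a, "w") as f:
            f.write("data")

        os.link(a, hard)       # like "ln a b": new dir entry, same inode
        os.symlink(a, soft)    # like "ln -s <path of a> c": new inode

        result = {
            "hard_same_inode": os.stat(a).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(a).st_nlink,
            # lstat() examines the link itself instead of following it
            "soft_own_inode": os.lstat(soft).st_ino != os.stat(a).st_ino,
            "soft_contents_is_name": os.readlink(soft) == a,
        }

        os.remove(a)           # drop the original name
        with open(hard) as f:  # data survives: refcount was 2, now 1
            result["hard_still_readable"] = f.read() == "data"
        # The soft link still exists but now names nothing.
        result["soft_dangling"] = (
            os.path.lexists(soft) and not os.path.exists(soft)
        )
        return result
```

	Removing "a" only decrements the inode's refcount, so the hard
	link keeps working, while the soft link just stores a name and
	is left dangling.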