First NMADS Meeting
Abstracts

Home

Participating Institutions

Committees

Call for
Participation


NMADS Past Events

Related
Events in the NYC Metro
Area

Send mail to NMADS

 

Invited Talk
Puppeteer: Component-based Adaptation for Mobile Computing
Willy Zwaenepoel
Department of Computer Science, Rice University

Puppeteer is a system for adapting component-based applications in mobile environments. Puppeteer takes advantage of the exported interfaces of these applications to perform adaptation without modifying the applications. The system is structured in a modular fashion, allowing easy addition of new applications and adaptation policies.

Our initial prototype focuses on adaptation to limited bandwidth. It runs on Windows NT, and includes support for a variety of adaptation policies for Microsoft PowerPoint and Internet Explorer 5. We demonstrate that Puppeteer can support complex policies without any modification to the application and with little overhead. To the best of our knowledge, previous implementations of adaptations of this nature have relied on modifying the application.

This talk reports on joint work with Eyal de Lara and Dan Wallach.

Short Talks

CANS: Composable, Adaptive Network Services
Xiaodong Fu, Weisong Shi, Anatoly Akkerman, and Vijay Karamcheti
New York University

The growth of the internet has been fueled by an increasing number of sophisticated network-accessible services. Unfortunately, the high bandwidth and processing requirements of such services is at odds with current trends towards increased variation in network characteristics and a large diversity in end devices. Ubiquitous access to such services requires the injection of additional functionality into the network to handle protocol conversion, data transcoding, and in general bridge disparate portions of the physical network.

CANS is an application-level infrastructure for injecting application-specific components into the network that focuses on three challenges: (a) efficient and dynamic composition of individual components; (b) dynamic and distributed adaptation of injected components in response to system conditions; and (c) support for legacy applications and services. The network view supported by CANS consists of applications, stateful services, and data paths between them. Both services and data paths can be dynamically created and reconfigured: a planning and event propagation model assists in distributed adaptation, and a run-time type-based composition model dictates composition. It supports legacy applications and services using interception and delegation.

This talk will describe the CANS architecture and implementation, and a case study involving a shrink-wrapped client application in a dynamically changing network environment where CANS was used to improve overall user experience.

Océano: Infrastructure for a Computing Utility
Michael Kalantar, Karen Appleby, Sameh Fakhouri, Liana Fong, Germán Goldszmidt, Srirama Krishnakumar, Donald Pazel, John Pershing, and Benny Rochwerger
IBM T. J. Watson Research Center

Océano is a prototype of a highly available, scaleable, and manageable infrastructure for an e-business computing utility. It enables multi-enterprise hosting on a collection of shared resources (eg., servers). However, at any point of time, each resource is assigned for use by only a single customer. That is, the hosting environment is divided into smaller, secure domains, each supporting one customer. These domains are dynamic: the resources assigned to them may be augmented when load increases and reduced when load dips. This dynamic resource allocation enables flexible Service Level Agreements (SLAs) with customers in an environment where peak loads are an order of magnitude greater than the normal steady state.

Effect of Interrupts in Multiprocessor Servers
Aniruddha Bohra and Liviu Iftode
Rutgers University

Problem and Motivation:
Interrupts are the major performance bottlenecks in computer systems with very high Input/Output(I/O) activity. With the advent of new applications like webservers, which are not expected to carry out any computationally intensive tasks, the cost of interrupts has become a more pronounced bottleneck. The highly optimized architectures which rely heavily on pipelines, make the interrupt cost higher than it was before. The problem is compounded in case of multiprocessor servers where there is a considerable overhead, of cache and Translation Lookaside Buffer(TLB)pollution,saving of the system state and synchronization, to service interrupts at a very fast pace. Denial of Service attacks are proof of the fact that by a very simple mechanism of bombarding a system with a very large number of packets at a very fast rate, the applications fail to progress (livelock). This makes us think whether the current interrupt driven mechanism for handling of network events is the best, or even correct! 

The present study wishes to address the questions regarding the interrupt structure on multiprocessors for highly I/O bound processes like the webserver. We wish to investigate the effect of interrupts on the performance of a webserver running on a multiprocessor system under heavy I/O workload. We also aim to evaluate various alternative mechanisms that have been suggested for asynchronous event processing and try to suggest a  mechanism which would help reduce the impact of interrupts on the performance of the system, and eliminate livelock in such systems.

Background and Related Work:
Several studies about webservers and their interaction with the underlying operating system establish interrupt processing as the performance bottleneck. Hu et al. point out that the Apache webserver spends around 25%-40% of its execution time handling interrupts. Synchronization and signal handling are identified as severe performance penalties in the same study. Also, they predict the effect would be more pronounced in case of a multiprocessor.

Many researchers have looked at the problem of processing incoming messages synchronously. In particular the problem of polling v/s interrupts with active messages has received much attention. Several systems use a mixture of polling and interrupts. Some systems use interrupts for just the operating system or protocol specific messages and poll otherwise. Langendoen et al. use polling whenever the processor is idle and use interrupts otherwise, Soft Timers extend that idea and artificially interrupt the execution at a certain interval to prevent starvation of the polling thread for the network interface. Smith and Traw initiate the polling using clocked interrupts, that is polling of the device is initiated by a timer interrupt. Another very popular idea is to interrupt only when we cannot poll fast enough. Several systems implement this in hardware, in Polling Watchdog and in software by Mogul et al where the interrupt is used just to wake up the polling thread which then services all outstanding requests and then reactivates the interrupts.

However, none of the above studies investigate the impact of interrupts or their own event handling mechanisms on multiprocessor systems. Also, the fact that computation is not the main aim of the system in a very important class of applications has been ignored. Since the advent of the World Wide Web and the gain of importance of the class of applications that are not CPU bound, the effect of interrupt processing needs to be re-evaluated and the above mechanisms revisited in the new light. 

Approach and Uniqueness:
Traditional approaches to eliminating livelock typically suggest the use of polling and interrupts in conjunction. With interrupts acting as triggers to polling. The rationale behind it is, that for polling to be as effective as the interrupts, we need to poll tens of thousands of times per second, which even with the modern processor speed would leave few cycles for other processing.Our approach is a logical extension of the ideas presented in above studies, in the realm of multiprocessor environments.

Polling of the event source is a way to eliminate livelock. To have a good performance, we need to poll the source as fast as possible. If we have more than one processor, we can dedicate one processor to poll the devices and let the other processors carry on with the usual processing work. Here, we ensure that we poll as many times as the processor allows us to. Other benefits that we expect by letting the processor always poll the event source is eliminating the cost of invoking the interrupt handler with a context switch and removing the overhead of crossing the kernel boundary repeatedly. The above approach ensures that the performance of the system is not adversely affected while eliminating the problem of livelock in the system. 

Although by dedicating a processor to polling the devices, a potential processing element is being wasted, we expect, based on observations of several studies, that modern workloads are highly I/O bound and the current processors are fast enough that there is no loss in processing ability of the system as a whole. 

Results and Contributions:
We have implemented the mechanisms to handle network events using some of the above approaches We are currently working to provide a comprehensive comparison of all approaches for interrupt handling over multiprocessor systems. We have carried out some performance measurements over the Apache webserver using the above approaches as the underlying event handling mechanisms.

The preliminary results are promising and we expect to further strengthen our claim through them. We also wish to investigate the impact of a dedicated I/O processor and its possible use for certain performance enhancing tasks like prefetching. To study the effect of having multiple I/O processors in case of a large multiprocessor system is also one of the avenues that we would explore in future. 

Efficiency vs. Portability in Cluster-Based Network Servers
Enrique V. Carrera and Ricardo Bianchini
Rutgers University

Efficiency and portability are usually conflicting objectives for cluster-based network servers that distribute the clients' requests across the cluster based on the actual content requested. Our work is based on the observation that this efficiency vs. portability tradeoff has not been discussed before in the literature. To fill this gap, in this paper we study this tradeoff in the context of an interesting class of content-based network servers, the locality-conscious servers, using modeling and experimentation. Our analytical model gauges the potential performance benefits of portable and non-portable locality-conscious request distribution with respect to a traditional, locality-oblivious server, as a function of multiple parameters. Based on our experience with the model, we design and evaluate a portable, locality-conscious server. Experiments with our server, a non-portable server, and a traditional server validate and confirm our modeling results under several real workloads. Based on our modeling and experimental results, our main conclusion is that portability should be promoted in cluster-based network servers with low processor overhead communication, given its relatively low cost (<= 15%) in terms of efficiency. For clusters with high processor overhead communication, efficiency should be the overriding concern, as the cost of portability can be very high (as high as 106% on 32 nodes). We also conclude that user-level communication can be useful even for non-scientific applications such as network servers.

Low-Power Fault-Tolerant Real-Time Network-Attached Storage Device
Tzi-cker Chiueh
State University of New York at Stony Brook

Phoenix is a real-time network-attached storage device (NASD) that guarantees real-time data delivery to network clients even across single disk failure. The service interfaces that Phoenix provides are best-effort/real-time reads/writes based on unique object identifiers and block offsets. Data retrieval from Phoenix can be serviced in server push or client pull modes. Phoenix's real-time disk subsystem performance results from a standard cycle-based scan-order disk scheduling mechanism. However, the disk I/O cycle of Phoenix is either completely active or completely idle. This on-off disk scheduling model effectively reduces the power consumption of the disk subsystem, without increasing the buffer size requirement.  Phoenix also exploits unused disk storage space and maintains additional redundancy beyond the generic RAID5-style parity.  This extra redundancy, typically in the form of block replication, reduces the time to reconstruct the data on the failed disk. This talk presents the design and implementation of Phoenix, one of the first, if not the first, NASDs that support fault-tolerant, real-time, and low-power network storage service, and a detailed performance evaluation of the Phoenix prototype based on Linux. I will also talk briefly about the evolution of the Phoenix project, and the on-going research on large-scale virtual disk architecture. 

Trading Capacity for Performance in a Disk Array
Xiang Yu and Randy Wang
Princeton University

A variety of performance-enhancing techniques, such as striping, mirroring, and rotational data replication, exist in the disk array literature. Given a fixed budget of disks, one must intelligently choose what combination of these techniques to employ. In this work, we present a way of designing disk arrays that can flexibly and systematically reduce seek and rotational delay in a balanced manner. We give analytical models that can guide an array designer towards optimal configurations by considering both disk and workload characteristics. We have implemented a prototype disk array that incorporates the configuration models. In the process, we have also developed a robust disk head position prediction mechanism without any hardware support. The resulting prototype demonstrates the effectiveness of the configuration models.

The Implementation and Evaluation of a Rotation-Latency-Sensitive Disk Scheduler 
Lan Huang and Tzi-cker Chiueh
State University of New York at Stony Brook

Disk rotation latency is becoming an increasingly significant component of disk service time, and yet has rarely been exposed to system software for scheduling purpose. In this paper, we presented the design and implementation of a disk scheduler that can effectively incorporate rotation latency into its scheduling decision. The two enabling mechanisms responsible for the successful implementation of this disk scheduler are an accurate run-time disk head position prediction scheme, and a precise disk service time model. We show that the accuracy of disk head position prediction and disk service time prediction can approach close to 100% and more than 95%, respectively. We used both real and synthetic traces to evaluate the performance of this rotation latency-sensitive disk scheduler (RSDS) and its variants, and found that the performance improvement of RSDS over conventional disk schedulers that consider only seek delay is up to 48%.

GulfStream - Dynamic Topology Management in Multi-domain Server Farms
Sameh A. Fakhouri
IBM T.J. Watson Research Center

A multi-domain server farm is a collection of servers divided into a number of distinct domains. While the resources of the server farm are shared, each domain is isolated from the others. Server farm administrative software allocates resources, and enforces isolation policies. This paper describes GulfStream, a distributed software system that addresses the problem of managing the network topology of such a server farm. In particular, it addresses the following core problems: topology discovery and verification, and failure detection.

Unlike most topology discovery and failure detection systems which focus on the nodes in a cluster, GulfStream logically organizes all of the network adapters of the server farm into groups. Each group contains those adapters that can directly exchange layer 2 messages. GulfStream dynamically establishes a hierarchy for reporting network topology and availability of network adapters.

We describe a prototype implementation of GulfStream on a 55 node heterogeneous server farm interconnected using switched fast ethernet. This was done in the context of the Oceano project which provides for dynamic resource allocation in response to workload variations. We discuss scaling GulfStream to larger environments; we envision its use in server farms containing thousands of nodes.

Quantifying the Impact of Architectural Scaling on Communication
Taliver Heath, Samian Kaur, Richard P. Martin, and Thu D. Nguyen
Rutgers University

This work quantifies how persistent increases in processor speed compared to I/O speed reduce the performance gap between specialized, high performance messaging layers and general purpose protocols such as TCP/IP and UDP/IP. The comparison is important because specialized layers sacrifice considerable system connectivity and robustness to obtain increased performance. We first quantify the scaling effects on small messages by measuring the LogP performance of two Active Message II layers, one running over a specialized VIA layer and the other over stock UDP as we scale the CPU and I/O components. We then predict future LogP performance by mapping the LogP model's network parameters, particularly overhead, into architectural components. Our projections show that the performance benefit afforded by specialized messaging for small messages will erode to a factor of 2 in the next 5 years. Our models further show that the performance differential between the two approaches will continue to erode without a radical restructuring of the I/O system. For long messages, we quantify the variable per-page instruction budget that a zero-copy messaging approach has for page table manipulations if it is to outperform a single-copy approach. Finally, we conclude with an examination of future I/O advances that would result in substantial improvements to messaging performance.

Operating System Support for Safely and Efficiently Programmable Routers
Prashant Pradhan and Tzi-cker Chiueh
State University of New York at Stony Brook

Placing computation inside the network can yield significant performance benefits to network-based applications. These benefits accrue from topologically strategic placement of computation and from the ability of such computation to exploit global network context. For example, congestion control state can be shared between flows passing through an intranet's portal router, by placing a function in the router that aggregates congestion control state on a path-by-path basis [1].

To support placement of computation in the network, a router operating system should support appropriate abstractions for composing computation on network flows, and provide efficient implementations of these abstractions. More importantly, the core router's integrity and performance should remain unaffected in the presence of such a composable computation framework. Such isolation requires memory protection and performance protection of the router kernel from dynamically added functions. With these goals in mind, we have developed a router operating system that allows safe and efficient composition of computation on network flows. We present the essential features of this OS and describe an example application, Aggregate TCP (ATCP) [1], that can be implemented using these features.

Computation is composed in terms of the following entities :

- Extension functions: These are preemptible functions that can carry state across invocations. Every extension function invocation is made in some execution context or flow. An extension function may have multiple pending invocations, issued in different flow contexts.

- Flows: Flows are abstract execution contexts. A flow is a unit of scheduling and resource allocation. Flows may allow other flows to share their resources through a simple access control mechanism.

Given the above entities, there are two key mechanisms to compose computation and to determine control flow :

- Asynchronous Invocation: A function, invoked in a given flow context, may pass control to another function by posting an invocation to it. The invocation is asynchronous, and the CPU scheduler determines when to make invocations pending in various flow contexts.

- Static Binding: Flows may statically bind themselves to a stream of packets by specifying a packet filtering rule, and an extension function that should get control when a packet matches the rule.

- Dynamic Binding: To pass control to a target function, a given function may reference the target function by names that are strings with semantic connotations. The router OS provides a mechanism to register and query these names.

To implement this framework with good invocation performance, extension functions may be co-located with the router kernel to avoid expensive context-switching and TLB flushing overheads. However, the router kernel's safety is not compromised, owing to the use of intra-address space protection [2]. The extension functions are placed in a lesser privileged subset of the kernel address space, which provides memory protection to the kernel, but only incurs the overhead of a protected function call while making extension function invocation. Performance protection is ensured through a preemptive CPU scheduler.

Aggregate TCP congestion control (ATCP) [1] is an ideal function for placement inside the network, since it can exploit global information about congestion status on various network paths and allow TCP flows to avoid their cold-start phase in congestion estimation. In ATCP, a router placed at the edge of the network, maintains congestion control related state for flows passing through it, grouped by the destination subnet of these flows. An ATCP router, upon receiving a TCP connection request, splits it into a local subconnection (L) and a remote subconnection (R). R starts from a congestion window equal to the warm estimate. On L, an available credit is maintained, depending upon the congestion window of R and its growth mode (linear/exponential). Since the RTT on L is much smaller than that on the whole L-R path, the congestion window for L can grow to the warm estimate much faster.

ATCP doesn't require any changes to the end-system TCP implementations and its evaluation using a real web server trace shows a potential improvement of upto a factor of 2 in normalized HTTP transaction latency. ATCP can be naturally and efficiently implemented using the API exposed by the proposed operating system.

References

[1] P. Pradhan; T. Chiueh; A. Neogi, "Aggregate TCP Congestion Control Using Multiple Network Probing", Proc. ICDCS-2000.

[2] T. Chiueh; G. Venkitachalam; P. Pradhan, "Integrating Segmentation and Paging Protection for Safe, Efficient and Transparent Software Extensions" , Proc. ACM SOSP-99.

Transport Layer Support for Highly-Available Network Services
Kiran Srinivasan, Florin Sultan, and Liviu Iftode
Rutgers University

Problem and Motivation:
There has been a growing trend in the Internet to view resources as services rather than as servers. The coupling between a resource and its location (IP address) is progressively losing importance. The recipients of the services (clients) are only interested in obtaining good quality service no matter from where (geographical location) the service is being provided. This requires the service providers to provide highly-available and high-throughput services. This problem is being approached in two different ways. One method is of building fault-tolerant, high-throughput servers from a cluster of computers connected by a SAN. The other approach is of distributing the servers of the same service onto geographically different areas. The current transport layer framework (TCP) does not facilitate either of these mechanisms as it does not support easy connection hand-off from one server to another. The primary emphasis of this project is to explore and design a transport layer mechanism that enables building services based on the above architectures.

Background and Related Work:
Related work includes TCP/IP connection hand-off protocols used either for mobility extensions to TCP/IP [Bakre95, Balakrishnan95, Snoeren00] or for request distribution in clusters [Aron99]. Indirect TCP [Bakre95] provides hand-off support for physical mobility of a connection endpoint over a wireless link, by splitting and maintaining hard state for it at a base station, and by transferring that state to the next base station during hand-off. [Balakrishnan95] provides support for mobility by maintaining soft state in the base stations, at the expense of flooding nearby base stations using multicast to build necessary state prior to a handoff. [Snoeren00] changes the protocol stack at the endpoints to provide end-to-end mobility when the network attachment point (IP address) changes.

All these approaches do not consider the task of migrating the connection endpoints between physically distinct machines. They either rely on directly using the full connection state maintained at the physical endpoints [Bakre95, Balakrishnan95], or on restarting a previously established connection after a IP address change by using an authentication mechanism to reuse the state of the old connection [Snoeren00].

[Aron99] proposes TCP connection handoff in clustered servers for distributing incoming connection requests from a front-end machine to the server back-end nodes. The approach has limitations due to its specialized nature (cluster-based web server): in their single handoff scheme a connection endpoint can migrate only during the connection setup phase. Multiple handoff of persistent HTTP/1.1 connections is only mentioned as an alternative, but no design or implementation is described. Even in the multiple handoff scheme, the granularity of migration of live connections is application-dependent: a connection can only migrate after fully servicing a HTTP request.

Our mechanism for connection handoff targets migration of the endpoint of an active (live) connection between physically distinct hosts, transparent to the fixed endpoint, at any moment during the connection lifetime.

Approach and Uniqueness:
The model of client-server interaction that we envision in our design assumes that there are a number of hosts (the servers), either clustered or distributed across the Internet, that provide the same service (i.e., run the same application). A client contacts a preferred server using a TCP connection, at the beginning of a service session.

During the lifetime of a session, under the control of our scheme, the remote endpoint of the connection may transparently (to the client application) migrate between the servers, for example in response to events like failure of the current server or a loss in the quality of service received by the client. In response, the transport layer at the destination server reincarnates the connection endpoint existing at the previous server. In case of a failure to do so, this event is treated just as a failure to provide the service and the session/connection is aborted.

Connection migration involves only one endpoint of the connection (the server side), while the other endpoint (client side) is fixed. Migration may occur multiple times and at any moment throughout the lifetime of an active connection. 

Because migration may occur at any time, the application level state associated with the ongoing data transfer may have to be restored at the destination host to ensure correct continuation of the data transfer on the migrated connection. We provide a minimal API for exporting/importing the server-specific state that completely describes the ongoing data transfer at the application level. The writer of a server application uses our API in order to take advantage of dynamic server-side migration of live TCP connections.

Our migration mechanism can be uniformly used both in cluster-based servers or in groups of servers distributed over wide area. Inside a cluster, we can take advantage by a backend network of high bandwidth and low latency (SAN) to proactively replicate connection state across the cluster. This provides hot-swappable endpoints for a connection: in case of a failure of the server node, the client can transparently continue communication with a backup node. Over wide area, it can be used to recover from events like network congestion and DoS attacks, that degrade the quality of the service received by the client.

In our system, the client applications (e.g., web browser) are unchanged. We change the client and server TCP/IP stack to accommodate a new type of connection. Although we require changes to server applications, we believe the programming effort involved should be fairly low.

Results and Contributions:
We are currently designing and implementing our migration scheme by extending the TCP specification.

The connection migration mechanism we propose can be used (i) on the client side, to dynamically shift to another server when the current server does not provide a satisfactory level of service, or (ii) on the server side, to recover connections from hard node failures in cluster-based servers, or to implement a load balancing scheme by shedding load from existing connections to other less loaded servers.

In addition, we believe that our scheme can be used to alleviate the impact on existing connections of certain types of DoS attacks (SYN flooding, the process table attack) by shifting connections from the host under attack to alternate servers. While the migration mechanism may need to use resources on the attacked machine (which, depending on the severity of the attack, may or may not be available), connection migration provides in any case a better alternative than the loss of existing connections.

References:
[Aron99] M. Aron, P. Druschel, W. Zwaenepoel. Efficient Support for P-HTTP in Cluster-Based Web Servers. USENIX '99.

[Bakre95] Ajay Bakre, B. R. Badrinath. Handoff and system support for Indirect TCP/IP. Second Usenix Symposium on Mobile and Location-dependent Computing, April 1995. 

[Balakrishnan95] H. Balakrishnan, S. Seshan, E. Amir, R. H. Katz. Improving TCP/IP Performance over Wireless Networks. 1st ACM Conf. on Mobile Computing and Networking, November 1995.

[Snoeren00] A. C. Snoeren, H. Balakrishnan. An End-to-End Approach to Host Mobility. 6th ACM MOBICOM, August 2000.

An Architecture to Define, Monitor and Bill SLAs in a Server Farm
Juliana Cunha and Karen Appleby
T. J. Watson Research Center

We present an architecture to define and maintain Service Level Agreements, which was developed as part of a larger project, Oceano. An XML Contract is used to establishes an SLA between a customer and a service provider. This contract includes report definition, violation policy descriptions, penalties for disruption of service and pricing. The information provided in the contract must be validated for integrity, to avoid conflicts and errors. In addition, we need to guarantee that the provider has a sufficient number of resources to support the defined service level. Oceano will monitor the enforcement of the contract and triggers the policy engine whenever a violation occurs. Contract Violations are expressed as policies, which include a violation scenario, start and stop time, the violator agent and a related action. The possible actions are procedures to correct the problem, and/or apply monetary value to be charged from the service provider violator. A pricing engine (that combines the aggregate cost values of each resource involved in the violation scenario) is responsible for the billing calculations. Reports are periodically generated to inform the participants of the system behavior. We address the problem of SLA definition by using customer feedback and providing a flexible way to define and monitor the quality of service. Another important issue is the integration between the SLA monitoring and pricing model. We will present recent results, and ongoing work.

Posters

Cooperative File System (CFS) for PC clusters
Suresh Gopalakrishnan and Liviu Iftode
Rutgers University

Problem and Motivation: 
CFS aggregates local storage available on each node in a PC cluster into a single global file system that supports not just location transparency but also location independence. The main goal is to provide scalable I/O throughput and support variability in load without compromising manageability and flexibility. In recent times there is a significant trend in the usage of file systems towards server-based applications rather than the traditional interactive applications. These changes in usage and traffic patterns along with the cluster based server models motivate the idea of building a global cluster file system. Location independence allows CFS to manage within itself, transparently and dynamically, the tasks of file placement and file migration for load balancing and file replication for fault tolerance. The possibility of using global memory (cooperative caching) and the availability of fast interconnects with user level communication obviate concerns about performance.

Background and Related Work: 
CFS is different from other DFS projects (Andrew, Sprite, xFS, NFS, Frangipani, Archipelago etc.) mainly in that

1. it runs in the user level 
2. on top of the existing file systems 
3. using volatile metadata that can be rebuilt at run time
4. providing location independence

Projects like Frangipani and xFS were built from scratch and included storage management (Petal) issues also in the design, which complicates the whole system. CFS runs on top of any commodity FS and chooses to leverage on existing systems rather than attempting to redesign a file system itself.

Approach and Uniqueness: 
The key to providing location independence is maintaining virtual directories on each node that contain global information. There is a single global (virtual) file system that is the conglomeration of the local file systems on participating nodes, with a single root. Directories are allowed to be replicated, so directories with the same name in the local filesystems (eg: /bin) are considered to be replications of the single global directory with the same name. The virtual directories are maintained in memory and are volatile - they are built up at runtime by a procedure called 'directory merging' or 'dirmerge'.

The CFS is implemented at the user level as a library that provides all the file system interface needed by the applications. The design of CFS has two logical layers - one is the virtual directories that contain the metadata required to provide location independence and the second is a cooperative caching layer with global replacement that caches the file data at block level. CFS exploits the high speed interconnects and user level communication with features like memory mapped communication provided by VIA to achieve better performance as well as to allow relaxed consistency protocols.

Results and Contributions: 
The basic CFS implementation consisting of the cooperative caching and virtual directories has been implemented. Using CFS, we have also built a user level distributed NFS server that we are currently benchmarking using SPEC NFS benchmark and traces from the file servers in the department. The testbed consists of a cluster of 8 PCs (300 MHz) connected by a VIA-based interconnect (Giganet). The next step is to design and evaluate file replication and migration policies and mechanisms. CFS will also be used as a platform to study other (distributed systems) issues like impact of communication on performance, local v/s global cache replacement policies, meta-data consistency models etc.

Encryption Servers: A Cost-Effective Versatile Solution for Internet Security
Vivek Pathak and Liviu Iftode
Rutgers University

The explosive growth of the Internet has given rise to a number of security concerns. These concerns have become important due to the increasing use of electronic commerce. As the Internet and its user community are distributed in nature, any large scale Internet security solution has to tackle a number of issues. These include scalability, lack of single points of failure and economic viability. This motivates the creation of Encryption server, a scalable and cost effective medium for encryption and authentication of Internet traffic.

Encryption server provides IP level security. It implements encryption and authentication of forwarded IP packets. The Encryption servers function as secure gateways to the Internet and support a destination based security policy. Variable key lifetimes are used to provide a constant level of security and performance for a wide range of operating loads. The authentication of fresh public keys is done by a vote among the peers and avoids the use of a single trusted third party. An automatic key exchange protocol for authenticated propagation of public keys and secure destinations is developed. It uses lazy key update and optimistic trust to eliminate overhead in the common operating case. A partial implementation on the Linux operating system has been done and preliminary experiments have shown promising results.

It is intended to explore the performance tradeoffs in the variable key lifetime approach. Issues like caching of cryptographic processing and interrupt free forwarding of IP packets have to be studied. The performance tradeoffs of various encryption methods and key exchange mechanisms have to be experimentally validated. The scalability issues present due to hardware limitations have to be addressed by creation of encryption clusters.

Cooperative Caching Middleware for Clusters
Matias Cuenca and Thu D. Nguyen
Rutgers University

In order to seize the power of commodity clusters, Internet servers sometimes take advantage of the aggregate amount of memory in the cluster nodes by using cooperative caching (CC) mechanisms. Such mechanisms can reduce the number of accesses to disk, and as a result reduce the service latency and improve overall server throughput. Most of the past work on CC has been done at the application level, i.e., caching was tailored to a specific server's needs. For example, some WWW servers have assumed Zipf request distributions, have used (variable-sized) files as the caching unit, and have applied local rather than global replacement policies. The problem with this approach is that the caching and coherence infrastructure has to be rebuilt for each new server. In contrast, we believe that cooperative caching should be implemented as a middleware layer that can be reused by a variety of servers. To verify this claim, we are developing a simulator that can test multiple CC policies and mechanisms. At first, we will focus on a distributed file system and will compare the performance of hand-optimized CC against that of our general CC infrastructure. With our simulator and file system, we plan to study the tradeoff between the performance and the generality of the middleware we propose.

A Window Into Your Computing Environment
Christopher Peery and Thu D. Nguyen
Rutgers University

With the advent of Virtual Computing it has become possible for users to move the execution of applications off of their local machines and on to a better computing base that is usually located at a remote location. To the users of this form of Virtual Computing the application should appear as though it were still running on the local machine. This application could be a anything, from a simple program, to a graphic environment, or even to an entire operating system.

Currently this model has been implemented for simple applications. For example, AT&T has developed an application known as Virtual Network Computing (VNC). Using this application, users are able to access a simplified X-server that is running on a remote machine. Since the X-server is running remotely, all the state of the server is kept remotely as well.  Thus, users are able to utilize a graphical interface that can potentially (depending on the reliability of the remote machine) survive a crash of the local host. In addition, a given user can access the same remotely running X-server from any number of different machines and is therefore not bound to working at a given computer.

In addition to virtual computing, the computing world has seen a huge growth in the development of PDAs, such as the Compaq IPAQ. The goal of this project is to combine the versatility and portability of hand-held devices with the reliability and availability of virtual computing to allow individuals to access an already existing computing environment.

This project will attempt to get the IPAQ to connect with an X server that is running remotely on a reliable computing base. The IPAQ  will use a wireless medium and the VNC application to interface with this X server. In essence, the IPAQ will be turned in to a mobile network terminal with graphical capabilities. This will allow users to access their computing environments from remote locations using this PDAs.

The purpose of this set-up is not to provide the IPAQ with the complete functionality of full desktop machine. Instead it is to allow individuals to access their computing environment for information or to allow them to make small alterations in it. The emphasis is not on functionality but on convienience and look-up. 

The current focus of this project is on the implementation details. This includes altering the VNC application to allow for  efficient realization on the IPAQ, and how to best grant the user of this application an overview of his computing environment.

Continuously Available and Scalable Sorted List Data Structure
Kiran Nagaraja, Richard Martin, and Thu Nguyen
Rutgers University

The emergence of the Internet as the global, ubiquitous networking infrastructure is driving a new class of highly scalable Internet services. As these services become a part of our everyday life, users will demand ultra reliability. In this work, we argue that that the complexity of replication, fault tolerance, and consistency necessary to achieve this reliability could and should be hidden behind data structure abstractions such as in-memory sorted lists, spatial trees, and graphs.

We believe that the data structure should be flexible, resilient to structural imbalances and allow high concurrency in order to provide the above said services in an efficient manner. We are investigating a distributed sorted list structure sorted on the value field of a 2-tuple (keyid, value); potential applications include multi-item auctions and web ranking/indexing services.

The data structure is organized in a two level hierarchy, a globally replicated 'Splitter' that partitions the value-range among participating nodes and a set of B-trees to hold the data local to each node. Our list implementation allows for high concurrency and ease of management. We are currently investigating data rebalancing among B-trees of a node as well as across nodes.

 
 

 

Last Update: 03/06/2001