DEPARTMENT OF COMPUTER SCIENCE
DOCTORAL DISSERTATION DEFENSE


Candidate: Russell Power
Advisor: Jinyang Li

Building Efficient Distributed In-memory Systems

Tuesday, February 4, 2014. 2:30 PM
Warren Weaver Hall Room 1302


Abstract

The recent cloud computing revolution has changed the distributed computing landscape, making the resources of entire datacenters available to ordinary users. This process has been greatly aided by dataflow style frameworks such as MapReduce which expose simple model for programs, allowing for efficient, fault-tolerant execution across many machines. While the MapReduce model has proved to be effective for many applications, there are a wide class of applications which are difficult to write or inefficient in such a model. This includes many familiar and important applications such as PageRank, matrix factorization and a number of machine learning algorithms. In lieu of a good framework for building these applications, users resort to writing applications using MPI or RPC, a difficult and error-prone construction.

This thesis presents 2 complementary frameworks, Piccolo and Spartan, which help programmers to write in-memory distributed applications not served well by existing approaches.

Piccolo presents a new data-centric programming model for in-memory applications. Unlike data-flow models, Piccolo allows programs running on different machines to share distributed, mutable state via a key-value table interface. This design allows for both high-performance and additional flexibility. Piccolo makes novel use of commutative updates to efficiently resolve write-write conflicts. We find Piccolo provides an efficient backend for a wide-range of applications: from PageRank and matrix multiplication to web-crawling.

While Piccolo provides an efficient backend for distributed computation, it can still be some- what cumbersome to write programs using it directly. To address this, we created Spartan. Spartan implements a distributed implementation of the NumPy array language, and fully sup- ports important array language features such as spatial indexing (slicing), fancy indexing and broadcasting. A key feature of Spartan is its use of a small number of simple, powerful high-level operators to provide most functionality. Not only do these operators dramatically simplify the design and implementation of Spartan, they also allow users to implement new functionality with ease.

We evaluate Piccolo and Spartan on a wide range of applications and find that they both perform significantly better than existing approaches.