Instructions for submitting a technical report or thesis.
Title: Genomics via Optical Mapping II(A): Restriction Maps from Partial Molecules and Variations
Author(s): Anantharaman, T.; Mishra, B.
Abstract:
In this paper, we extend an algorithmic approach to constructing ordered restriction maps from images of a population of individual DNA molecules (clones) digested by restriction enzymes. The original algorithm was capable of producing high-resolution, high-accuracy maps rapidly and in a scalable manner given a certain class of data errors, including contamination, sizing errors, false and missing restriction sites and unknown orientation. Here we extend this set of errors to include possibly broken molecules where the amount of breakage is not known beforehand, which is necessary for handling larger clones. In an earlier paper~\cite{optmapII}, we had shown that the problem of making maps from molecules with end fragments missing as the only source of error is NP-complete. We also show how to handle multiple reliability levels in the input data when calling restriction sites, where the actual reliability levels are not known and must be infered from the data.
Title: Genomics via Optical Mapping (I): Probabilistic Analysis of Optical Mapping Models
Author(s): Anantharaman, T.; Mishra, B.
Abstract:
We study several simple models for optical mapping and explore their power and limitations when applied to the construction of maps of clones (e.g., lambdas, cosmids, BACs and YACs). We provide precise lower and upper bounds on the number of clone molecules needed to create the correct map of the clone. Our probabilistic analysis shows that as the number of clone molecules is increased in the optical mapping data, the probability of successful computation of the map jumps from 0 to 1 for fairly small number of molecules (for typical values of the parameterS, the transition point is around 70 molecules). These observations have been independently verified with extensive tests, with both in vitro and in silico data.
In addition, we compare our results with those derived by Karp and Shamir in a recent paper. We hope that this paper clarifies certain misconceptions and explains why the model proposed in Anantharaman et al. (1997) has proven so powerful.
Title: Genomics via Optical Mapping III: Contiging Genomic DNA and Variations
Author(s): Anantharaman, T.; Mishra, B.; Schwartz, D.
Abstract:
In this paper, we describe our algorithmic approach to constructing an alignment (Contig) of a set of optical maps created from the images of individual genomic DNA molecules digested by restriction enzymes. Generally, these DNA segments are sized in the range of 1--4 Mb. The problem of assembling clone contig maps is a simpler special case of this contig problem and is handled by our algorithms. The goal is to devise contiging algorithms capable of producing high-quality composite maps rapidly and in a scalable manner. The resulting software is a key component of our physical mapping automation tools and has been used routinely to create composite maps of various microorganisms (E.coli, P.falciparum and D.radioduran). The experimental results appear highly promising.
Title: An Efficient Primal-Dual Interior-Point Method for Minimizing a Sum of Euclidean Norms
Author(s): Anderson, K. D.; Christiansen, E.; Conn, A. R.; Overton, M. L.
Abstract:
The problem of minimizing a sum of Euclidean norms dates from the 17th century and may be the earliest example of duality in the mathematical programming literature. This nonsmooth optimization problem arises in many different kinds of modern scientific applications. We derive a primal-dual interior-point algorithm for the problem, by applying Newton's method directly to a system of nonlinear equations characterizing primal and dual feasibility and a perturbed complementarity condition. The main work at each step consists of solving a system of linear equations (the Schur complement equations). This Schur complement matrix is not symmetric, unlike in linear programming. We incorporate a Mehrotra-type predictor-corrector scheme and present some experimental results comparing several variations of the algorithm, including, as one option, explicit symmetrization of the Schur complement with a skew corrector term. We also present results obtained from a code implemented to solve large sparse problems, using a symmetrized Schur complement. This has been applied to problems arising in plastic collapse analysis, with hundreds of thousands of variables and millions of nonzeros in the constraint matrix. The algorithm typically finds accurate solutions in less than 50 iterations and determines physically meaningful solutions previously unobtainable.
Title: Just-in-Time Transparent Resource Management
Author(s): Baratloo, A.
Abstract:
This paper presents the design and the implementation of a resource management system for monitoring computing resources on a network and for dynamically allocating them to concurrently executing jobs. In particular, it is designed to support adaptive parallel computations---computations that benefit from addition of new machines, and can tolerate removal of machines while executing. The challenge for such a resource manager is to communicate the availability of resources to running programs even when the programs were not developed to work with external resource managers. Our main contribution is a novel mechanism addressing this issue, built on low-level features common to popular parallel programming systems.
Existing resource management systems for adaptive computations either require tight integration with the operating system (DRMS), or require an integration with a programming system that is aware of external resource managers (e.g. Condor/CARMI, MPVM, Piranha). Thus in each case, their support is limited to a single type of programming system. In contrast, our resource management system is unique in supporting several unmodified parallel programming systems. Furthermore, the system runs with user-level privilege, and thus can not compromise the security of the network.
The underlying mechanism and the overall system have been validated on a dynamically changing mix of jobs, some sequential, some PVM, some MPI, and some Calypso computations. We demonstrate the feasibility and the usefulness of our approach, thus showing how to construct a middleware resource management system to enhance the utilizations of distributed systems.
Title: Foveation Techniques and Scheduling Issues in Thinwire Visualization
Candidate: Chang, Ee-Chien
Advisor(s): Yap, Chee
Abstract:
We are interested in the visualization of large images across a network. Upon request, the server sends an image across the network to the client, who in turn, presents this image to the viewer. A key observation is that, at any moment, the viewer is mainly interested in a region around his gaze point in the image. To exploit this, we let the viewer interactively indicates this point and the selected region will have higher priority in the transmission process. As a result, the displayed image is a ``space-variant'' image. A fundamental difference between this scheme and the usual progressive transmission scheme is that we place more emphasis on the visualization process. This shift in emphasis opens up new perspectives on the problem. In this thesis, we focus on this difference.
In chapter two, we formalize the operation of ``foveating an image'', study how to distribute the resolution over an image, and how to progressively refine such a space-variant image. Motivated by properties of human vision, we propose two methods for the construction of space-variant images. In chapter three, we formulate and study an abstract on-line scheduling problem which is motivated by interactions between the client and the server. In the fourth and last chapter, we describe details and issues in an implementation.
Title: Exploiting Application Tunability for Efficient, Predictable, Parallel Resource Management
Author(s): Chang, F.; Karamcheti, V.; Kedem, Z.
Abstract:
Parallel computing is becoming increasing central and mainstream, driven both by the widespread availability of commodity SMP and high-performance cluster platforms, as well as the growing use of parallelism in general-purpose applications such as image recognition, virtual reality, and media processing. In addition to performance requirements, the latter computations impose soft real-time constraints, necessitating em efficient, predictable parallel resource management. Unfortunately, traditional resource management approaches in both parallel and real-time systems are inadequate for meeting this objective; the parallel approaches focus primarily on improving application performance and/or system utilization at the cost of arbitrarily delaying a given application, while the real-time approaches are overly conservative sacrificing system utilization in order to meet application deadlines. In this paper, we propose a novel approach for increasing parallel system utilization while meeting application soft real-time deadlines. Our approach exploits the application tunability found in several general-purpose computations. Tunability refers to an application's ability to trade off resource requirements over time, while maintaining a desired level of output quality. In other words, a large allocation of resources in one stage of the computation's lifetime may compensate, in a parameterizable manner, for a smaller allocation in another stage. We first describe language extensions to support tunability in the Calypso programming system, a component of the MILAN metacomputing project, and evaluate their expressiveness using an image processing application. We then characterize the performance benefits of tunability, using a synthetic task system to systematically identify its benefits and shortcomings. Our results are very encouraging: application tunability is convenient to express, and can significantly improve parallel system utilization for computations with predictability requirements.
Title: Techniques to Improve the Performance of Software-based Distributed Shared Memory Systems
Candidate: Chu, Churngwei
Advisor(s): Kedem, Zvi
Abstract:
Software distributed shared memory systems are able to provide programmers with the illusion of global shared memory on networked workstations without special hardware support. This thesis identifies two problems in contemporary software distributed shared memory systems: (1) poor application programming interfaces for programmers who need to solve complicated synchronization problems and (2) inefficiencies in traditional multiple writer protocols. We propose a solution to both of these problems. One is the introduction of user-definable high level synchronization primitives to provide a better application programming interface. The other is the single-owner protocol to provide efficiency. In order to accommodate user-definable high level synchronization primitives, a variant of release consistency is also proposed.
User-definable high level synchronization primitives provide a paradigm for users to define their own synchronization primitives instead of relying on traditional low level synchronization primitives, such as barriers and locks. The single-owner protocol reduces the number of messages from O ( n ^{ 2 } ) messages (the number of messages needed in the multiple-owner protocol) to Theta(n) messages when there are first n writers writing to a page and then n readers reading the page. Unlike some multiple-owner protocols, in the single-owner protocol garbage collection is performed asynchronously, and the size of a message for doing memory update is smaller in most cases.
We also evaluate the tradeoffs between the single-owner protocol and multiple-owner protocols. We have found that in most cases the single-owner protocol uses fewer messages than multiple-owner protocols, but there are some computations which may perform better with some multiple-owner protocols. In order to combine the advantages of both protocols, we propose a hybrid owner protocol which can be used to increase the efficiency in an adaptive way, with some pages managed by the single-owner protocol and some by a multiple-owner protocol.
Finally, five applications are evaluated using the single-owner protocol and a particular multiple-owner protocol called the lazy invalidate protocol. The performance of these two protocols is compared. We also demonstrate the use of user-definable high level synchronization primitives on one of the applications, and compare its performance against the same application constructed using only low-level synchronization primitives.
Title: Deformable Object Tabula Rasa: A Zoomable User Interface System
Candidate: Fox, David
Advisor(s): Perlin, Ken
Abstract:
This dissertation develops the concept of a zoomable user interface and identifies the design elements which are important to its viability as a successor to the desktop style of interface. The implementation of an example system named Tabula Rasa is described, along with the design and implementation of some sample applications for Tabula Rasa. We show how programming techniques such as delegation and multi-methods can be used to solve certain problems that arise in the implementation of Tabula Rasa, and in the implementation of Tabula Rasa applications.
Over the past thirty years the desktop or WIMP (Windows, Icons, Menus, Pointer) user interface has made the computer into a tool that allows non-specialists to get a variety of tasks done. In recent years, however, the applications available under this interface have become larger and more unwieldy, taking into themselves more and more marginally related functionality. Any inter-operability between applications must be explicitly designed in.
The Zoomable User Interface (ZUI) is a relatively new metaphor designed as a successor to the desktop interface. It is inspired by the Pad system, which is based on a zoomable surface of unlimited resolution. Just as the desktop interface has a set of essential elements, a ZUI has a set of elements each of which is vital to the whole. These include
These basic elements combine to produce an environment that takes advantage of the user's spatial memory to create a more expansive and dynamic working environment, as well as encouraging finer grained applications that automatically inter-operate with various types of data objects and applications.
Title: A New Solution to the Hidden Copy Problem
Author(s): Goyal, D.; Paige, R.
Abstract:
We consider the well-known problem of avoiding unnecessary costly copying that arises in languages with copy/value semantics and large aggregate structures such as arrays, sets, or files. The origins of many recent studies focusing on avoiding copies of flat arrays in functional languages may be traced back to SETL copy optimization [Schwartz 75]. The problem is hard, and progress is slow, but a successful solution is crucial to achieving a pointer-free style of programming envisioned by [Hoare 75].
We give a new solution to copy optimization that uses dynamic reference counts and lazy copying to implement updates efficiently in an imperative language with arbitrarily nested finite sets and maps (which can easily model arrays, records and other aggregate datatypes). Big step operational semantics and abstract interpretations are used to prove the soundness of the analysis and the correctness of the transformation. An efficient algorithm to implement the analysis is presented. The approach is supported by realistic empirical evidence.
Our solution anticipates the introduction of arbitrarily nested polymorphic sets and maps into JAVA. It may also provide a new efficient strategy for implementing object cloning in Java and object assigment in C++. We illustrate how our methods might improve the recent approach of [Wand and Clinger 98] to avoid copies of flat arrays in a language of first-order recursion equations.
Title: Competitive Equilibrium
Author(s): Greenwald, A.
Abstract:
This report includes a modern account of welfare economics and competitive equilibrium theory. In particular, competitive, or Walrasian, equilibrium is defined. Moreover, existence, optimality, and uniqueness are demonstrated. However, no reliable mechanism for computing equilibrium prices is suggested. At this stage, the problem shifts from the realm of economics to an algorithmic problem in computer science.
Title: Learning to Play Network Games
Author(s): Greenwald, A.
Abstract:
The idea of learning to play equilibrium strategies in repeated games is an active area of research in the game-theoretic community. Game theorists are primarily concerned with the equilibrium outcomes of learning algorithms in the limit: i.e., over an infinite amount of time. One of the goals of this research is to apply computer science ideology to learning theory. In particular, this thesis will consider imposing restrictions on traditional game-theoretic learning algorithms such that players learn to play approximations to equilibrium strategies in bounded amounts of time. The idea of such bounded learning algorithms is to quickly learn to exploit the obvious, while ignoring any subtleties.
The idea of bounded learning is applicable to network games, in which players learn to utilize networks during times of minimal congestion. These games are atypical as compared with traditional games described in the game-theoretic literature, since their underlying structure is not commonly understood by the players, and moreover, common knowledge of rationality is not a valid assumption. As such, this class of repeated games does not naturally lend itself to belief-based learning algorithms. Rather, this thesis will investigate learning algorithms for network games that are analyzed on the basis of performance, without requiring that players maintain prior beliefs about expected network congestion. In sum, the initial focus of this thesis is to explore an application of computer science ideology to learning algorithms in game theory; secondly, bounded game-theoretic learning will be applied to routing and congestion problems in network environments.
Title: Metacomputing and Resource Allocation on the World Wide Web
Candidate: Karaul, Mehmet
Advisor(s): Kedem, Zvi
Abstract:
The World Wide Web is a challenging environment for distributed computing due to its sheer size and the heterogeneity and unreliability of machines and networks. Therefore, scalability, load balancing, and fault masking play an important role for Web-based systems. In this dissertation, I present novel mechanisms for resource allocation and parallel computing on the Web addressing these issues.
Large Web sites rely on a set of geographically dispersed replicated servers among which client requests should be appropriately allocated. I present a scalable decentralized design, which pushes the allocation functionality onto the clients. At its core lies a pricing strategy that provides incentives to clients to control the dispatching of requests while still allowing clients to take advantage of geographic proximity. An adaptive algorithm updates prices to deal with dynamic changes. A prototype system based on this architecture has been implemented and its functionality validated through a series of experiments.
Parallel computing on local area networks is based on a variety of mechanisms targeting the properties of this environment. However, these mechanisms do not effectively extend to wide area networks due to issues such as heterogeneity, security, and administrative boundaries. I present a prototype system which allows application programmers to write parallel programs in Java and allows Java-capable browsers to execute parallel tasks. It comprises a virtual machine model which isolates the program from the execution environment, and a runtime system realizing this machine on the Web. Load balancing and fault masking are transparently provided by the runtime system.
Title: Free Parallel Data Mining
Candidate: Li, Bin
Advisor(s): Shasha, Dennis
Abstract:
Data mining is the emerging field of applying statistical and artificial intelligence techniques to the problem of finding novel, useful, and non-trivial patterns from large databases. This thesis presents a framework for easily and efficiently parallelizing data mining algorithms. We propose an acyclic directed graph structure, exploration dag ( E-dag ), to characterize the computation model of data mining algorithms in classification rule mining, association rule mining, and combinatorial pattern discovery. An E-dag can be constructively formed in parallel from specifications of a data mining problem, then a parallel E-dag traversal is performed on the fly to efficiently solve the problem. The effectiveness of the E-dag framework is demonstrated in biological pattern discovery applications.
We also explore data parallelism in data mining applications. The cross-validation and the windowing techniques used in classification tree algorithms facilitate easy development of efficient data partitioning programs. In this spirit, we present a new classification tree algorithm called NyuMiner that guarantees that every split in a classification tree is optimal with respect to any given impurity function and any given maximum number of branches allowed in a split. NyuMiner can be easily parallelized using the data partitioning technique.
This thesis also presents a software architecture for running parallel data mining programs on networks of workstations (NOW) in a fault-tolerant manner. The software architecture is based on Persistent Linda (PLinda), a robust distributed parallel computing system which automatically utilize idle cycles. Templates are provided for application programmers to develop parallel data mining programs in PLinda. Parallelization frameworks and the software architecture form a synergy that makes free efficient data mining realistic.
Title: Fast Algorithms for Discovering the Maximum Frequent Set
Candidate: Lin, Dao-I
Advisor(s): Kedem, Zvi
Abstract:
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus specifying immediately all frequent itemsets. We evaluate the performance of the algorithm using well-known synthetatic benchmark databases and real-life census and stock market databases. The improvement in performance can be up to several orders of magnitude, compared to the best current algorithms.
Title: Algorithmic Techniques in Computational Genomics
Candidate: Parida, Laxmi
Advisor(s): Mishra, Bud
Abstract:
This thesis explores the application of algorithmic techniques in understanding and solving computational problems arising in Genomics (called Computational Genomics ). In the first part of the thesis we focus on the problem of reconstructing physical maps from data, related to "reading" the genome of an organism, and in the second part we focus on problems related to "interpreting" (in a very limited sense) the genome. The main contributions of the thesis are understanding the computational complexity of, and designing algorithms for some key problems in both these domains.
The primary goal of the Human Genome Project is to determine the entire three billion base pair sequence of the human genome and locate roughly 100,000 genes on the DNA. Recently, a set of single molecule methods (such as optical mapping) have been developed that allow one to create physical maps (a set of landmarks on the DNA whose locations are well defined), but can only do so by combining a population of data in the presence of errors from various sources. In the first part of the thesis, we focus on the problem of computing physical maps from data that arise in single molecule methods. We describe two combinatorial models of the problem termed Exclusive Binary Flip Cut (EBFC) and Weighted Consistency Graph (WCG) problems. We show that both the problems are MAX SNP hard and give bounds on the approximation factors achievable. We give polynomial time 0.878-approximation algorithm for the EBFC problem and 0.817-approximation algorithm for the WCG problem, using the maxcut approximation algorithm due to Goemans and Williamson. We also give a low polynomial time practical algorithm that works well on simulated and real data. Naksha is an implementation of this algorithm and a demonstration is available at http://www.cs.nyu.edu/parida/naksha.html . We also have similar results on complexity for generalizations of the problem which model various other sources of errors. We have generalized our complexity and algorithmic results to the case where there is more than one population in the data (which we call the K -populations problem). In the second part of the thesis, we focus on "interpreting" the genome. We consider the problem of discovering patterns (or motifs) in strings on a finite alphabet: we show that by appropriately defining irredundant motifs, the number of irredundant motifs is only quadratic in the input size. We use these irredundant motifs in designing algorithms to align multiple genome or protein sequences. Alignment of sequences aids in comparing similarities, in structure and function of the proteins.
Title: Thinksheet: a Tool for Information Navigation
Candidate: Piatko, Peter
Advisor(s): Shasha, Dennis
Abstract:
Imagine that you are a ``knowledge worker'' in the coming millenium. You must synthesize information and make decisions such as ``Which benefits plan to use?'' ``What do the regulations say about this course of action?'' ``How does my job fit into the corporate business plan?'' or even ``How does this program work?'' If the dream of digital libraries is to bring you all material relevant to your task, you may find yourself drowning before long. Reading is harder than talking to people who know the relevant documents and can tell you what you're interested in. That is what many current knowledge workers do, giving rise to professions such as insurance consultant, lawyer, benefits specialist, and so on.
Imagine by contrast that the documents you retrieve could be tailored precisely to your needs. That is, imagine that the document might ask you questions and produce a document filtered and organized according to those you have answered.
We have been developing software that allows writers to tailor documents to the specific needs of large groups of readers. Thinksheet combines the technologies of expert systems, spreadsheets, and database query processing to provide tailoring capabilities for complex documents. The authoring model is only slighly more complex than a spreadsheet.
This thesis discusses the conceptual model and the implementation of Thinksheet, and applications for complex documents and metadata.
Title: Steering Clear of Triples: Deriving the Control Flow Graph Directly from the Abstract Syntax Tree in C Programs
Author(s): Schwartz, N.
Abstract:
This article explores the extension of Morgenthaler's Virtual Control Flow technique, which derives control flow semantics directly from the Abstract Syntax Tree, from the relatively coarse granularity of syntactic C expressions to the finer granularity of basic block expressions, that is, expressions without embedded control flow. We explain why this is a better level of abstraction for program analysis, and discuss the elements of an efficient and elegant solution, motivating the presentation by appealing to a more explicit intermediate form. We present our algorithm, and conclude with remarks about the suitability of Morgenthaler's version of Virtual Control Flow for customary exhaustive data-flow analysis.
Title: Corpus-based Parsing and Sublanguage Studies
Candidate: Sekine, Satoshi
Advisor(s): Grishman, Ralph
Abstract:
There are two main topics in this thesis, a corpus-based parser and a study of sublanguage.
A novel approach to corpus-based parsing is proposed. In this framework, a probabilistic grammar is constructed whose rules are partial trees from a syntactically-bracketed corpus. The distinctive feature is that the partial trees are multi-layered. In other words, only a small number of non-terminals are used to cut the initial trees; other grammatical nodes are embedded into the partial trees, and hence into the grammar rules. Good parsing performance was obtained, even with small training corpora. Several techniques were developed to improve the parser's accuracy, including in particular two methods for incorporating lexical information. One method uses probabilities of binary lexical dependencies; the other directly lexicalizes the grammar rules. Because the grammar rules are long, the number of rules is huge - more than thirty thousand from a corpus of one million words. A parsing algorithm which can efficiently handle such a large grammar is described. A Japanese parser based on the same idea was also developed.
Corpus-based sublanguage studies were conducted to relate the notion of sublanguage to lexical and syntactic properties of a text. A statistical method based on word frequencies was developed to define sublanguages within a collection of documents; this method was evaluated by identifying the sublanguage of new documents. Relative frequencies of different syntactic structures were used to assess the domain dependency of syntactic structure in a multi-domain corpus. Cross-entropy measurements showed a clear distinction between fiction and non-fiction domains. Experiments were then performed in which grammars trained on individual domains, or sets of domains, were used to parse texts in the same or other domains. The results correlate with the measurements of syntactic variation across domains; in particular, the best performance is achieved using grammars trained on the same or similar domains.
The parsing and sublanguage techniques were applied to speech recognition. Sublanguage techniques were able to increase recognition accuracy, and some promising cases were found where the parser was able to correct recognition errors.
Title: On the L(2) Stability of the 1-D Mortar Projection
Author(s): Stefanica, D.
Abstract:
It is previously known that the one dimensional mortar finite element projection is stable in the $L^2$ norm, provided that the ratio of any two neighboring mesh intervals is uniformly bounded, but with the constant in the bound depending on the maximum value of that ratio. In this paper, we show that this projection is stable in the $L^2$ norm, independently of the properties of the nonmortar mesh. The 1D trace of the mortar space considered here is a piecewise polynomial space of arbitrary degree; therefore, our result can be used for both the $h$ and the $hp$ version of the mortar finite element.
Title: Poincare and Friedrichs Inequalities For Mortar Finite Element Methods
Author(s): Stefanica, D.
Abstract:
Mortar finite elements are nonconforming finite elements that allow for a geometrically nonconforming decomposition of the computational domain and, at the same time, for the optimal coupling of different variational approximations in different subregions. Poincare and Friedrichs inequalities for mortar finite elements are derived. Using these inequalities, it is shown that the condition number for self-adjoint elliptic problems discretized using mortars is comparable to that of the conforming finite element case. Geometrically non-conforming mortars of the second generation are considered, i.e. no continuity conditions are imposed at the vertices of the subregions.
Title: A Numerical Study of a Class of FETI Preconditioners for Mortar Finite Elements in Two Dimensions
Author(s): Stefanica, D.; Klawonn, A.
Abstract:
The FETI method is an iterative substructuring method using Lagrange multipliers. It is actively used in industrial--size parallel codes for solving difficult computational mechanics problems, for example the system ANSYS. Mortar finite elements are nonconforming finite elements that also allow for a geometrically nonconforming decomposition of the computational domain and for the optimal coupling of different variational approximations in different subdomains. We present a numerical study of three different FETI preconditioners for two dimensional, self-adjoint, elliptic equations discretized by mortar finite elements.
Title: An Iterative Substructuring Method for Maxwell's Equations in Two Dimensions
Author(s): Toselli, A.; Widlund, O. B.; Wohlmuth, B. I.
Abstract:
Iterative substructuring methods, also known as Schur complement methods, form an important family of domain decomposition algorithms. They are preconditioned conjugate gradient methods where solvers on local subregions and a solver on a coarse mesh are used to construct the preconditioner. For conforming finite element approximations of $H^1$, it is known that the number of conjugate gradient steps required to reduce the residual norm by a fixed factor is independent of the number of substructures and that it grows only as the logarithm of the dimension of the local problem associated with an individual substructure. In this paper, the same result is established for similar iterative methods for low--order N{\'e}d{\'e}lec finite elements, which approximate $\Hcurl$ in two dimensions. Results of numerical experiments are also provided.
Title: Some Results on Overlapping Schwarz Methods for the Helmholtz Equation Employing Perfectly Matched Layers
Author(s): Toselli, A.
Abstract:
In this paper, we build a class of overlapping Schwarz preconditioners for a finite element approximation of the Helmholtz equation in two dimensions. Perfectly Matched Layers are employed to build the local problems and two kinds of boundary conditions are employed to match the local solutions. Numerical results are presented to compare the different preconditioners.
Title: Abstract Models of Distributed Memory Management
Candidate: Ungureanu, Cristian
Advisor(s): Goldberg, Benjamin
Abstract:
In this dissertation, we are presenting a model suitable for reasoning about memory management in concurrent and distributed systems. The model provides a suitable level of abstraction: it is low-level enough so that we can express communication, allocation and garbage collection, but otherwise hides many of the lower-level details of an actual implementation. Using it, we can give compact, and provably correct, characterizations of garbage collection algorithms in distributed systems.
The models are rewriting systems whose terms are programs in which the ``code'' and the ``store'' are syntactically apparent. Evaluation is expressed as conditional rewriting and includes store and communication operations. Using techniques developed for communicating and concurrent systems we give a semantics suitable for proving equivalence of such programs. Garbage collection becomes a rewriting relation that removes part of the store without affecting the behavior of the program.
We introduce and prove correct a very general garbage collection rule based on reachability; any actual implementation which is capable of providing the transitions (including their atomicity constraints) specified by the strategy is therefore correct. We give examples of such specific implementations, and show how their correctness follows from the correctness of the general relation.
Title: An Iterative Substructuring Method for Raviart-Thomas Vector Fields in Three Dimensions
Author(s): Wohlmuth, B. I.; Toselli, A.; Widlund, O. B.
Abstract:
The iterative substructuring methods, also known as Schur complement methods, form one of two important families of domain decomposition algorithms. They are based on a partitioning of a given region, on which the partial differential equation is defined, into non-overlapping substructures. The preconditioners of these conjugate gradient methods are then defined in terms of local problems defined on individual substructures and pairs of substructures, and, in addition, a global problem of low dimension. An iterative method of this kind is introduced for the lowest order Raviart-Thomas finite elements in three dimensions and it is shown that the condition number of the relevant operator is independent of the number of substructures and grows only as the square of the logarithm of the number of unknowns associated with an individual substructure. The theoretical bounds are confirmed by a series of numerical experiments.
Title: Finding Idle Work Periods on Networks of Workstations
Author(s): Wyckoff, P.; Jeong, K.; Johnson, T.
Abstract:
We present a simple technique for predicting the probability that an idle workstation will continue to be idle for $i$ minutes, given that it has been idle for $x$ minutes (i.e., find the {\em remaining idle period probability} $P(i;x)$). By idle we mean that the workstation owner is not interactively using the workstation or executing other tasks on it. The results are particularly applicable to the scheduling of tasks in systems that harvest cycles from idle-only workstations. Our Remaining Idle Period Probability Predictor (RIPPP) uses the distribution of the lengths of idle periods on the managed workstations. Collecting, storing, and processing these distributions (in the form of histograms) is a small overhead on modern workstations (a few kilobytes of storage per workstation).
We investigated the behavior of our RIPPP with usage traces of 31 workstations collected over a five month period, and discovered the following six results. (1) The distribution of one month of idle periods predicts the remaining idle period probability in the next month for most workstations. (2) Different workstations tend to have significantly different idle period length distributions. (3) The average length of an idle period does not necessarily correlate well with the probability of being able to find long idle periods, contrary to intuition and previous scheduling heuristics. (4) A workstation that has been idle a long time does not necessarily have a high probability of remaining idle for a long time. (5) Using the time of day can improve predictions. (6) The length of the previous and the current idle periods are positively correlated, but the length of the previous idle period is not strongly correlated with finding long remaining idle periods.
Based on these studies, we conclude that an effective way to find idle workstations is to collect their idle period length distribution and use it to compute $P(i;x)$. We believe our analysis will be applicable to predicting the length of busy periods, which is useful for deciding whether to migrate or suspend tasks when a workstation becomes busy (the owner reclaims it).
From our results, we have developed a remaining idle period probability toolkit which includes a statistics collector and a prediction library in C. This will be available from our project homepage.
Title: Fault-tolerant parallel computing on networks of non-dedicated workstations
Candidate: Wyckoff, Peter
Abstract:
This thesis addresses fault tolerance issues in parallel computing on loosely-coupled networks of non-dedicated, heterogeneous workstations. The efficiency of fault tolerance mechanisms is dictated by network and failure characteristics. Traditional approaches to fault tolerance are efficient when network and failure characteristics are identical across workstations, such as in a local area network of homogeneous workstations; however, a loosely coupled network of non-dedicated workstations has non-uniform network and failure characteristics. This thesis presents the design and implementation of a flexible fault tolerance runtime system that allows each process in a parallel application to use one of three rollback recovery mechanisms. Rollback recovery is achieved using a lightweight form of transaction, which performance results show incurs almost no overhead. The system is built on top of the Linda coordination language and runs on Alpha, Linux, Solaris and SGI workstations and Java-enabled browsers. For barrier synchronous parallel applications, a new equi-distant checkpointing interval selection method, the expected maximum heuristic, is presented. The method is applicable to any rollback recovery system in which processes recover from failure independently and communicate through a reliable third party. Simulation results show that the expected maximum heuristic has near optimal performance under a variety of different failure rates and barrier lengths.