Graduate Special Topics in Computer Science

NOTE: for descriptions of standard graduate computer science courses, see Graduate Course Descriptions.

#### G22.3033-001 Computational Systems Biology

Presently, there is no clear way to determine if the current body of biological facts is sufficient to explain phenomenology. In the biological community, it is not uncommon to assume certain biological problems to have achieved a cognitive finality without rigorous justification. In these particular cases, rigorous mathematical models with automated tools for reasoning, simulation, and computation can be of enormous help to uncover cognitive flaws, qualitative simplification or overly generalized assumptions. Some ideal candidates for such study would include: prion hypothesis, cell cycle machinery (DNA replication and repair, chromosome segregation, cell-cycle period control, spindle pole duplication, etc.), muscle contractility, processes involved in cancer (cell cycle regulation, angiogenesis, DNA repair, apoptosis, cellular senescence, tissue space modeling enzymes, etc.), signal transduction pathways, circadian rhythms (especially the effect of small molecular concentration on its robustness), and many others. We believe that the difficulty of biological modeling will become acute as biologists prepare to understand even more complex systems.

The goal of this course is to understand, design and create a large-scale computational system centered on the biology of individual cells, population of cells, intra-cellular processes, and realistic simulation and visualization of these processes at multiple spatio-temporal scales. Such a reasoning system, in the hands of a working biologist, can then be used to gain insight into the underlying biology, design refutable biological experiments, and ultimately, discover intervention schemes to suitably modify the biological processes for therapeutic purposes.

(1) Evolutionary Biology (2) Computability in Biology (3) Reconstructibility in Biology (4) Biology of Cancer (5) Biology of Aging

#### G22.3033-002 Applied Math for Algorithms

Although many mathematics courses have content that is useful in the design and analysis of algorithms, the more applicable material often comprises no more than 10% of the curriculum.

This course will endeavor to present that critical 10% as culled from a handful of math courses and other sources. We will also endeavor to show how this material is applied in the design or analysis of algorithms.

Some of the subjects that will be covered are:

Performance-related equations and their direct solution.

Recurrence equations: first order difference equations and beyond. Eigenfunction methods, multiplicity of eigenvalues, etc.

ODEs and their use in performance analyses. A limited presentation of limit theorems.

PDEs and their use in performance analyses. Limited implications of limit theorems.

Probability, and what properly educated theoreticians might wish to know. Conditional expectations are a powerful highly expressive tool with a foundation that is based on measure theory yet the concept allows us to present proofs and to acquire understandings about probabilistic processes that would otherwise be difficult to formalize. It is also worth understanding what the central limit theorem says, what it means, what it implies, and what it does not imply. Likewise, it is useful to understand the more basic probability distributions and the physical processes that they model.

Statistics is a little different from probability. One of the areas of statistical analysis that has had significant impact in TCS is the theory of Hoeffding-Chernoff bounds. It is probably a good idea to know how this material applies to more than Bernoulli Trials, and to stopping times in particular. We might also cover some issues in the computation of random variables, and at least comment on the importance of extreme points in the space of probability distributions (which is to say that we will survey majorization results).

Complex variables give valuable insights about the solution to many algorithms-based difference equations, and provide the theory necessary for computing saddle point estimates and understanding some of the limit results for PDEs.

Combinatorics is less systematic than analysis, but gives a rich set of techniques that are as varied as tableau methods (used to count seemingly complicated outcomes) and the probabilistic method (which is arguably more combinatorial that probabilistic).

Additional topics will be included as time/opportunities permit.

#### G22.3033-003 Data Mining

We live in the Age of Information. The importance of collecting data that reflects a business or scientific activity to achieve competitive advantage is widely recognized now. Advanced systems for collecting data and managing it in large databases are in place in most large and mid-range companies. However, the bottleneck of turning this data into your success is the difficulty of extracting knowledge about the system from the collected data.

What goods should be promoted to this customer? What is the probability that a certain customer will respond to a planned promotion? Can one predict the most profitable securities to buy/sell during the next trading session? Will this customer default on a loan or pay back on schedule? What medical diagnosis should be assigned to this patient? How large are the peak loads of a telephone or energy network going to be? Why does the manufacturing facility suddenly start to produce defective goods?

These are all the questions that can be answered if information hidden in a database can be found explicitly and utilized. Modeling the investigated system and discovering relations that connect variables are the subject of data mining.

The course will introduce concepts and techniques of data mining and data warehousing, including concept, principle, architecture, design, implementation, application of data warehousing and data mining. TOPICS: Data warehousing and OLAP technology for data mining Data preprocessing Descriptive data mining: characterization and comparison Association analysis Classification and prediction Cluster analysis Mining complex types of data Applications and trends in data mining

#### G22.3033-004 Web Development with Ruby on Rails

This course begins with an in-depth examination of the Ruby language and moves on to web development within the Ruby on Rails framework. An emphasis is placed on understanding the particular features of the Ruby language, how the language compares to others like Java and Python, and how it facilitates the creation of frameworks such as Ruby on Rails. This course is recommended for students with a strong interest in programming languages, web development frameworks, and software engineering. No experience with Ruby or Ruby on Rails is assumed.

#### G22.3033-005 Game Theory, Learning & Planning

There have been many instances in the recent literature in which algorithmic thinking is combined with game-theoretic, or, more specifically, socio-economic concepts to address problems arising in diverse contexts. The purpose of this course is to seize this moment and promote this interaction to create a field of Social Operating Systems (SOS).'' The emphasis will be on mathematically sophisticated techniques in the interface between algorithms, learning, optimization and game theory, as well as their applications to planning. Our primary motivation comes from our work on PLAN C (Planning with Large Agent network against Catastrophes; http://www.bioinformatics.nyu.edu/Projects/planc/index.shtml), which provides emergency management and response plans in face of man-made or natural disasters.

Topics will include some of the following (the list will crystallize gradually after initial discussions in the first class): 1: Game Theory, Nash equilibrium and General Equilibrium 2: Evolutionary game theory and Repeated Games 3: Bounded rationality: Example of Santa Fe Bar Problems 4: Social choice theory: Condorcet, Sen & Arrow 5: Elections, Technorati-Tags and Internet 6: Mechanism design 7: Multiple Objective Functions and Pareto Optimality 8: Multi-Objective Optimization Problems (MOOP) 9: Social Software: Planning (Catastrophe Response) 10: Modeling Social Networks & Bounded Rationality 11: Using Social Networks to Survey, Sample and Search 12: Collaborative Filtering: Recommender Design 13: Collaborative Filtering: Search Engines 14: Collaborative Filtering: Population Sampling and Surveys

#### G22.3033-006 Computational Photography

Course description: Computational Photography is an exciting new area at the intersection of Computer Graphics and Computer Vision. Through the use of computation, its goal is to move beyond the limitations of conventional photography to produce enhanced and novel imagery of the world around us. The main focus of the course will be on software-based methods for producing visually compelling pictures. However, it will also cover novel camera designs, for which computation is integral to their operation. The course will explain the principles behind many of the advanced tools that can be found in Adobe Photoshop, although the use of this package itself is outside the scope of the course. The course will be suitable for advanced undergraduates, masters and PhD students. A reasonable knowledge of linear algebra is required and familiarity with Matlab is desirable. Assessment will be through coursework and a course project.

#### G22.3033-007 Probabilistically Checkable Proofs and Hardness of Approximation

Many optimization problems of theoretical and practical interest are NP-complete, meaning it is impossible to compute exact solutions to these problems in polynomial time unless P = NP. A natural way to cope with this curse of NP-completeness is to seek approximate solutions instead of exact solutions. An algorithm with approximation ratio C computes, for every problem instance, a solution whose cost is within a factor C of the optimum. Optimization problems exhibit a wide range of behavior in their approximability. It is well-known that Bin-Packing has an approximation algorithm with ratio 1+\epsilon for every \epsilon > 0. In theory jargon, we say that Bin-Packing has a polynomial-time approximation scheme (PTAS). However, it wasn't known till the early 90s whether problems like MAX-3SAT, Vertex Cover, and MAX-CUT have a PTAS. A celebrated result called the PCP Theorem finally showed that these problems have no PTAS unless P = NP. Such results that rule out the possibility of good approximation algorithms (under complexity theoretic assumptions like P != NP) are called inapproximability results or hardness of approximation results.

The PCP Theorem has an equivalent formulation from the point of view of proof checking. The PCP theorem states that every NP-statement has a probabilistically checkable proof, i.e. a proof which can be "spot-checked" by reading only a constant number of bits from the proof. These bits are selected by a randomized process using a very limited amount of randomness. The checking process always accepts a correct proof of a correct statement and rejects any cheating proof of an incorrect statement with high probability. The term "holographic proof" is sometimes used to highlight this feature that a cheating proof must be wrong everywhere and therefore, can be detected by a spot-check. The discovery of the connection between proof checking and inapproximability results is one of the most exciting theoretical developments in the last decade. Since then, PCPs have led to several breakthrough results in inapproximability theory, e.g. tight hardness results for Clique, MAX-3SAT, and Set Cover.

This course will cover many of the inapproximability results and PCPs used to prove them. No prior knowledge will be assumed, except the basic theory of NP-completeness. Participants are expected to scribe notes for one lecture and/or give one presentation. No assignments/exams!