Graduate Special Topics in Computer Science

NOTE: for descriptions of standard graduate computer science courses, see Graduate Course Descriptions.

G22.3033-002 Special Topics Computational Biology: Cell Informatics

Presently, there is no clear way to determine whether the current body of biological facts is sufficient to explain phenomenology. In the biological community, it is not uncommon to assume that certain biological problems have achieved a cognitive finality without rigorous justification. In these cases, rigorous mathematical models with automated tools for reasoning, simulation, and computation can be of enormous help in uncovering cognitive flaws, qualitative simplifications, or overly generalized assumptions. Ideal candidates for such study include the prion hypothesis, cell cycle machinery (DNA replication and repair, chromosome segregation, cell-cycle period control, spindle pole duplication, etc.), muscle contractility, processes involved in cancer (cell cycle regulation, angiogenesis, DNA repair, apoptosis, cellular senescence, tissue space modeling enzymes, etc.), signal transduction pathways, circadian rhythms (especially the effect of small molecular concentration on their robustness), and many others. We believe that the difficulty of biological modeling will become acute as biologists prepare to understand even more complex systems.

Fortunately, other disciplines have faced similar issues in the past: for instance, designing complex microprocessors involving many millions of transistors, building and controlling configurable robots involving very high degree-of-freedom actuators, implementing hybrid controllers for highway traffic or air traffic, and even reasoning about data traffic on a computer network. The approaches developed by control theorists analyzing the stability of a system with feedback, physicists studying asymptotic properties of dynamical systems, and computer scientists reasoning about discrete or hybrid (combining discrete events with continuous events) reactive systems have all tried to address some aspects of the same problem in a very concrete manner. We believe that biological processes could be studied in a similar manner, once the appropriate tools are made available.

The goal of this course is to understand, design, and create a large-scale computational system centered on the biology of individual cells, populations of cells, intra-cellular processes, and realistic simulation and visualization of these processes at multiple spatio-temporal scales. Such a reasoning system, in the hands of a working biologist, can then be used to gain insight into the underlying biology, design refutable biological experiments, and ultimately discover intervention schemes to suitably modify the biological processes for therapeutic purposes. The course will focus primarily on two biological processes: genome evolution and cell-to-cell communication.

G22.3033.03 Internet & Intranet Protocols & Applications

Internet and Intranet Protocols and Applications studies the world's most widely used application level network protocols and software systems.

We study protocols such as HTTP for the Web, SMTP and POP3 for email, FTP for file transfer, and SSL for security. We consider protocol design issues, especially as they influence functionality, reliability, and performance. We carefully read protocol specifications, such as the HTTP specification, RFC 2068. We study the systems that use these protocols, namely clients and servers. We also study intermediate systems that enhance performance, such as caching proxies and content delivery services. We will examine complex functionality and performance issues, such as time-out management and high-performance concurrent servers.
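
As a small illustration of what reading a protocol specification translates into, the sketch below formats a minimal HTTP/1.1 GET request and parses a response status line. It is not course material: the function names build_get_request and parse_status_line and the host example.com are hypothetical, and a real client would also have to handle headers, bodies, and time-outs.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request; every line ends with CRLF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # HTTP/1.1 requests must carry a Host header
        "Connection: close",    # ask the server to close after one response
        "",                     # empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, *reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason[0] if reason else ""
```

The bytes returned by build_get_request could be written to a connected TCP socket, and the first line read back could be passed to parse_status_line.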

Programming assignments ask students to write clients and servers to the sockets interface. Students are expected to have taken Data Communications and Networks or equivalent. Students will write several small programming assignments and one large project. The large programming project will ask students to design and implement a load balancing manager as used by content serving companies such as Akamai and Sandpiper.
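
To make the idea of a load balancing manager concrete, here is a minimal sketch of one common policy, least connections. It is an assumption for illustration only: the description above does not say which policy the project requires, and the class name LeastConnectionsBalancer and the server names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Route each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Map server name -> number of connections currently in flight.
        self.active = {name: 0 for name in servers}

    def acquire(self):
        """Pick a server for a new connection and count it as active."""
        # min() scans keys in insertion order, so ties go to the earliest server.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Record that a connection to `name` has finished."""
        if self.active[name] > 0:
            self.active[name] -= 1
```

A production manager would also need health checks and thread safety, which this sketch leaves out.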

Guest lecturers will present current research and practice on some of the following issues: the design and operation of an Internet EDI Service, the design and operation of a high volume Web-based branding system, performance issues in WWW servers, and Internet security.

The last quarter of the course examines research that enhances Internet and Web performance.

G22.3033-005 The Design and Programming of Embedded Systems

Prerequisites: Programming Languages (G22.2110), Compilers (G22.2130)

The vast majority of computers today are not general-purpose desktop or laptop machines; they are embedded as components of other electronic devices such as cell phones, microwave ovens, and automobiles. Often, the primary concern when designing and programming these embedded systems is not speed of execution, but rather power consumption, memory requirements, and reliability. In this course, we will discuss the issues faced by embedded system designers, both at the hardware and software levels. In addition, there will be programming assignments for microprocessors commonly used in embedded systems.

G22.3033-006 Advanced Object-Oriented Techniques

Prerequisites: G22.2110 and basic familiarity with Java (or another object-oriented language).

The goal of this course is to familiarize students with several advanced object-oriented techniques that are currently widely used in industry. After a brief review of object-oriented terminology (subtyping, dynamic dispatch, inheritance, delegation, etc.), the following topics will be presented in detail:

  • UML diagrams, and how to use them for designing object-oriented programs
  • design patterns: an overview of design idioms that are useful for creating flexible and extensible designs
  • techniques for testing of object-oriented programs
  • performance analysis of object-oriented programs
  • refactoring: techniques for restructuring programs in order to accommodate changed requirements
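
As one example of the design idioms listed above, the following is a minimal sketch of the Observer pattern, a behavioral pattern from the Design Patterns text. The course itself uses Java; the sketch is in Python for brevity, and the class names Subject and RecordingObserver and the event string are hypothetical.

```python
class Subject:
    """Holds a list of observers and notifies each of them about events."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        # The subject depends only on the update() method, not on concrete classes.
        for observer in self._observers:
            observer.update(event)


class RecordingObserver:
    """A trivial observer that remembers every event it receives."""

    def __init__(self):
        self.seen = []

    def update(self, event):
        self.seen.append(event)
```

New kinds of observers can be added without changing Subject, which is the flexibility the pattern is meant to provide.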

The objective of the course is to make students sufficiently proficient with the use of these techniques so that they can apply them in practice. To achieve this goal, the course has a substantial practical component, in the form of a series of programming assignments that are performed in groups.

The following is a very preliminary outline of the course. Please be aware that the following may be subject to change.

Overview of course and project:

  • Review of object-oriented terminology, concepts, and of object-oriented language constructs in Java.
  • Introduction to UML. Overview of the 9 types of diagrams. Use cases and use case diagrams; classes, attributes, and operations; class diagrams; relationships: associations, generalization, aggregation, and composition.
  • Relationships (in detail), association, generalization, multiplicities, navigability, notes, stereotypes, constraints, interfaces, realization, roles, package diagrams, and object diagrams.
  • Interaction diagrams, sequence diagrams, collaboration diagrams, modeling events, signals, and exceptions, activity diagrams, statechart diagrams, component diagrams, and deployment diagrams.
  • Introduction to design patterns. Creational patterns.
  • Design patterns continued: Structural patterns.
  • Design patterns continued: Behavioral patterns.
  • Designing with patterns (guest lecture by John Vlissides).

Midterm (may be split up into 2 separate 1-hour tests):

Testing of object-oriented applications.

Advanced topics: multiple inheritance, mixins, reflection.

Advanced topics, to be determined.

Performance analysis of object-oriented applications (guest lecture by Gary Sevitsky).

Analysis of object-oriented programs. Application extraction techniques.

Required texts

The Unified Modeling Language User Guide by Grady Booch, James Rumbaugh, Ivar Jacobson. Hardcover - 482 pages (October 30, 1998). Addison-Wesley Pub Co; ISBN: 0201571684

Design Patterns by Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Hardcover - 395 pages 1st edition (January 15, 1995). Addison-Wesley Pub Co; ISBN: 0201633612

Refactoring : Improving the Design of Existing Code by Martin Fowler et al. Hardcover - 431 pages 1st edition (August 1999) Addison-Wesley Pub Co; ISBN: 0201485672

Recommended texts

A good textbook on Java is recommended. Two examples of good textbooks are given below:

Java in a Nutshell : A Desktop Quick Reference (3rd Edition) by David Flanagan. Paperback - 648 pages 3rd edition (November 1999). O'Reilly Associates; ISBN: 1565924878

The Java Programming Language by Ken Arnold, James Gosling, David Holmes. Paperback - 704 pages 3rd edition (June 15, 2000) Addison-Wesley Pub Co; ISBN: 0201704331


The course will have a large practical component, in the form of a project in which a simulation of a web-based book-selling system (or something similar) is built. The project consists of several steps:

Creation of an initial design using UML. This initial design will require the use of several design patterns.

Implementation of the design in Java, and testing it (possibly using an automated testing framework such as JUnit).

Refactoring of the system after the requirements have changed (e.g., addition/deletion of features, and requirements to make the design more flexible in several respects).
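
To suggest what automated testing in the implementation step might look like, here is a minimal sketch using Python's unittest module, which belongs to the same xUnit family as the Java framework mentioned above. The Cart class, its methods, and the prices are hypothetical and are not part of the actual project specification.

```python
import unittest


class Cart:
    """Hypothetical shopping cart for a book-selling simulation; prices in cents."""

    def __init__(self):
        self._items = {}  # title -> (unit price in cents, quantity)

    def add(self, title, unit_price_cents, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        _, current = self._items.get(title, (unit_price_cents, 0))
        self._items[title] = (unit_price_cents, current + quantity)

    def total_cents(self):
        return sum(price * qty for price, qty in self._items.values())


class CartTest(unittest.TestCase):
    def test_total_sums_price_times_quantity(self):
        cart = Cart()
        cart.add("Design Patterns", 5000, 2)
        cart.add("Refactoring", 4500)
        self.assertEqual(cart.total_cents(), 14500)

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            Cart().add("Refactoring", 4500, 0)
```

Saved in a file, the tests can be run with python -m unittest. Tests like these are what make the later refactoring step safe, since they can be rerun after each change.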

Students will work on projects in groups of 2 or 3 people.


Details to be announced.

G22.3033-009 Empirical Natural Language Processing

Prerequisite: G22.2245-001 (Unix Tools) or equivalent experience

An introductory course in the analysis, design, and implementation of NLP systems, focusing on data-driven techniques. The course will start with a hands-on introduction to working with large text corpora. We will then cover rudimentary machine learning, including basic information theory, and parameter estimation. The rest of the course will explore strategies for building NLP applications, such as:

  • automatic text classification by topic and/or by genre
  • spam filtering
  • gazetteer construction via automatic word categorization
  • context-sensitive spelling correction for OCR
  • language modeling for information retrieval
  • induction of monolingual and/or bilingual dictionaries
  • discourse segmentation and automatic mark-up (text to sgml)
  • automatic hyperlinking
  • bitext detection and language ID on the Web
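
As an illustration of the data-driven style these applications share, the sketch below implements a tiny naive Bayes text classifier with add-one smoothing, of the kind that could serve as a first spam filter. It is a sketch under stated assumptions, not the course's method: the class name NaiveBayes is hypothetical, tokenization is a plain whitespace split, and any training messages would be supplied by the user.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods for each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow on long messages, and the smoothing keeps an unseen word from driving a class probability to zero.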

Students will undertake a course project to build a non-trivial NLP application.

G22.3033-010 Information Visualization

Prerequisites (required): Substantial background in any one of: Cognitive or perceptual science, computational geometry, computer graphics, graphic or media design, scientific visualization.

Recommended: Substantial background in two or more of the above topics. "Substantial background" is taken to mean a graduate-level course in the subject or substantial undergraduate work (i.e., degree major).

This course will introduce the cross-disciplinary field of information visualization: the process of creating pictures from data as an aid for human comprehension and decision-making. Bar graphs are a simple example of a visualization; this course will move well beyond that into scientific data, multivariate and time-varying information, and complex, abstract data structures. Information visualizations are often not simple, two-dimensional static pictures, so the course will deal with the role of animation and direct manipulation, methods of handling extremely large data sets of arbitrary dimension, and tools for filtering data to provide useful subsets. As the goal of a successful information visualization is to aid human thought, all of these approaches will be presented in the context of an understanding of human perceptual and cognitive processes.

Course work will include: readings from current scientific literature (journal papers, conference proceedings); written analyses; final project of either an implementation of an existing technique in a practical setting or development of an effective new technique.

G22.3033-012 Molecular Modeling

Prerequisites: basic knowledge of calculus and programming required; some biology/chemistry recommended.

Content: Introduction to biomolecular modeling and simulation, including:

  • Protein and Nucleic Acid Structure and Dynamics - minitutorials;
  • Modeling Approaches - quantum and molecular mechanics, molecular dynamics, Monte Carlo;
  • Force Fields - functional construction, variability, evaluation tricks of the trade;
  • Molecular Visualization & Simulation - introduction to the INSIGHT package;
  • Selected Topics - protein folding, RNA folding, DNA dynamics, structural and functional genomics.
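
To give a flavor of the molecular dynamics approach listed above, here is a minimal sketch of the velocity Verlet integrator, a standard time-stepping scheme in that field. The one-dimensional setting is an assumption made for illustration, as is the function name velocity_verlet; a real simulation would evaluate a molecular force field over many atoms.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v by `steps` time steps of size dt.

    `force` is a function of position; the scheme is second-order accurate
    and conserves energy well over long runs.
    """
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity uses the averaged acceleration
        a = a_new
    return x, v
```

For example, calling velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000) integrates a unit-mass particle on a unit-stiffness spring, a simple stand-in for a bonded interaction, and the total energy 0.5*v*v + 0.5*x*x should stay close to its initial value of 0.5.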

Intended Audience: Advanced undergraduates and graduate students from all Washington Square science and math departments (chemistry, biology, physics, mathematics, computer science, and neuroscience), as well as graduate students from the Sackler Institute of Graduate Biomedical Sciences.

Textbook: "Molecular Modeling: An Interdisciplinary Guide" by Tamar Schlick (Springer-Verlag, to appear in 2002). Text will be supplemented by articles and additional reference books.

Format: Class lectures (instructor and guests), student presentations, videos, and computer labs (homework)

G22.3033-013 Data Mining

We live in the Age of Information. The importance of collecting data that reflects a business or scientific activity to achieve competitive advantage is now widely recognized. Advanced systems for collecting data and managing it in large databases are in place in most large and mid-range companies. However, the bottleneck in turning this data into success is the difficulty of extracting knowledge about the underlying system from the collected data.

What goods should be promoted to this customer?
What is the probability that a certain customer will respond to a planned promotion?
Can one predict the most profitable securities to buy/sell during the next trading session?
Will this customer default on a loan or pay back on schedule?
What medical diagnosis should be assigned to this patient?
How large are the peak loads of a telephone or energy network going to be?
Why does the manufacturing facility suddenly start to produce defective goods?

These are all questions that can be answered if the information hidden in a database can be found and put to use. Modeling the investigated system and discovering the relations that connect its variables are the subject of data mining.

The course will introduce the concepts and techniques of data mining and data warehousing, including their principles, architecture, design, implementation, and applications.

Data warehousing and OLAP technology for data mining
Data preprocessing
Descriptive data mining: characterization and comparison
Association analysis
Classification and prediction
Cluster analysis
Mining complex types of data
Applications and trends in data mining
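
As a small worked example for the association analysis topic, the sketch below computes the two standard measures of an association rule, support and confidence, over a list of transactions. The function names support and confidence are chosen for illustration, and any basket data passed in would be the user's own.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0
```

A rule is usually reported only when both measures exceed chosen thresholds; algorithms such as Apriori exist to find the qualifying itemsets without enumerating every combination.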
