Graduate Special Topics in Computer Science

NOTE: for descriptions of standard graduate computer science courses, see Graduate Course Descriptions.

G22.3033-001 Logic in Computer Science

A beginning graduate-level class in mathematical logic with motivation provided by applications in Computer Science. There are no formal prerequisites, but the pace of the class will require that the students can cope with a significant level of mathematical sophistication. Topics include: propositional and first-order logic; soundness, completeness, and compactness of first-order logic; first-order theories; undecidability and Godel's incompleteness theorem; and an introduction to other logics such as second-order and temporal logic.

G22.3033-002 Machine Learning

Pre-requisites: Linear algebra, vector calculus, elementary statistics and probability theory. Good programming ability is a must: many assignements will involve implementing algorithms studied in class.

This course will cover a wide variety of topics in machine learning, pattern recognition, statistical modeling, and neural computation. The course will cover the mathematical methods and theoretical aspects, but will primarily focus on algorithmic and practical issues.

Machine Learning and Pattern Recognition methods are at the core of many recent advances in "intelligent computing". Current applications include machine perception (vision, audition), control (process control, robotics), data mining, time-series prediction (e.g. in finance), natural language processing, text mining and text classification, bio-informatics and computational models of biological processes, and many other areas.

Topic covered:

  • the basics of inductive inference, learning, and generalization.
  • energy functions and loss functions: maximum likelihood, MAP, and discriminative criteria.
  • linear classifiers: perceptron, LMS, logistic regression.
  • non-linear classifiers with linear parameterization: basis-function methods, boosting, support vector machines.
  • multilayer neural networks, backpropagation, heterogeneous and modular learning systems
  • graph-based models for sequences: hidden Markov models, finite-state transducers, graph-transformer networks.
  • Latent variables and the Expectation-Maximization algorithm.
  • unsupervised learning: density estimation, clustering, and dimensionality reduction methods.
  • introduction to graphical models.
  • approximate inference, sampling.
  • optimization methods in learning: gradient-based methods, second-order methods.
  • the bias-variance dilemma, regularization, model selection, cross-validation, bagging.
  • applications to vision, speech, language, forecasting, and biological modeling.

G22.3033-003 Structures in Natural Language Processing

Prerequisites: A- in Fundamental Algorithms (G22.1170) and Natural Language Processing (G22.2590)

For complete course information see

G22.3033-004 Computational Biology/Bioinformatics

This is a subject at the interface of mathematics, computer science and biology, with the aim of understanding not just the components of life (so-called "part-lists") but also how they interconnect and interact with each other to perform the functions of life. Emphasis is on the statistical and computational approaches to understand the experimental data that can be created by novel high-throughput biotechnology: single-molecules, micro-array, whole-genome scanning of polymorphisms, mass-spectrometry, etc.


1. Introduction to Biology: Genome; Central Dogma; RNA, DNA, Proteins; and Cell

Challenges for computational biology:

* Alignments (sequence alignments, structure alignments)
* Pattern Finding (motifs, gene prediction, cis-regulatory elements)
* Overlap Detection and Contigs
* Structure Prediction (Protein Folding & RNA secondary structure)
* Classification (Protein families, gene families)
* Phylogeny Analysis
* (databases available)

2. & 3. Introduction to Computer Science, Statistics; Algorithms (Varun ) and data structures (trees)

* Dynamic programming
* Complexity: P and NP (eg- multiple sequence alignment)
* Heuristic methods, neural networks, simulated annealing, genetic algorithms
* Statistical distributions, models, estimation
* Bayesian Inference
* EM Algorithms, Metropolis, Gibbs Sampler, MCMC
* Hidden Markov Models
* Supervised learning, SVM

4. & 5. Pattern Discovery & Sequence Alignment (applications)

* Exact Matching, Approximate Matching
* Exact & Approximate Pattern discovery
* Pair-wise alignment (Needlman-Wunsch, Smith-Waterman)
* Multiple sequence alignments (heuristics)

6. Mapping, Sequence Assembly

* Sanger sequencing,
* cloning, clone overlaps
* interval graphs
* contig algorithms
* ocean-islands (Lander Waterman model)

7. Proteomics (Structure Prediction)

* Introduction (primary, secondary structures)
* Ab-initio models for prediction
* Homology based, threading
* Geometric shape matching (docking)

8. & 9. Transcriptomics

* Gene expression
* Microarray data analysis (supervised & unsupervised)
* Motif detection in promoter sequences

10. Metabolic & Regulatory Pathways
* Kinetic mass action models (eg LAC operon)
* Trajectory Analysis
* Model Reconstruction & Validation

11. Evolution

* Models of evolution
* Evolutionary distances
* Phylogeny trees

12. Genetics, SNPs & Association Studies

* Coalescent trees and haplotypes
* SNP phasing (perfect phylogeny & statistical methods)

G22.3033-005 Models & Analysis of Real-Time & Hybrid Systems

The class of computer systems whose modeling, analysis, and development are most challenging is the class of reactive systems. Reactive systems are systems whose role is to maintain an ongoing interaction with their environment rather than produce some final value upon termination. Typical examples of reactive systems are Network protocols, Air traffic control system, Programs controlling mechanical devices such as a train, a plane, or ongoing processes such as a nuclear reactor. Recently, there is an awakened interest in reactive systems due to the emergence of embedded systems as a major research field.

The Formal Methods approach to software analysis and construction is based on viewing a program and its execution as mathematical objects and applying mathematical and logical techniques to specify and analyze the properties and behavior of these objects. The main advantages of the formal approach to software construction is that, whenever applicable, it can lead to an increase of the reliability and correctness of the resulting programs by several orders of magnitude.

Several level of abstraction can be used in the study of reactive and embedded systems. At the highest level of abstraction, we only care about the temporal precedence of events and actions, and can guarantee that a system satisfies a requirement such as "every request is eventually followed by a response". However, at this level of modeling we cannot bound the response time by actual numbers. In the course, we consider two successive refinements of our modeling and specification abilities. The first refinement considers the model of real-time systems . In this model, we represent, specify, and analyze the temporal distance between events, thus being able to require and verify a property such as "every request is eventually followed by a response, which must occur within the time interval [3,5] from the request".

The next level of refinement considers the model of hybrid systems. In this model, we consider a system consisting of a combination of components some of which are discrete, such as programs, while the other are continuous, such as a motor driving a railroad gate barrier. This model employs discrete as well as continuous analysis tools in order to analyze and verify the correctness of hybrid systems, typically corresponding to a digital program controlling a continuous environment.

Course topics:

  • The computational model of Real-Time Systems (RTS).
  • A simple programming language (SPL),extended by timing information and its translation into RTS.
  • The specification language of Timed Temporal Logic (TTL).
  • Discrete and dense-time Timed Automata.
  • Model checking timed properties of timed automata.
  • Deductive verification of timed properties.
  • The computational model of Hybrid Systems.
  • Hybrid automata.
  • Model checking properties of hybrid automata.
  • Verification Diagrams.
  • Deductive verification of Hybrid Systems.

G22.3033-006 Analysis of Reactive Systems

Prerequisites: Some background in algorithm design, familiarity with the language of first-order logic, and parallel programs.

Textbooks: Recommended: Temporal Verification of Reactive Systems: Safety by Zohar Manna and Amir Pnueli, Springer-Verlag

Reactive systems are systems whose role is to maintain an ongoing interaction with their environment rather than produce some final value upon termination. Typical examples of reactive systems are Air traffic control system, Programs controlling mechanical devices such as a train, a plane, or ongoing processes such as a nuclear reactor.

The Formal Methods approach to software construction is based on viewing a program and its execution as mathematical objects and applying mathematical and logical techniques to specify and analyze the properties and behavior of these objects. The main advantages of the formal approach to software construction is that, whenever applicable, it can lead to an increase of the reliability and correctness of the resulting programs by several orders of magnitude.

Several approaches to the verification of reactive systems are available, the most prominent of which are Algorithmic (model checking) and Deductive verification. This semester we concntrate on deductive verification, where we use theorem-proving techniques to establish the correctness of reactive programs relative to their temporal specifications. We will be using computer-aided theorem provers such as PVS and STeP for verifying the properties of programs.

Course topics:

* The computational model of Fair Discrete Systems (FDS).
* A simple programming language (SPL) and its translation into FDS.
* The specification language of linear temporal logic (LTL).
* The PVS Theorem prover. Encoding FDS's and LTL within PVS.
* Verifying invariance properties.
* Algorithmic construction of auxiliary invariants.
* Verifying progress properties.
* The CHAIN and WELL rules.
* Verification Diagrams.
* Verifying progress under compassion.
* Verifying general LTL properties.

G22.3033-008 Producing Production Quality Software

/Producing Production Quality Software provides detailed, practical, training for producing high-quality code.

We cover the following topics:
  • Object Oriented Concepts
  • Evaluating requirements and specifications
  • A Pseudocode/PDL approach to design and implementation, using libraries, data structure selection
  • Practical code correctness: using invariants
  • Code reviews: how to detect software problems by reading code
  • Quality code guidelines: a short list of criteria for good code
  • System architecture: module decomposition and interfaces
  • Object-Oriented design: several approaches
  • Large system design in Java: file and class organization, OO hierarchy, deciding whether to use composition or inheritance, avoiding dependency cycles
  • Other design methodologies: data flow diagrams, structured design, top-down design, bottom-up design, processing semi-structured input with REs, FSMs, and CFGs, design patterns
  • Error handling: exceptions, error types, propagating and handling errors
  • Testing: unit testing, using assertions, coverage testing, independent testing, choosing good test cases, structured basis testing
  • Black box (functional) testing: available tools, its limitations
  • Group programming: group dynamics, team work
  • Concurrency: writing reliable programs with threads
  • Our pedagogy involves reading and writing a lot of code. Students see both good and bad code in class. The professor (or trained TA) reads and reviews all code written by students. Students write both small and large programs.

    G22.3033-009 WWW Programming

    A programming course that focuses on covering many topics that relate to the WWW.

    Topics covered include:

    • Object Oriented Concepts
    • The structure and protocols of the WWW: IP, TCP, UDP, SMTP, DNS, PING, HTTP
    • Programming in Java using J2SE
    • XML, XSL, XSLT, and XPath
    • WebServices
    We will do five or more homework assignments in Java. These will include building applications that demonstrate:
    • A Swing UI
    • Sockets
    • Multithreading and RMI
    • Servlets
    • JSPs
    • WebServices
    The class does NOT require any previous Java programming expertise, but programming in some language is required. You will be writing code and expected to get up to speed quickly on Java and the IDE we will use. You should have a windows 2000/XP or Linux machine with at least 256 megs (512 meg better) and a 750 mHz processor or better.

    G22.3033-010 Application Servers

    An application server is a rich, highly portable software program that runs on a middle tier server and handles all application operations between client applications running on pervasive devices and back-end databases and business applications. Application servers provide a platform independent programming interface for developing portable applications in a variety of programming languages. Application servers also facilitate the integration of legacy applications via on-the-fly transformation of XML messages, and support a wide variety of XML-enabled client channels that include traditional web clients and a growing set of smart devices. Application servers enable programmers to implement distributed applications using a variety of architectural patterns including traditional Model-View-Controller (MVC) architectures, Service Oriented Architectures (SOAs), Point-to-Point (P2P) Architectures, Grid-Computing Architectures, etc. Using an SOA architectural pattern, and emerging standards such as SOAP, application servers enable a new generation of "web services" that allow systems to make remote procedure calls to other systems over the Internet. Today, J2EE, .Net, and CORBA 3 application servers set the stage as enabling platforms for Web Services initiatives, Web appliances, and wireless applications.

    This course concentrates on architecting, designing, and developing persistent software applications using application server technology. Throughout the course, students are exposed to the evolution of application server architectures that started in the mid 1990s, and experiment with corresponding approaches based on traditional client-server technology, CGI frameworks, page-based extended HTML environments, distributed object computing platforms, object management architectures, component-based computing environments, and frameworks based on MVC, SOA, P2P, etc. The course conveys the necessary skills to select the proper application server architecture based on projects' business and technical requirements. The course also explains how to integrate an application server into an existing Web site, as well as how to implement an application server-based Web application from the ground up. Students will learn how to configure and operate application servers in production environments, taking advantage of features available in mainstream commercial frameworks such as scalability, concurrency, security, fault tolerance, auto-deployment, communications support, development environment, and monitoring tools.

    As they design and implement applications using commercial and open source application server technologies (e.g., IBM WebSphere 5.0, BEA WebLogic 8.1, .Net, OpenCCM 0.2, etc.), students will learn how to identify application patterns that lead to the most effective use of the various services provided within application server frameworks. The design and implementation of the persistence and legacy application integration layers, using related application server technology, will be particularly emphasized. Case studies, provided as part of the course, will demonstrate how medium- to large-size sites manage the complexities inherent in these endeavors. These case studies will help students get a firm understanding of how application servers work and how to best deploy complex applications in a real-world situation. Although, the course will strive to provide a comprehensive coverage and classification of application server technologies, attempts will be made whenever possible to select open source technologies for experimentation purpose. As part of the course, students will be exposed to next generation application server technologies including Model Driven Architectures (MDAs), as well as reflective, multimedia and agent-enabled frameworks

    top | contact