Database Systems

G22.2433.01
Spring 2006
Monday 5:00 - 7, Warren Weaver 109.

Prof Dennis Shasha

Instructors: Dennis Shasha (shasha@cs.nyu.edu) with Alberto Lerner (lerner@cs.nyu.edu)

Office hours: Monday 4-4:45, Warren Weaver room 422 also after class.

Teaching Assistants: Guy Lichtman (lichtman@cs.nyu.edu) -- he will be familiar with the course material so you can ask him questions. He is the primay source of knowledge about software we will use. He is also the class list administrator. His software web page is here


OVERVIEW

The goal of this course is to study the design and data manipulation of large data applications. The course will be addressed to students who have only an elementary familiarity with databases, but want to learn how to use such systems effectively and correctly.

Virtually every interesting computer application involves large data. Database management systems have grown up as generic pieces of software that allow the input, modification, and querying of large data. The chief amenities offered by such systems are a high level query language (SQL, XPath, XQuery), indexes to make data access fast, an optimizer that chooses how to process queries, and the illusion of transactional atomicity (concurrency control and recovery). The job of a database designer is to understand an application, decide on the table design (which attributes go into which tables), decide on a set of indexes, and build queries. Doing this well entails learning design theory (normal forms), the various kinds of indexes, and SQL. We will spend the bulk of this course on these issues though we will touch on data mining, data warehouses, capacity planning, transaction processing, semantic extensions to SQL for finance, and XML.

Syllabus

The syllabus follows the chapter in the book Database Systems Concepts, fifth edition. You can use any recent edition however.

Homeworks

You may work in groups of two on the homeworks. Please sign both of your names.

Grading

Homeworks 1/2 and in-class exam 1/2 (closed book but for which you may bring in one 8.5 inch by 11 inch sheet of paper on which you may write or type anything you like on either side of the sheet). The in-class exam will be given on April 24, 2006. For the last class on May 1, you will be invited to put your feet up on the table and relax. Note that you can work in teams of 1 or 2 for all homeworks.

Exam topics follow what we covered in the lectures. Nothing optional will be included (though I encourage you to read that). Typical questions: given an English description, find the SQL; canonical covers and database table design; index selection; capacity planning; application design (given an English description, find the table design and select appropriate indexes); XML; query optimization basics; etc.

Late policy: Homeworks must be submitted electronically at 4:30 PM on the day they are due to the teaching assistant or in class if by hard copy. There are no late homeworks. If you miss a homework, then I need a doctor's note or a note from your employer (with contact information). Your scores from the other homeworks will be averaged in to give you a grade for the missing one. If you miss the final exam, then again I need a note from your employer or doctor (again with contact information). Please note that I may verify these notes. I'm sorry to be hard-nosed about this, but I'm giving you the homeworks as of the first day of class so you should make these deadlines.

Notes and Other Information

  • Register for the class list server mailing system here.
  • The notes come largely from the book Database System Concepts by Silberschatz, Korth, and Sudarshan fourth or fifth edition. Formally, this book is optional, so you can get it from an online bookseller as you wish. There are also six copies on reserve in Bobst (not in the Courant library but in Bobst).
  • Here is a lecture on biosurveillance by Andrew Moore and colleagues.
  • Here is a lecture on time series analysis by my colleagues and me.
  • We will also discuss, if we have time several related topics found at the end of these lecture notes (i) high performance replicated database systems without the use of two phase commit, (ii) case studies from Wall Street, (iii) a self-tuning technique that improves on LRU, and (iv) a data structure for data warehouses.

  • Here are some notes about good and bad uses of stored procedures

    Here is Joe Conron's nice paper on indexes (from when he was a master's student).

    Here are notes about materialized views in Oracle.

    Finally, here are the rules about academic honesty.