In research on empirical NLP, software is a tool for scientific discovery. As our science advances, so must our tools. Since our science tends to advance rapidly, the purpose and functionality of our software change frequently. Therefore, DON'T WRITE LARGE MONOLITHIC PROGRAMS. Their design is guaranteed to become obsolete before you finish your project. Instead, write small programs whose call structure is a deep tree (or lattice). This kind of design is much easier to modify rapidly.

Everyone who intends to work on a non-trivial software system (which would include at least all of my PhD students) should study the book _Design Patterns_ by Gamma et al. (1994), and keep it handy for reference. This book contains some of the best wisdom of the software development community. It is not just the latest trend. It has withstood the test of time, and is considered basic knowledge by all seasoned software developers. Understanding the contents of this book can increase your programming productivity tenfold.


Your code should contain enough comments to be completely understandable by somebody who knows what it's supposed to do, but has no idea how it works inside. At the very least, every class and method should have some prose description attached. As a rule of thumb, there should be some explanation attached at least every 5 lines of code. (Some people say there should be more prose than code in every program, but I'm not that extreme... yet.) It is much easier to comment *while* you're coding than to add comments afterwards, so get in the habit.

For maximum usefulness, your documentation should be compatible with automatic documentation generators. For C++, use the Doxygen conventions. It's easy. For one-line comments, precede the comment with "//!" instead of just "//". E.g.:
//! This is a doxygen-compatible one-line comment.
For multi-line comments, start the comment with "/*!" instead of just "/*". E.g.:
/*! This is a doxygen-compatible
    multi-line comment. */
You can also get much more sophisticated with Doxygen, and make your documentation much more useful to yourself and others. See the online Doxygen manual for details.
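
To give a flavor of the more sophisticated markup, here is a minimal sketch of a function documented with a few standard Doxygen commands (\brief, \param, \return, and \p for referring to a parameter). The function countToken and its parameter names are made up for illustration; they are not taken from any particular codebase.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/*! \brief Counts how often a token occurs in a tokenized sentence.
 *
 *  Hypothetical helper, used here only to illustrate Doxygen markup.
 *
 *  \param tokens  The sentence, already split into tokens.
 *  \param target  The token to look for (the comparison is case-sensitive).
 *  \return        The number of elements of \p tokens equal to \p target.
 */
std::size_t countToken(const std::vector<std::string>& tokens,
                       const std::string& target)
{
    std::size_t count = 0;
    // Scan the sentence once, counting exact matches.
    for (const std::string& token : tokens) {
        if (token == target) {
            ++count;
        }
    }
    return count;
}
```

Running Doxygen over a file like this should produce a reference page for countToken that lists each parameter and the return value alongside the brief description, which is exactly what somebody needs if they know what the code is supposed to do but not how it works inside.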


Here are some guidelines that are specific to C++:

Using CVS, Subversion, or other version control software