How to write up research

Most scientific research publications are written poorly, because talent in science is not necessarily correlated with talent in writing. Therefore, research publications are not necessarily good examples of how to write well. So, how does a scientist learn to write up research?

Before you learn how to write about research, you must learn how to write in general. The quickest way to do that is to read The Elements of Style, by Strunk & White. This book has been around for decades. It contains very specific and very good advice on how to write almost anything. Amazingly, it is small enough to fit in your pocket.

The book is divided into convenient little sections. Each section gives advice on one aspect of writing. Instead of reading the book from start to finish, I recommend that you read one random section of the book every week. Repeat this random sampling (with replacement) until you know the book almost by heart. To encourage such use, I have left a copy of the book in our lab. Let me know if you can't find it.

Once you think you know how to write in general, the rest of this page offers more specific advice on writing research papers in natural language processing and related fields. If any of the points are unclear, I would be happy to explain them.

Tools

The lingua franca of computer science publications is (still, and for the foreseeable future) LaTeX. Learn it early; learn it often. There are some good tutorials; just ask around. If you write things up in some other format, be prepared to rewrite them in LaTeX when it's time to publish. You will usually rewrite a paper several times before it is published anyway, but it's much less work to rewrite something in the same file format than to rewrite it in a new format.

We have some homegrown tools and templates to streamline the process of writing about NLP research in LaTeX. They all live in the writing/ CVS repository on s1, under the shared/ subdirectory. When you start writing a new paper or tech report, do a CVS check-out of writing/shared/, and create a new subdirectory for your document under writing/, e.g. writing/MyNewDir. Then cd to MyNewDir and run ../shared/start-writing. This little shell script will do the following:

  1. It will copy the file writing/paper.template.tex to your new subdirectory. You should rename this file appropriately to MyNewPaper.tex or what have you. This template is a good starting point for most NLP papers and tech reports.
  2. It will sym-link to a bunch of other files in writing/shared/ that pertain to BibTeX, the companion program to LaTeX that greatly simplifies the management of citations and references. The mt.bib file is a large bibliography covering much of the literature that is relevant to our research. It is in the format that BibTeX reads, so BibTeX can typeset the references you cite in a variety of styles. By sharing this file we reduce each other's work in entering bibliographic information. The files bibtex-citation-macros.sty and bibtex-citation-macros.bst are, respectively, a LaTeX style file and a BibTeX style file; together they tell LaTeX and BibTeX how to format the mt.bib entries for your particular document. The file paper.template.tex illustrates how to use all these files together.
  3. It will sym-link to the Makefile. With the Makefile in place, you can just type `make MyNewPaper.ps` in the relevant subdirectory to convert MyNewPaper.tex to PostScript, incorporating the figures and bibliography in a sensible way. There are also options for output in PDF, HTML, etc.; type `make help` to see them. To save a few more keystrokes, I usually define the following aliases in my csh/tcsh shell (bash syntax is slightly different):
    • alias mps 'make \!*.ps'
    • alias mpdf 'make \!*.pdf'
    Then I can simply type `mps MyNewPaper` to get PostScript.
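The aliases above use csh syntax, where `\!*` splices the alias's arguments into the middle of the command. Bash aliases cannot do that, so a rough bash (or any POSIX shell) equivalent uses shell functions instead. This is a sketch, not the author's setup: the function names simply reuse the alias names, and unlike `\!*` the functions forward only their first argument.

```sh
# Rough bash/POSIX-shell equivalents of the csh aliases mps and mpdf.
# Each function appends the output suffix to its first argument and
# hands the resulting target name to make.
mps()  { make "$1.ps"; }
mpdf() { make "$1.pdf"; }
```

With these in your ~/.bashrc, `mps MyNewPaper` runs `make MyNewPaper.ps`, just as the csh alias does.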
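If you ever need to build a document by hand, for example on a machine without the shared Makefile, the generic LaTeX-plus-BibTeX cycle to PostScript looks roughly like the sketch below. It is the textbook sequence, not a copy of the group's Makefile rules, which may differ (for instance in how figures are handled); `build_ps` is a hypothetical helper name.

```sh
# Generic LaTeX + BibTeX build to PostScript (not the group's actual Makefile).
# Usage: build_ps MyNewPaper   (the .tex file name without its suffix)
build_ps() {
    latex "$1" &&            # first pass: records citation keys in $1.aux
    bibtex "$1" &&           # looks the keys up in the .bib files, writes $1.bbl
    latex "$1" &&            # second pass: reads $1.bbl, records citation labels
    latex "$1" &&            # third pass: resolves citations and cross-references
    dvips -o "$1.ps" "$1.dvi"  # convert the DVI output to PostScript
}
```

The three LaTeX passes are why a Makefile is worth having: forgetting one leaves "?" marks where citations should be.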
After giving paper.template.tex a new name, you can start writing in it. Don't forget to check your revised .tex source files into CVS. This is the easiest way to keep your writing available wherever you are, to share it with others in the group, and to roll back changes if necessary. You can also edit mt.bib, but there's no point checking in the symlink to it, since CVS cannot detect changes through symlinks anyway. For your changes to mt.bib to take effect in the repository, check them in from within the writing/shared/ subdirectory.
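The whole workflow, from check-out to first commit, might look roughly like the following sketch. It assumes that CVSROOT already points at the group's repository on s1 (the exact value isn't given here), that the check-out makes writing/ itself a CVS working directory, and that start-writing leaves a copy of paper.template.tex in the current directory as described above. The helper `new_paper` and its arguments are hypothetical.

```sh
# Hypothetical helper: start a new paper following the steps described above.
# Usage: new_paper MyNewDir MyNewPaper
# Assumes CVSROOT is already set to the group's repository on s1.
new_paper() {
    dir=$1 name=$2
    cvs checkout writing/shared || return 1      # shared tools, templates, mt.bib
    mkdir "writing/$dir" || return 1
    (cd writing && cvs add "$dir") || return 1   # register the new directory
    (
        cd "writing/$dir" &&
        ../shared/start-writing &&               # copies the template, makes symlinks
        mv paper.template.tex "$name.tex" &&     # give the template its real name
        cvs add "$name.tex" &&                   # check in the .tex source only,
        cvs commit -m "Start $name" "$name.tex"  # not the symlinks
    )
}
```

Only the renamed .tex file is added; the symlinked BibTeX files and Makefile stay out of your directory's CVS entries.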

Content

General

Grammar

Word Choice

Citations

Format

More advice on technical writing


Dan Melamed (melamed at cs.nyu.edu)
Last modified: Wed May 24 17:32:44 EDT 2006