Most scientific research publications are written poorly, because
talent in science is not necessarily correlated with talent in
writing. Therefore, research publications are not necessarily good
examples of how to write well. So, how does a scientist learn to
write up research?

Before you learn how to write about research, you must learn how to
write in general. The quickest way to do that is to read Strunk &
White's The Elements of Style. This book has been around for
decades. It contains very specific and very good advice on how to
write almost anything. Amazingly, it is small enough to fit in your
pocket.

The book is divided into convenient little sections. Each section
gives advice on one aspect of writing. Instead of reading the book
from start to finish, I recommend that you read one random section of
the book every week. Repeat this random sampling (with replacement)
until you know the book almost by heart. To encourage such use, I
have left a copy of the book in our lab. Let me know if you can't
find it.

When you think you know how to write in general, here is some
additional specific advice on how to write research papers in natural
language processing and related fields. If there are points that you
don't understand, I would be happy to explain them.

The lingua franca of computer science publications is
(still and for the foreseeable future) LaTeX. Learn it early; learn
it often. There are some good tutorials; just ask around. If you
write things up in some other format, be prepared to rewrite them in
LaTeX when it's time to publish. You will usually rewrite things
several times before they're published in any case, but it's much less
work to rewrite something in the same file format than to rewrite it
in a new format.

We have some homegrown tools and templates to streamline the process
of writing about NLP research in LaTeX. They all live in the writing/
CVS repository on s1, under the shared/ subdirectory. When you start
writing a new paper or tech report, do a CVS check-out of
writing/shared/, and create a new subdirectory for your document under
writing/, e.g. writing/MyNewDir. Then cd to MyNewDir and run
../shared/start-writing. This little shell script will do the following:
- It will copy the file writing/paper.template.tex to your new
subdirectory. You should rename this file appropriately, to
MyNewPaper.tex or what have you. This template is a good starting
point for most NLP papers and tech reports.
- It will sym-link to a bunch of other files in writing/shared/
that pertain to BibTeX, the LaTeX companion program that greatly
simplifies the management of citations and references. The mt.bib
file is a large bibliography of much of the literature that is
relevant to our research. It's in the format that allows the BibTeX
program to typeset relevant references in a variety of styles. By
sharing this file we reduce each other's work in entering
bibliographic information. The files bibtex-citation-macros.sty and
bibtex-citation-macros.bst are style files that tell LaTeX and BibTeX
how to format the mt.bib entries for your particular document. The
paper.template.tex file illustrates how to use all these files
together.
- It will sym-link to the Makefile. With the Makefile in
place, you can just type `make MyNewPaper.ps` in the relevant
subdirectory to convert MyNewPaper.tex to PostScript, incorporating
the figures and bibliography in a sensible way. There are also
options for output in PDF, HTML, etc. Type `make help` to see the
options. To save a few more keystrokes, I usually define the
following macros in my shell (Bash syntax would be slightly
different):
    alias mps 'make \!*.ps'
    alias mpdf 'make \!*.pdf'
Then I can simply type `mps MyNewPaper` to get PostScript.

After giving paper.template.tex a new name, you can start writing in
it. Don't forget to check your revised tex source files into CVS.
This is the easiest way to have your writings always available no
matter where you are, to share them with others in the group, and to
roll back changes if necessary. You can also edit mt.bib, but
there's no point checking in the symlink for it, since CVS cannot
detect changes in symlinks anyway. In order to take effect, changes
to mt.bib should be checked in from within the writing/shared/
directory.
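
As a rough illustration of the setup steps and shell shortcuts described
above, here is a minimal sketch in Bash-compatible shell. It is not the
actual start-writing script. The function name start_writing_sketch and
its optional argument are hypothetical, and the list of linked files is
only an assumption based on the files named in the text; the real script
may link more. The mps and mpdf functions are one possible Bash
counterpart of the two csh-style aliases, written as functions because
Bash aliases cannot take arguments.

```shell
# Hypothetical sketch only: NOT the real start-writing script.
# Run it from inside your new document directory, e.g. writing/MyNewDir.
start_writing_sketch() {
    # Assumed layout: $root is the writing/ directory, with shared/ below it.
    root="${1:-..}"
    shared="$root/shared"

    # Step 1: copy the template so it can be renamed and edited locally.
    cp "$root/paper.template.tex" . || return 1

    # Steps 2 and 3: link (do not copy) the shared BibTeX files and the
    # Makefile, so that updates in shared/ are picked up everywhere.
    # This file list is an assumption taken from the names in the text.
    for f in mt.bib bibtex-citation-macros.sty \
             bibtex-citation-macros.bst Makefile; do
        ln -sf "$shared/$f" "$f" || return 1
    done
}

# Possible Bash counterparts of the csh aliases: build NAME.ps or NAME.pdf.
mps()  { make "$1.ps"; }
mpdf() { make "$1.pdf"; }
```

With these definitions loaded, running start_writing_sketch from inside
writing/MyNewDir sets up the directory, and `mps MyNewPaper` then runs
`make MyNewPaper.ps`.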