In this thesis we implement several basic parallel processing primitives by using a replace-add operation, which can supersede the standard test-and-set, and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential processors. The replace-add is essentially an indivisible add-to-memory operation; moreover, concurrent replace-adds can all be processed in a single cycle. In particular, we use the replace-add to develop routines for concurrent access to a queue and show how they can be used to devise many highly parallel algorithms as well as a distributed, concurrent task scheduler. The paracomputer forms our underlying theoretical model of parallel computation, although we also consider a realistic architecture approximating this model. We justify our use of the replace-add operation by presenting a hardware implementation that permits multiple replace-adds to be processed nearly as efficiently as loads and stores. Moreover, the crucial special case of concurrent replace-adds updating the same variable is handled particularly well: if every processing element (PE) simultaneously directs a replace-add at the same variable, all these requests are satisfied in the time required to process just one request.