Ideally, procedures and data structures on a shared-memory MIMD machine should be serialization-free and concurrently accessible to avoid (potential) performance-limiting bottlenecks. The fetch-and-add coordination primitive, in conjunction with combining interconnection networks, has been proposed as a means for achieving this goal. The first is essentially an indivisible add-to-memory and the second combines simultaneous requests to the same memory location. In this thesis we address serialization-free memory and process management for a shared-memory MIMD machine with fetch-and-add and a combining network. To meet this goal we adopt a self-service paradigm for the operating system that permits each processing element (PE) to service its own requests (thereby avoiding central server bottlenecks). The success of this approach depends upon the use of concurrently accessible data structures to hold data shared among the PEs. We begin by reviewing existing fetch-and-add based queue and multiqueue (a compressed queue) implementations that support concurrent queue insertion and deletion. We then extend these implementations to include a new operations (e.g., the removal of an interior queue item) and new data structure representations (e.g., linked lists). Parallel memory allocation algorithms, many based on the modified queue and multiqueue data structures, are then given. These algorithms include parallel analogs to a number of existing serial algorithms such as Knuth's boundary tag method and the binary buddy system. Next, we define a set of primitives that permit various task activities, such as creation and scheduling, to be done in parallel. Task-switching readers/writers and event primitives are given as well. In the readers/writers implementations, reader activity is fully parallel in the absence of writers. An important feature of both the readers/writers and event implementations is that tasks waiting for a resource can be resumed in parallel by multiple PEs. We then demonstrate how high-level parallel programming constructs (e.g., parallel loops) may be implemented via the task primitives and the queue and multiqueue data structures. Finally, we prove that one of the readers/writers implementations satisfies certain correctness criteria including freedom from deadlock and the mutual exclusion of readers and writers.