Though advances in VLSI technology will soon make it practical to construct parallel processors consisting of thousands of processing elements (PEs) sharing a central memory, the performance of these parallel processors is limited by the high memory access time due to interconnect network latency. This thesis is a study of how the performance of a parallel processor is affected by associating a cache memory with each PE of the system. Cache parameters and policies are varied and the performance of the resulting cache configurations are compared. The cache coherence problem is discussed and a solution that is compatible with the philosophy of parallel systems is adopted. Performance is analyzed by analytic and simulation models. Due to time and space limitations the simulation modeling is done in a hierarchical fashion: a primary level simulates a single cache and a secondary level simulates a parallel machine. The simulators can run in a trace-driven and self-driven mode. The trace data used to drive the simulators was collected by tracing the reference patterns of actual parallel programs. An approximate analytic model is developed that predicts the queue waiting times of various components of a parallel system, enabling the comparison of a water range of cache parameters than is possible with the simulators.