As the Web evolves, the number of network services deployed on the Internet
has been growing at a dramatic pace. Such services usually involve a massive
volume of data stored in physical or virtual back-end databases, and access
the data to dynamically generate responses for client requests. These
characteristics restrict the use of traditional mechanisms for improving
service performance and scalability: large data volumes prevent the
replication of service data at multiple sites, as content distribution
schemes require, while dynamically generated responses do not support the
reuse that web caching schemes rely on.
However, many deployed data-centric network services share other properties that can help alleviate this situation: (1) service usage patterns exhibit locality of various forms, and (2) services are accessed using standard protocols and publicly known message structures. When properly exploited, these characteristics enable the design of alternative caching infrastructures, which leverage distributed network intermediaries to inspect traffic flowing between clients and services, infer locality information dynamically, and potentially improve service performance by taking actions such as partial service replication, request redirection, or admission control.
This dissertation investigates the nature of locality in service usage patterns for two well-known web services, and reports on the design, implementation, and evaluation of such a network intermediary architecture, named DataSlicer. DataSlicer incorporates four main techniques: (1) service-neutral request inspection and locality detection on distributed network intermediaries; (2) construction of oriented overlays for clustering client requests; (3) integrated load-balancing and service replication mechanisms that improve service performance and scalability by either redistributing the underlying traffic in the network or creating partial service replicas on demand at appropriate network locations; and (4) robustness mechanisms that maintain system stability in a wide-area network environment.
DataSlicer has been successfully deployed on the PlanetLab network. Extensive experiments using synthetic workloads show that our approach can: (1) create appropriate oriented overlays to cluster client requests according to multiple application metrics; (2) detect locality information across multiple dimensions and granularity levels; (3) leverage the detected locality information to perform appropriate load-balancing and service replication actions with minimal cost; and (4) ensure robust behavior in the face of dynamically changing network conditions.