In this thesis, we propose and analyze a multi-server model that captures a performance trade-off between centralized and distributed processing. In our model, a fraction $p$ of an available resource is deployed in a centralized manner (e.g., to serve a most-loaded station) while the remaining fraction $1-p$ is allocated to local servers that can only serve requests addressed specifically to their respective stations. Using a fluid model approach, we demonstrate a surprising phase transition in the steady-state delay, as $p$ changes: in the limit of a large number of stations, and when any amount of centralization is available ($p>0$), the average queue length in steady state scales as $\log_{1/(1-p)} \frac{1}{1-\lambda}$ when the traffic intensity $\lambda$ goes to 1. This is exponentially smaller than the usual M/M/1-queue delay scaling of $\frac{1}{1-\lambda}$, obtained when all resources are fully allocated to local stations ($p=0$). This indicates a strong qualitative impact of even a small degree of centralization. We prove convergence to a fluid limit, and characterize both the transient and steady-state behavior of the finite system, in the limit as the number of stations $N$ goes to infinity. We show that the sequence of queue-length processes converges to a unique fluid trajectory (over any finite time interval, as $N \to \infty$), and that this fluid trajectory converges to a unique invariant state $v^p$, for which a simple closed-form expression is obtained. We also show that the steady-state distribution of the $N$-server system concentrates on $v^p$ as $N$ goes to infinity.
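As a quick numerical illustration of the gap between the two scalings, the sketch below evaluates $\log_{1/(1-p)} \frac{1}{1-\lambda}$ against the M/M/1 scaling $\frac{1}{1-\lambda}$ at a few traffic intensities; the specific values of $p$ and $\lambda$ are illustrative choices, not taken from the thesis.

```python
import math

def centralized_scaling(p, lam):
    # With any centralization fraction p > 0, the average steady-state
    # queue length scales as log_{1/(1-p)} (1/(1-lam)),
    # i.e., ln(1/(1-lam)) / ln(1/(1-p)) by change of base.
    return math.log(1.0 / (1.0 - lam)) / math.log(1.0 / (1.0 - p))

def mm1_scaling(lam):
    # Fully distributed case (p = 0): the usual M/M/1 delay scaling.
    return 1.0 / (1.0 - lam)

# As lam -> 1, the centralized scaling grows logarithmically while
# the M/M/1 scaling blows up linearly in 1/(1-lam).
for lam in (0.99, 0.999, 0.9999):
    print(f"lam={lam}: p=0.05 -> {centralized_scaling(0.05, lam):7.1f}, "
          f"p=0 -> {mm1_scaling(lam):9.1f}")
```

Even with only 5% of the resource centralized, the queue-length scaling stays in the low hundreds as $\lambda \to 1$, while the fully distributed scaling grows without bound like $\frac{1}{1-\lambda}$.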