
Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks

Proceedings of the VLDB Endowment (PVLDB), 2022
Abstract

Consensus protocols are widely used to build fault-tolerant applications. At one end of the spectrum of such protocols, Multi-Paxos and Raft have seen widespread adoption but sacrifice significant latency and throughput. At the other end, protocols like Speculative Paxos and NOPaxos provide high performance but require specialized hardware and in-network functionality. This makes them hard to deploy in environments such as the public cloud, where tenants cannot access the physical network. Our work aims to bridge this gap between deployability and performance. We present Nezha, a high-performance and deployable consensus protocol that exploits accurate software clock synchronization. Nezha does not require special hardware or physical network access, making it easily deployable in virtualized environments. Instead, it uses a new primitive called deadline-ordered multicast (DOM), which orders client-to-replica multicast requests by deadlines specified in synchronized wall-clock time. We compare Nezha to six baselines in the public cloud: Multi-Paxos, Fast Paxos, NOPaxos, and Raft, as well as two recent protocols that use clock synchronization, namely Domino and TOQ-based EPaxos. Evaluations show that Nezha outperforms the baselines by a median of 7.1x (range: 1.9–20.9x) in throughput, and by a median of 2.3x (range: 1.3–6.5x) in latency. We also use Nezha to replicate two applications (Redis and CloudEx, a prototype financial exchange) and show that Nezha can provide fault tolerance with only a modest performance degradation: compared with the unreplicated system, Nezha sacrifices 5.9% throughput for Redis; it saturates the processing capacity of CloudEx and prolongs the order processing latency by 4.7%. Nezha is open-sourced at https://gitlab.com/steamgjk/nezhav2.
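
The idea of ordering requests by deadline can be illustrated with a minimal sketch. This is not Nezha's implementation: the names `Request`, `DeadlineOrderedBuffer`, `receive`, and `release` are hypothetical, the clock is passed in as a plain number rather than read from a synchronized clock, and the handling of late requests (simply setting them aside) is a placeholder assumption. The sketch only shows that replicas which buffer requests and release them in deadline order can agree on an ordering even when the network delivers the requests in different orders.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Ordering uses (deadline, client_id); the payload is ignored when comparing.
    deadline: float
    client_id: int
    payload: str = field(compare=False)


class DeadlineOrderedBuffer:
    """Hypothetical receiver-side buffer that releases requests by deadline."""

    def __init__(self):
        self._heap = []   # min-heap of pending requests, earliest deadline first
        self.late = []    # requests that arrived after their deadline (assumption)

    def receive(self, req, now):
        # A request that arrives after its deadline cannot be ordered safely
        # here; this sketch sets it aside instead of deciding what to do next.
        if req.deadline < now:
            self.late.append(req)
            return False
        heapq.heappush(self._heap, req)
        return True

    def release(self, now):
        # Hand over every pending request whose deadline has passed,
        # in increasing deadline order.
        released = []
        while self._heap and self._heap[0].deadline <= now:
            released.append(heapq.heappop(self._heap))
        return released


def run(arrivals, release_at):
    # Feed (arrival_time, request) pairs to a fresh buffer, then release.
    buf = DeadlineOrderedBuffer()
    for now, req in arrivals:
        buf.receive(req, now)
    return [r.payload for r in buf.release(release_at)]
```

For instance, if one replica receives request `a` (deadline 10.0) before `b` (deadline 12.0) and another replica receives them in the opposite order, both calls to `run` return the payloads in the same deadline order once the local time passes 12.0.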
