Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work by connecting the servers via a resilient overlay network; yet, this resiliency entails redundancy, which reduces performance. In this work, we propose AllConcur+, an extension of AllConcur. During intervals with no failures, it uses an overlay network with no redundancy and automatically switches to a resilient overlay network when failures occur. Our performance estimation shows that if no failures occur, AllConcur+ achieves up to 10x higher throughput and up to 5x lower latency than AllConcur. In the presence of occasional failures, AllConcur+ still outperforms AllConcur significantly. In the worst case, AllConcur+'s performance is worse than AllConcur's, yet, this requires frequent failures at very specific intervals. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures, increases the expected performance.
View on arXiv