Apache Spark Streaming, Kafka and HarmonicIO: A Performance and Architecture Comparison for Enterprise and Scientific Computing

BenchCouncil International Symposium (ISB), 2018
Abstract

This paper presents a benchmark of stream processing throughput comparing Apache Spark Streaming (under file-, socket- and Kafka-based stream integration) with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput is measured for a broad range of stream processing loads, in particular those with large message sizes (up to 10 MB) and heavy CPU load, which are more typical of scientific computing use cases (such as microscopy) than of enterprise contexts. A detailed exploration of the performance characteristics of these integrations under varying loads reveals a complex interplay of performance trade-offs, uncovering the boundaries of good performance for each framework and integration. Based on these results, we suggest which frameworks and integrations are likely to offer good performance for a given load. Broadly, the advantages of Spark's rich feature set come at the cost of sensitivity to message size in particular, whereas the simplicity of HarmonicIO offers more robust performance, especially for raw CPU utilization.
