by jdm2212 2488 days ago
The only two of those I know are Kafka and Flink. For those two: Flink is much more full-featured and performant (basically the full Google Dataflow API, and several orders of magnitude faster than Kafka Streams), but Kafka Streams has a stupidly simple API that is useful if you need streaming because $reason but don't care about scaling up to infinity. If you're doing some really hacky demoware, Kafka Streams will probably be faster to spin up, because you just need the Kafka Streams jar and a Kafka cluster.

Do you have any numbers to back up the claim that Flink is faster than KStreams, and under what scenario?

I'm genuinely interested, as I use KStreams a lot, but the engineering discipline in the API leaves a lot to be desired, and I'd be more than happy to switch if Flink is that much better.

Here's a benchmark that includes KStreams and Flink [1]. Note that its Flink vs. Spark comparison is disputed [2], but both Flink and Spark come out several orders of magnitude faster than KStreams. This is inevitable given the KStreams architecture -- it keeps its state in Kafka rather than in a purpose-built data store with data structures optimized for the use case, and it doesn't do much coordination among workers. KStreams is there if you want streaming semantics on top of a small-ish Kafka topic you own but don't care too much about perf. Deploying and maintaining Flink is a much bigger hassle than KStreams -- you need DevOps support to get Flink running, whereas KStreams runs (albeit quite slowly) inside your application, with no new state store needed.
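To make the "state lives in Kafka" point concrete, here is a toy model. It is a hedged sketch, not Kafka Streams code: it assumes state is a key-value map where every update is also appended to a log, with a plain Java list standing in for a Kafka changelog topic (roughly the changelog pattern Kafka Streams uses, where a local store is backed by a Kafka topic). The names `ChangelogStoreDemo` and `ChangelogBackedStore` are hypothetical. It only shows the shape of the design (every write also goes to the log, and a restarted worker rebuilds its state by replaying that log); it says nothing about actual throughput.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the Kafka Streams API: a key-value store whose
// every write is also appended to a changelog. In Kafka Streams the changelog
// would be a Kafka topic; here an in-memory list stands in for it.
public class ChangelogStoreDemo {

    // One changelog record: a key and the value it was set to.
    static final class Change {
        final String key;
        final long value;

        Change(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class ChangelogBackedStore {
        private final Map<String, Long> local = new HashMap<>();
        private final List<Change> changelog;

        ChangelogBackedStore(List<Change> changelog) {
            this.changelog = changelog;
        }

        // Increment a counter: update the local map, then log the new value,
        // so every state change also costs one append to the log.
        void increment(String key) {
            long next = local.getOrDefault(key, 0L) + 1;
            local.put(key, next);
            changelog.add(new Change(key, next));
        }

        long get(String key) {
            return local.getOrDefault(key, 0L);
        }

        // Rebuild state after a restart by replaying the changelog from the
        // beginning; the last value logged for each key wins.
        static ChangelogBackedStore restore(List<Change> changelog) {
            ChangelogBackedStore store = new ChangelogBackedStore(changelog);
            for (Change c : changelog) {
                store.local.put(c.key, c.value);
            }
            return store;
        }
    }

    public static void main(String[] args) {
        List<Change> changelog = new ArrayList<>();
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (String word : "to be or not to be".split(" ")) {
            store.increment(word);
        }
        // Simulate losing the worker: discard the local map, replay the log.
        ChangelogBackedStore restored = ChangelogBackedStore.restore(changelog);
        System.out.println("to=" + restored.get("to") + " be=" + restored.get("be")
                + " changelog entries=" + changelog.size());
    }
}
```

Running `main` prints `to=2 be=2 changelog entries=6`. The design choice being modeled is that durability comes from the log rather than from a separate database, which is why no extra state store has to be deployed alongside the application.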

Confluent has a good discussion of the ownership issue (DevOps for Flink, devs for KStreams) here [3], though they seriously downplay the huge gap in perf.

[1] https://databricks.com/blog/2017/10/11/benchmarking-structur...

[2] https://www.ververica.com/blog/curious-case-broken-benchmark...

[3] https://www.confluent.io/blog/apache-flink-apache-kafka-stre...

Hmm. I found this more recent benchmark, where Flink was still faster but KStreams' performance was much closer than in the 2017 benchmarks.

I guess KStreams improved its performance over time? Or is the benchmark design just different?