by loremipsium 1807 days ago
Hadoop, Kafka, Storm, Spark, Flink, Samza, Confluent. This tastes an awful lot like JavaScript framework hell.
2 comments

- Hadoop is an ecosystem.

- Kafka is a distributed log.

- Storm, Samza & Flink are stream processing engines.

- Spark is a Map/Reduce framework that uses memory to cache computations to provide some performance increase over other disk-based frameworks. It can also do some streaming computations if you squint hard enough.

- Confluent is a company that sells an enterprise Kafka.

Not really sure the comparison you made is apt.
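
To make the "distributed log" vs. "stream processing engine" distinction a bit more concrete, here's a toy single-partition log in Python. This is only a sketch: `ToyLog`, `append` and `poll` are made-up names, not Kafka's API, and there is no distribution, replication or persistence here. The point is just that a log stores ordered records and tracks read positions per consumer group, while any actual processing happens in whatever reads from it.

```python
from collections import defaultdict


class ToyLog:
    """In-memory, single-partition append-only log (illustrative only, not Kafka's API)."""

    def __init__(self):
        self._records = []
        # Hypothetical bookkeeping: consumer group name -> next offset to read.
        self._next_offset = defaultdict(int)

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for a group and advance that group's offset."""
        start = self._next_offset[group]
        batch = self._records[start:start + max_records]
        self._next_offset[group] = start + len(batch)
        return batch


log = ToyLog()
for event in ["click", "view", "click"]:
    log.append(event)

# Reading does not delete anything, so two groups each see the full log.
analytics = log.poll("analytics")
audit = log.poll("audit")
```

A stream processing engine would be the thing calling `poll` in a loop and doing joins, windows, aggregations and so on with the results.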

Kafka's homepage[1] advertises stream processing as a feature:

> Built-in Stream Processing
> Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing.

[1]:https://kafka.apache.org/

"Built-in" is an odd word choice there. Kafka Streams is a framework for building stream-processing applications on top of Kafka topics.
Kafka Streams is a streaming framework that implements itself using Kafka's existing features: resilience and parallelisation come from consumer groups, exactly-once processing comes from Kafka's transactions and idempotence, and state for stateful aggregations is stored in topics (as well as RocksDB), etc.
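
A rough illustration of the "state stored in topics" part, as a toy Python sketch rather than anything resembling the real Kafka Streams API. The names `count_by_key`, `restore` and `changelog` are invented for this example, and a plain list stands in for a topic. The idea it assumes: every state update is also appended to a changelog, so a restarted instance can rebuild its state table by replaying that changelog.

```python
def count_by_key(events, changelog):
    """Count events per key, appending every state update to a changelog."""
    state = {}
    for key in events:
        state[key] = state.get(key, 0) + 1
        # Record the new value for this key; on replay, the latest entry wins.
        changelog.append((key, state[key]))
    return state


def restore(changelog):
    """Rebuild the state table by replaying the changelog from the start."""
    state = {}
    for key, value in changelog:
        state[key] = value
    return state


changelog = []  # stand-in for a topic holding state updates
live_state = count_by_key(["a", "b", "a"], changelog)

# Simulate a restart: the in-memory state is gone, but the changelog survives.
recovered = restore(changelog)
```

In this sketch the local dict plays the role a local store like RocksDB would play, and the list plays the role of the topic that backs it.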

So, unlike Flink, Storm, Spark, Heron, etc., it's only useful with Kafka.

"Kafka Streams" is an add-on product to Kafka.
It appears Gearpump was retired in 2018
and Beam: beam.apache.org