| Disclaimer: I work for Confluent (www.confluent.io). That said, let me clarify that Kafka Streams is a component of the Apache Kafka open source project. There's nothing proprietary about it. The reason the links below point to docs.confluent.io is because the Apache Kafka docs for Kafka Streams are not ready yet; but they will when Kafka 0.10 is released, which will be the first official Kafka release that includes Kafka Streams. > 1. How does it scale horizontally? See http://docs.confluent.io/2.1.0-alpha1/streams/architecture.h.... > Do I have to deploy this on library on multiple nodes? Kafka Streams really is a normal Java library. You include it in "your" Java application just like you'd include a library such as Apache Commons, Google Guava, or JUnit. There's no "deploying" of this library -- i.e. no need to pre-ship it to machines; also, you don't install a cluster or anything like that for Kafka Streams. It's a purely client-side library. Interestingly, many folks I have talked to seem to be confused initially about the "library" aspect because I suppose, in the big data world, one has simply become used to (and pigeonholed by?) frameworks like Spark, Storm, etc. that you must deploy and operate separately (and which then dictate how they want you to write your own apps against them). Does that make any sense? > 2. Is RocksDB instance local to your consumer/ thread/ node? Yes, it is local. But local state stores (RocksDB is but one choice for implementing these) are also replicated for fault tolerance. Details at http://docs.confluent.io/2.1.0-alpha1/streams/architecture.h... and http://docs.confluent.io/2.1.0-alpha1/streams/architecture.h.... Regarding joins: We have documented some information at http://docs.confluent.io/2.1.0-alpha1/streams/developer-guid.... I hope this helps! If these pointers above aren't sufficient, please drop us a note to dev@kafka.apache.org (http://kafka.apache.org/contact.html). We really appreciate any such feedback (code, docs, whatever) to ensure people have an easy and fun time to get started with Kafka as well as its upcoming Kafka Streams library. |