Apache Kafka 0.9 is released | HN Mirror


	Apache Kafka 0.9 is released (confluent.io)
	163 points by nehanarkhede 3863 days ago

11 comments

felipesabino 3863 days ago

I've been using kafka 0.8.2 for some time now together with Node.js for both consumer and producer.

The producer side is quite simple to use and has more than one option available, but on the consumer side there is only one project that is "maintained" and works [1][2]; all other options either offer only a producer [3] or have not received a commit in years [4].

I am a bit disappointed by how little attention Kafka on Node.js has received so far: there are a lot of issues with keeping connections alive and with rebalancing, which made it really hard to trust the system and automate zero-downtime deploys.

Although I still hope the changes in the new 0.9 consumer API solve these issues, I am really happy about the decision to stay backwards compatible, which makes the transition/upgrade a much smoother process:

> To ensure a smooth upgrade paths for our users, the 0.8 producer and consumer clients will continue to work on an 0.9 Kafka cluster.

[1] https://cwiki.apache.org/confluence/display/KAFKA/Clients#Cl...

[2] https://github.com/SOHU-Co/kafka-node/

[3] https://github.com/sutoiku/node-kafka

[4] https://github.com/wurstmeister/node-kafka-0.8-plus

nehanarkhede 3863 days ago

Your critique is well received. The Apache Kafka project supports the Java clients, and the non-Java clients will be developed and made available in a federated manner. At Confluent, we are focused on providing first-class non-Java clients that are API- and functionality-compatible with the Java clients. Forthcoming releases of the Confluent Platform will include C/C++, Python and Node.js clients. Stay tuned: http://www.confluent.io/developer#download

SEJeff 3863 days ago

For those that don't realize it, Neha and Jay were two of the main developers who wrote Kafka.

Thanks for the heads up!

pimeys 3863 days ago

We've been using Kafka for almost two years already. A lot of our codebase is still Ruby, and Poseidon is not really that great a client library. It's slow and it's not thread-safe.

On our Scala side we're happy with the current offering.

ah- 3863 days ago

The C/C++ client, will that be librdkafka? If I need a high-quality client for another language, would you recommend building that using librdkafka or the rest proxy?

lobster_johnson 3862 days ago

Is anyone working on Go bindings for 0.9?

jfim 3863 days ago

Congrats to the Kafka team!

The biggest changes I can see in this release are SSL support, new consumer API (beta), quotas and Kafka Connect.

nehanarkhede 3863 days ago

Thanks!

simonw 3863 days ago

The worst thing about Kafka in my experience has been the consumer libraries for languages like Python. That's not to say that they are terrible or unusable, just that they don't have nearly as much polish as the core of Kafka itself. I'm very much looking forward to new client libraries built against the new consumer API.

emmett9001 3863 days ago

https://github.com/parsely/pykafka

PyKafka is currently used in production at Parse.ly, and I've gotten feedback from a lot of other folks who are using it in production as well. The big benefit over kafka-python is that PyKafka supports multi-consumer groups that balance consumption via ZooKeeper with its BalancedConsumer interface. See this thread ( https://github.com/Parsely/pykafka/issues/334 ) for more detail on the differences between the two libraries.

The PyKafka project is prioritizing support for Kafka 0.9 in the next few weeks/months. This includes ensuring that the existing consumers work against the updates to the 0.8.2 consumer API as well as implementing support for the new consumer API introduced in 0.9. Roadmap information can be found here ( https://github.com/Parsely/pykafka/blob/master/doc/roadmap.r... ).

czinck 3863 days ago

I'd say the Python library I used was borderline unusable; we stopped using Kafka (it was just a trial period, not rolled out to production yet) because of limits in one of the most popular Python interfaces. The interface worked well enough and the API was good, but they didn't (and the bug tracker seemed to imply they wouldn't) support synchronizing reads across processes for the same group. What's the point of a distributed synchronized log if you can't do synchronized distributed reads of the log?

emmett9001 3863 days ago

Sounds like old news, but if this is still an issue, PyKafka does allow balanced reads across a consumer group. https://github.com/parsely/pykafka
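For readers wondering what "balanced reads across a consumer group" amounts to, here is a minimal in-memory sketch. It is not PyKafka's BalancedConsumer or Kafka's actual assignment code; the function name `assign_partitions` and the worker names are made up for illustration, and the strategy shown is a simple contiguous-range split. The point is only that each partition is owned by exactly one member of the group, and that ownership is recomputed when members join or leave.

```python
from typing import Dict, List


def assign_partitions(members: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Split partitions into contiguous ranges, one range per group member.

    Hypothetical helper for illustration; not taken from any Kafka client.
    """
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)  # every member must compute the same ordering
    base, extra = divmod(len(partitions), len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, member in enumerate(ordered):
        # The first `extra` members take one additional partition each.
        count = base + (1 if index < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


# Two workers share six partitions; a third worker joining triggers a rebalance.
before = assign_partitions(["worker-a", "worker-b"], list(range(6)))
after = assign_partitions(["worker-a", "worker-b", "worker-c"], list(range(6)))
print(before)
print(after)
```

The hard part in a real client is not this arithmetic but getting all processes to agree on the membership list, which is what the ZooKeeper coordination mentioned upthread is for.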

czinck 3862 days ago

Yeah, it's no longer relevant for that project, but I like the ideas behind Kafka and will probably use it again so I'll look at PyKafka before I look at kafka-python in the future.

vdnkh 3863 days ago

Same problem for .NET/C#. Nothing established/built enough to feel comfortable using it in production.

kppullin 3862 days ago

While it feels a bit hacky and unclean, you may want to try using IKVM (http://www.ikvm.net/) to translate and import the Java client into your .NET project.

Given the difficulty of building a client, period (distributed systems, race conditions, etc.), being able to rely on the widely adopted & supported official client is quite attractive.

In my test cases the performance is on par with running natively on the JVM, except when compression is enabled.

Another option is using the REST proxy and accepting the trade-offs that imposes.

felipesabino 3863 days ago

Same here with node.js.

All options are too painful, either use the buggy packages available OR mix the stack with java just for the kafka bit. :(

joshbaptiste 3863 days ago

Hence why I use Groovy for any Kafka endeavors.

vorg 3862 days ago

The same problem exists with Python, C#, node.js, and Groovy.

hoffcoder 3863 days ago

I have been using Kafka 0.8.2 in a production setting for six months, consuming real-time event traffic from our caching layer. The most difficult parts of my experience were the occasional consumer lags that erupted without warning/cause in the high-level Java consumer APIs. A lot of experimentation with their configuration proved futile, and we have now had to create a feedback system that triggers alerts to change the group IDs of our high-level consumers every time some consumers start lagging.

Otherwise the performance of Kafka has been impressive (giving a throughput of up to 15,000 packets/sec to an 8-consumer pool), even though I have not had the chance to compare it with any other such tool/library.

Nevertheless, I think this update is a long-awaited one, and Kafka Connect may really be a good starting point for building more (and better) endpoints.
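The consumer lag mentioned above can be made concrete with a small sketch. This is not the commenter's feedback system, whose details are not given; it only assumes the usual definition of lag as the gap between the last offset written to a partition and the last offset the group has committed. The function names, the threshold and the offset numbers are all invented for illustration.

```python
from typing import Dict, List


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: records written but not yet committed by the group."""
    lag: Dict[int, int] = {}
    for partition, end_offset in log_end_offsets.items():
        # A partition with no commit yet is treated as starting from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(p for p, behind in lag.items() if behind > threshold)


# Made-up offsets: partition 2 has fallen far behind the other two.
lag = consumer_lag(
    log_end_offsets={0: 1500, 1: 1480, 2: 9000},
    committed_offsets={0: 1495, 1: 1480, 2: 2000},
)
print(lag)
print(lagging_partitions(lag, threshold=1000))
```

An alerting loop would poll both sets of offsets periodically and act on whatever `lagging_partitions` returns.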

ora600 3863 days ago

In this case, you will enjoy the new consumer in 0.9 a lot!

pcsanwald 3862 days ago

Were the diagrams done with software, or hand drawn? If software, I'm curious what package/style you used. The style looks very similar to Martin Kleppmann's presentation at StrangeLoop; I assumed his were hand drawn, but I'm realizing now this might be an omni style or something.

optimusclimb 3862 days ago

This always comes up with Martin Kleppmann diagrams. See this discussion:

https://news.ycombinator.com/item?id=9613118

The bottom comments seem to agree it was done using Paper.

twic 3861 days ago

From the Kleppmann's mouth:

https://twitter.com/martinkl/status/629169643710775296

I saw him talk at a conference, and that was one of the questions someone asked. He must be so fed up with it by now!

mixmastamyk 3863 days ago

What is the use case for this product? Could you use it as a replicating database across sites?

btown 3863 days ago

Kafka is a replicating database, but not of randomly accessible or queryable data - rather, it's for logs where you can start at any point in time (including realtime), and play the log forwards from there. If you aggregate access logs into it, then use it to feed stream processing frameworks (or feed it into Hadoop for bulk processing), you can use it very effectively for analytics workloads.

Or you can run your entire business around events-as-ground-truth rather than SQL-style-domain-tables-as-ground-truth. Various modules consume from Kafka and write aggregated records back into Kafka - basically a directed graph of processing modules with Kafka to implement every edge - and finally there's a module that translates the final product into what is essentially a live-updating materialized view for your web backend to consume. LinkedIn does exactly this - they open-sourced Kafka and spun out Confluent to help others use this model.

Stream processing is a very powerful way of thinking about data management - with the great side effect that "migrations" of data tables are limited by your imagination, and they never run the risk of data corruption. We use this paradigm, though not Kafka itself (Mongo supports this paradigm and simplifies a lot of things if you aren't yet at LinkedIn scale), in production at http://belstone.com.

Martin Kleppmann's talks are a great place to start if you want to learn more: http://www.confluent.io/blog/making-sense-of-stream-processi... is an excellent overview.

hoffcoder 3862 days ago

Kafka can be used if you want to treat data as streams for some processing (think producer-consumer scenarios). You can start reading the stream from any point in time, 'as and when things happened'. Kafka's own nodes have replication enabled, and the data in it can be consumed in a distributed setting as well (meaning multiple consumers acting as a single high-level consumer). But it is not a traditional database like MongoDB or MySQL.
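The "start at any point and play the log forwards" idea from the two comments above can be sketched as a toy in-memory structure. This is an illustration of the concept only, not Kafka's storage format or any client API; the class name `AppendOnlyLog` and its methods are hypothetical, and it ignores partitions, replication and retention entirely.

```python
from typing import Iterator, List, Tuple


class AppendOnlyLog:
    """Toy log: records are only ever appended, and each one gets a fixed offset."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> Iterator[Tuple[int, str]]:
        """Yield (offset, record) pairs starting at `offset`, oldest first."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        for position in range(offset, len(self._records)):
            yield position, self._records[position]


log = AppendOnlyLog()
for line in ["GET /", "GET /about", "POST /login"]:
    log.append(line)

# Reading does not remove anything, so any reader can replay from any offset.
replayed = list(log.read_from(1))
print(replayed)
```

Because reads never mutate the log, two readers at different offsets do not interfere with each other, which is what separates this model from a queue that deletes messages on delivery.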

mikeatlas 3863 days ago

"What is the use case for a horizontally scalable message broker?"

kasey_junk 3862 days ago

That promises durability, at least once delivery and sequential consistency (an important set of promises that put it largely in a class by itself).
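To unpack "at least once delivery": a common way to get it is to commit a record's offset only after the record has been handled, so a crash in between causes a replay rather than a loss. The sketch below simulates that in memory; it is not a real Kafka consumer, and `consume_at_least_once` and its `crash_before_commit_at` parameter are invented purely to demonstrate the duplicate.

```python
from typing import Callable, List


def consume_at_least_once(records: List[str],
                          committed: int,
                          handle: Callable[[str], None],
                          crash_before_commit_at: int = -1) -> int:
    """Handle records starting at `committed`; commit each offset only after handling.

    Returns the committed position, i.e. the next offset a restarted consumer reads.
    `crash_before_commit_at` is a hypothetical knob that simulates a crash.
    """
    for offset in range(committed, len(records)):
        handle(records[offset])
        if offset == crash_before_commit_at:
            # Simulated crash: the record was handled but its offset was never committed.
            return committed
        committed = offset + 1
    return committed


records = ["a", "b", "c"]
seen: List[str] = []

# First run crashes after handling "b" but before committing it.
committed = consume_at_least_once(records, 0, seen.append, crash_before_commit_at=1)
# The restarted consumer resumes from the last committed offset and sees "b" again.
committed = consume_at_least_once(records, committed, seen.append)
print(seen, committed)
```

Nothing is lost, but "b" is handled twice, which is why consumers built this way need idempotent handlers or their own deduplication.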

timc3 3862 days ago

Has anyone come across a similar message queue that implements offsets in a comparable way?

erichmond 3863 days ago

I'm always fascinated by the lack of discussion around distributed systems tooling on HN. Anyway! Congrats!

mathnode 3863 days ago

The only other beast of a similar nature that I can think of, and that appears here occasionally, is Onyx, which seems pretty cool.

Anyway, Kafka Connect will provide what most people are probably looking for in Samza.

nehanarkhede 3863 days ago

That observation is correct. Currently, people misuse stream processing systems like Storm and Samza for data import/export. This is overkill. Kafka Connect is focused on providing scalable and operational connectors to various systems using Kafka as the underlying transport mechanism.

zodvik 3863 days ago

<plug>

We have built a fairly robust system along the lines of Kafka Connect, and it's open sourced at https://github.com/flipkart/aesop. Currently, it supports MySQL & HBase as sources and MySQL, HBase, ES and Kafka as destinations.

erichmond 3862 days ago

Haha, funny you mention Onyx! We pipe data from kafka into storm :) Great combo!

jedisct1 3862 days ago

Shameless plug: if you need to send application logs to Kafka, consider Flowgger: https://github.com/jedisct1/flowgger

delive 3856 days ago

Congrats on the release!

@nehanarkhede was there a specific reason the new consumer is written in Java? The previous consumers are all written in Scala.

AYBABTME 3862 days ago

Support for multi-tenancy is pretty awesome; it will make it much easier to support Kafka as a shared service within an organization.