by simonw 3863 days ago
The worst thing about Kafka in my experience has been the consumer libraries for languages like Python. That's not to say that they are terrible or unusable, just that they don't have nearly as much polish as the core of Kafka itself. I'm very much looking forward to new client libraries built against the new consumer API.

https://github.com/parsely/pykafka

PyKafka is currently used in production at Parse.ly, and I've gotten feedback from a lot of other folks who are using it in production as well. The big benefit over kafka-python is that PyKafka supports multi-consumer groups that balance consumption via ZooKeeper with its BalancedConsumer interface. See this thread ( https://github.com/Parsely/pykafka/issues/334 ) for more detail on the differences between the two libraries.

The PyKafka project is prioritizing support for Kafka 0.9 in the next few weeks/months. This includes ensuring that the existing consumers work against the updates to the 0.8.2 consumer API as well as implementing support for the new consumer API introduced in 0.9. Roadmap information can be found here ( https://github.com/Parsely/pykafka/blob/master/doc/roadmap.r... ).
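
For anyone wondering what "balanced consumption" across a consumer group amounts to, here is a minimal sketch of the idea: every partition of a topic is owned by exactly one consumer in the group, and the split is recomputed when membership changes. This is not PyKafka's code. The function name `assign_partitions` and the contiguous-range strategy are assumptions chosen for illustration; a real balanced consumer also has to coordinate membership between processes (via ZooKeeper, per the comment above), which this sketch leaves out.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Give each consumer a contiguous range of partitions (hypothetical helper).

    Every partition ends up with exactly one owner, and range sizes differ
    by at most one.
    """
    if not consumers:
        return {}
    # Sort both lists so every process computes the same answer independently.
    members = sorted(consumers)
    parts = sorted(partitions)
    base, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = base + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # Two consumers sharing five partitions, then a third consumer joins.
    print(assign_partitions(["c2", "c1"], [0, 1, 2, 3, 4]))
    print(assign_partitions(["c1", "c2", "c3"], [0, 1, 2, 3, 4]))
```

Because the result depends only on the sorted member and partition lists, each process can compute its own share without exchanging messages beyond learning who is in the group.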

I'd say the Python library I used was borderline unusable. We stopped using Kafka (it was just a trial period and hadn't been rolled out to production yet) because of limits in one of the most popular Python interfaces. The interface worked well enough and the API was good, but they didn't (and the bug tracker seemed to imply they wouldn't) support synchronizing reads across processes for the same group. What's the point of a distributed synchronized log if you can't do synchronized distributed reads of the log?
Sounds like old news, but if this is still an issue, PyKafka does allow balanced reads across a consumer group. https://github.com/parsely/pykafka
Yeah, it's no longer relevant for that project, but I like the ideas behind Kafka and will probably use it again, so I'll look at PyKafka before I look at kafka-python in the future.
Same problem for .NET/C#. Nothing established/built enough to feel comfortable using it in production.
While it feels a bit hacky and unclean, you may want to try using IKVM (http://www.ikvm.net/) to translate and import the Java client into your .NET project.

Given the difficulty of building a client, period (distributed systems, race conditions, etc.), being able to rely on the widely adopted and supported official client is quite attractive.

In my test cases the performance is on par with running natively on the JVM, except when compression is enabled.

Another option is using the REST proxy and accepting the trade-offs that imposes.

Same here with node.js.

All options are too painful: either use the buggy packages available or mix Java into the stack just for the Kafka bit. :(

Hence why I use Groovy for any Kafka endeavors.
The same problem exists with Python, C#, node.js, and Groovy.