Hacker News new | ask | show | jobs
by suremarc 1422 days ago
My company offers a data feed over WebSocket that could reach 70MB/s if we allowed you to subscribe to the entire feed at once (we don’t). No one ever planned for it to handle that much data, but here we are. Because WebSocket servers do not scale horizontally we had to consume the entire feed on one machine. (Edit: ofc you can scale the # of clients handled, but not the data throughput!)

Initially our Go app could only manage reading 10MB/s from Kafka, which is pretty sad. After switching from Shopify’s “Sarama” Go library to librdkafka (a Kafka client written in C), as well as some other arcane optimizations, we were able to 20x our throughput and handle up to 200MB/s, despite the overhead of interop between C and Go. On top of this, we used a library that wrapped librdkafka in Go bindings (confluent-kafka-go) and it turned out to have a much simpler interface than Sarama, in addition to it outperforming Sarama 20x.

While we didn’t end up rewriting the thing in Rust, we did end up leveraging a library written in C for major performance wins, in a case where scaling out was not possible. And let us also remember that even when scaling horizontally is possible, it is not necessarily even desirable in all circumstances: distributed systems, aside from introducing some amount of overhead, are basically Pandora’s box wrt. failure modes.

2 comments

Go is nice but it certainly has its limitations. I've seen heavily optimized jvm based things run circles around the kind of numbers you are talking about. 70 MB/s is not a number that should scare anyone running things like Kafka at scale.

I've worked with Elasticsearch clusters indexing close to a million documents per second from kafka. That takes a lot of CPU, memory, bare metal, etc. and it taxes the JVM garbage collection quite a bit. But it can be done. We basically maxed out CPUs across 30 or so nodes. Basically, when you get close to maxing out the disk IO, you know you are doing a decent job.

People reach for C for a lot of reasons. And not always the right reasons. But if it works, it works. A big reason seems to simply be that interfacing with it from various other languages is pretty straightforward. That's why python developers don't worry about performance. They can always plug in something native if it becomes an issue.

Again, scaling out to 30 nodes was not an option for us. It had to run on a single machine, unlike the rest of our data pipeline. While 200MB/s is nowhere near the theoretical limit of our hardware, it is near what Kafka’s stress testing tools achieved in the same setup, particularly with our record size (100 bytes).
I am quite sure those arcane C optimizations are also possible in Go, it is a matter to learn how to properly use value types and the unsafe package.
I meant that we did optimizations in Go, on top of using a Kafka client written in C. When language constructs like channels degrade performance, and when goroutine scheduling takes 20% of your cpu time, it begs the question of whether you’re using the right tool for the job. If you are using unsafe to thwart the garbage collector, this is probably even more true.
Same reasoning applies to C when using inline Assembler then.