by koolba 3432 days ago
> It's a good thing they didn't use Confluent Camus. -shudder- It supports Avro-over-Kafka out of the box, with the caveat that every single time it reads a message off Kafka, it pings the schema registry to get the schema for it. That's great and all, until you've got thousands of messages per second.

Why would they need to hit the registry for every message? Wouldn't the schemas be immutable and thus able to be (at least temporarily) cached? They might have millions of messages but it's doubtful they have millions of message schemas.


The schemas are not immutable. You also don't hit the schema registry for every message; in fact, you can skip the registry altogether and provide the schema manually if you like.
You could provide them manually, but then any schema upgrade becomes a big pain. Wikimedia, as one example, uses versioned schemas, so each version is immutable and can be pulled from the cache. Each Kafka message is prefixed with a null byte followed by a long version number that indicates how it should be decoded.

https://github.com/wikimedia/analytics-refinery-source/blob/...
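
For anyone wondering what the prefix-plus-cache approach looks like on the consumer side, here is a rough sketch. It is not Wikimedia's or Camus's actual code. The header layout (one zero byte, then an 8-byte big-endian signed long) is my assumption about what "a null byte, and then a long version number" means on the wire, and `split_message`, `SchemaCache`, and the `fetch` callback are hypothetical names made up for illustration.

```python
import struct
from typing import Any, Callable, Dict, Tuple

# Assumed header layout: 1 magic byte (0x00) + 8-byte big-endian signed long.
HEADER = struct.Struct(">bq")


def split_message(raw: bytes) -> Tuple[int, bytes]:
    """Return (schema_version, payload) from a version-prefixed message."""
    if len(raw) < HEADER.size:
        raise ValueError("message is shorter than the header")
    magic, version = HEADER.unpack_from(raw)
    if magic != 0:
        raise ValueError(f"unexpected magic byte: {magic}")
    return version, raw[HEADER.size:]


class SchemaCache:
    """Looks up each schema version once, then serves it from memory."""

    def __init__(self, fetch: Callable[[int], Any]) -> None:
        self._fetch = fetch  # hypothetical registry lookup, called on a miss
        self._schemas: Dict[int, Any] = {}
        self.fetch_count = 0

    def get(self, version: int) -> Any:
        if version not in self._schemas:
            # A given version is immutable, so caching it indefinitely is safe.
            self._schemas[version] = self._fetch(version)
            self.fetch_count += 1
        return self._schemas[version]
```

With this shape the registry is hit once per schema version rather than once per message, which is the point made above: millions of messages, but only a handful of schema versions.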