by WWLink
3432 days ago
No. I think the ideal use case here is to use JSON over Kafka and store the data in Avro files. The Avro files have the schema at the start. I wonder what they're using to retrieve that data for analysis later on. I do something very similar to this, but having to sift through millions of messages for a given time period to find a subset of them is kinda annoying. It's a good thing they didn't use Confluent Camus. -shudder- It supports Avro-over-Kafka out of the box, with the caveat that every single time it reads a message off Kafka, it pings the schema registry to get the schema for it. That's great and all, until you've got thousands of messages per second.
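For what it's worth, the usual way around a per-message registry lookup is to cache schemas on the consumer side. Here's a minimal sketch, assuming each message carries some kind of schema ID and that a schema never changes once it has an ID. `CachingSchemaLookup` and `fake_registry_fetch` are made-up names for illustration, not Camus or schema registry APIs.

```python
from typing import Callable, Dict


class CachingSchemaLookup:
    """Memoizes schema lookups by ID so the registry is hit once per schema.

    `fetch` stands in for whatever call actually talks to the registry
    (hypothetical here; swap in a real client call).
    """

    def __init__(self, fetch: Callable[[int], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[int, str] = {}
        self.remote_calls = 0  # counts how often we actually hit the registry

    def get(self, schema_id: int) -> str:
        # Only go to the registry the first time we see a given schema ID.
        if schema_id not in self._cache:
            self.remote_calls += 1
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]


def fake_registry_fetch(schema_id: int) -> str:
    # Stand-in for a network round trip; returns a dummy Avro schema string.
    return '{"type": "record", "name": "Event%d", "fields": []}' % schema_id


lookup = CachingSchemaLookup(fake_registry_fetch)

# Simulate 5000 messages that only ever use two distinct schemas.
for schema_id in [1, 1, 2, 1, 2] * 1000:
    lookup.get(schema_id)

print(lookup.remote_calls)
```

With a cache like this the number of registry calls scales with the number of distinct schemas rather than the number of messages.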
It looks like Hive or Spark, depending on the use case. The data is also loaded into Druid for looking at statistics, as opposed to getting full data about individual messages.
> It's a good thing they didn't use Confluent Camus. -shudder- It supports Avro-over-Kafka out of the box, with the caveat that every single time it reads a message off Kafka, it pings the schema registry to get the schema for it. That's great and all, until you've got thousands of messages per second.
They are using Camus; much of the post is dedicated to it. It looks like they are also running Avro over Kafka + Camus for some application logging, but at a lower volume (~10k messages/sec peak).