by ako 3175 days ago
How do you see this compared to Prestodb, which also provides SQL over Kafka topics? Being able to join Kafka data with other tables seems like a useful benefit of Prestodb.

It would reduce data latency, but we have not tried it in production.

Given that Kafka neither provides secondary indexes nor organizes data in a columnar format, Presto essentially has to scan through the Kafka topics to execute queries, resulting in a lot of disk I/O.
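
To make the scanning point concrete, here is a minimal Python sketch. It models a topic as a plain in-memory list, which is an assumption for illustration only: it does not use Kafka or Presto, and `scan_topic`, `topic`, and the record fields are hypothetical names. With no index on the filtered field, even a selective predicate has to read every record.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def scan_topic(
    log: Iterable[Dict], predicate: Callable[[Dict], bool]
) -> Tuple[List[Dict], int]:
    """Filter an append-only log with no index: every record must be read."""
    matches: List[Dict] = []
    records_read = 0
    for record in log:
        records_read += 1  # each record costs a read, match or not
        if predicate(record):
            matches.append(record)
    return matches, records_read


# Hypothetical topic contents: 100,000 records, 1,000 distinct user ids.
topic = [{"offset": i, "user_id": i % 1000, "amount": i * 0.5} for i in range(100_000)]

# Roughly "SELECT * FROM topic WHERE user_id = 42".
matches, records_read = scan_topic(topic, lambda r: r["user_id"] == 42)
print(len(matches), "matches after reading", records_read, "records")
```

In this toy setup only 0.1% of the records match, yet all of them are read. On a real cluster that read volume becomes disk and network I/O on the brokers.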

Our Kafka infrastructure handles more than one trillion messages per day and guarantees second-level latency SLAs. Reading aggressively could easily saturate the I/O bandwidth of the nodes and lead to outages. We actually had several incidents in the past when we did backfills. So I'm more conservative on this.

Have you used Presto for running SQL queries on Kafka topics? I would be interested to see some experience reports on using this in production.

I have used Presto through the Hive connector and the results were pretty nice.