by AndrewKemendo 2053 days ago
I am bullish on better time-series systems, but why would I use InfluxDB IOx over Kafka?

I can allocate whatever memory I have available to each topic/partition to run a query, or use KSQL to transform topics.

edit: I should say that I can see a few benefits from columnar memory and lower-level memory management with Rust, but IMO it would be a huge infrastructure/tooling shift for a regular CRUD API shop, or for one that has already invested in an eventing system.


We are talking 10x+ less storage and fewer servers with columnar storage.
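
A toy illustration of why a columnar layout tends to compress better for time series: this sketch assumes regularly spaced timestamps and a repeating value pattern, delta-encodes the timestamp column, and compares zlib-compressed sizes of a row layout and a column layout. It is not IOx's actual encoding, and the ratio you get depends entirely on the data; the sample data and function names are hypothetical.

```python
import struct
import zlib


def row_layout(samples):
    # One 16-byte record per sample: int64 timestamp followed by float64 value.
    return b"".join(struct.pack("<qd", ts, v) for ts, v in samples)


def column_layout(samples):
    timestamps = [ts for ts, _ in samples]
    values = [v for _, v in samples]
    # Delta-encode timestamps: a regular scrape interval becomes a run of identical numbers.
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    ts_col = struct.pack(f"<{len(deltas)}q", *deltas)
    val_col = struct.pack(f"<{len(values)}d", *values)
    return ts_col + val_col


# Hypothetical data: one sample every 10 seconds with a slowly repeating value.
samples = [(1_600_000_000 + 10 * i, 20.0 + (i % 7) * 0.5) for i in range(10_000)]
row_size = len(zlib.compress(row_layout(samples)))
col_size = len(zlib.compress(column_layout(samples)))
print("row:", row_size, "bytes; columnar:", col_size, "bytes")
```

Both layouts hold the same 160,000 raw bytes; the difference comes from grouping similar values together so the compressor sees long repeats.
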

Time series databases and Kafka are intended for very different workloads.

Kafka is used as a distributed message queue: one set of apps produces opaque messages and sends them to Kafka queues, while another set of apps consumes these messages from those queues.
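
The produce/consume pattern can be sketched as a toy in-memory model, assuming a single partition modeled as an append-only list of opaque byte messages, with each consumer tracking its own offset. `ToyPartition` and `ToyConsumer` are hypothetical names; this is not Kafka's client API, and real Kafka adds partitioning, replication, retention and consumer groups.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Hypothetical stand-in for one topic partition: an append-only log of opaque messages."""
    log: list[bytes] = field(default_factory=list)

    def produce(self, message: bytes) -> int:
        # The log never inspects message contents; it only appends and assigns an offset.
        self.log.append(message)
        return len(self.log) - 1


@dataclass
class ToyConsumer:
    """Each consumer tracks its own read offset, so many consumers can read the same log."""
    partition: ToyPartition
    offset: int = 0

    def poll(self, max_messages: int = 10) -> list[bytes]:
        batch = self.partition.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


partition = ToyPartition()
for msg in (b"order-1", b"order-2", b"order-3"):
    partition.produce(msg)

billing = ToyConsumer(partition)
shipping = ToyConsumer(partition)
print(billing.poll(max_messages=2))
print(shipping.poll())
```

Note that the log has no notion of series keys or labels: answering "all values for host X between T1 and T2" means scanning messages and decoding them in the consumer.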

Time series databases are used for storing and querying time series data. There are no Kafka-like queues in time series databases. Every time series has a key, which uniquely identifies that time series, and a set of (timestamp, value) tuples, usually ordered by timestamp. A time series key is usually composed of a name plus a set of ("key"=>"value") labels. These labels are used for filtering and grouping time series data during queries. There are various time series databases optimized for various workloads. For example, VictoriaMetrics [1] is optimized for monitoring and observability of IT infrastructure, app-level metrics, industrial telemetry and other IoT use cases.
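
The data model described above can be sketched in a few lines, assuming a key made of a metric name plus label pairs sorted by label name, samples kept sorted by timestamp, equality-only label filters, and a plain sum for grouping. `ToyTSDB`, `make_key`, `select` and `sum_by` are hypothetical; this is not VictoriaMetrics' storage engine or query language.

```python
from __future__ import annotations

from bisect import insort
from collections import defaultdict

# Hypothetical series key: metric name plus label pairs sorted by label name,
# so the same label set always maps to the same series regardless of order.
SeriesKey = tuple[str, tuple[tuple[str, str], ...]]


def make_key(name: str, labels: dict[str, str]) -> SeriesKey:
    return (name, tuple(sorted(labels.items())))


class ToyTSDB:
    """Toy in-memory model of the time series data model, not a real database."""

    def __init__(self) -> None:
        # series key -> list of (timestamp, value) samples, kept sorted by timestamp
        self.series: defaultdict[SeriesKey, list[tuple[int, float]]] = defaultdict(list)

    def add(self, name: str, labels: dict[str, str], ts: int, value: float) -> None:
        insort(self.series[make_key(name, labels)], (ts, value))

    def select(self, name: str, **label_filters: str) -> dict[SeriesKey, list[tuple[int, float]]]:
        # Filtering: keep series with this name whose labels match every filter.
        result = {}
        for key, samples in self.series.items():
            series_name, labels = key
            label_map = dict(labels)
            if series_name == name and all(label_map.get(k) == v for k, v in label_filters.items()):
                result[key] = samples
        return result

    def sum_by(self, name: str, group_label: str, start: int, end: int) -> dict[str, float]:
        # Grouping: sum sample values in [start, end], grouped by one label's value.
        totals: defaultdict[str, float] = defaultdict(float)
        for (series_name, labels), samples in self.series.items():
            if series_name != name:
                continue
            group = dict(labels).get(group_label, "")
            totals[group] += sum(v for ts, v in samples if start <= ts <= end)
        return dict(totals)


db = ToyTSDB()
db.add("http_requests_total", {"host": "a", "status": "200"}, 1000, 5.0)
db.add("http_requests_total", {"status": "500", "host": "a"}, 1000, 1.0)
db.add("http_requests_total", {"host": "b", "status": "200"}, 1010, 7.0)
db.add("http_requests_total", {"host": "a", "status": "200"}, 990, 2.0)  # arrives out of order
print(db.sum_by("http_requests_total", "status", 0, 2000))
```

Real systems index labels (e.g. an inverted index from label pair to series) instead of scanning every series, but the key structure is the same idea.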

As you can see, there is little sense in comparing Kafka with time series databases :)

[1] https://victoriametrics.com/

Last time I checked out VictoriaMetrics it had something akin to a memory leak when writing evenly to large numbers of distinct keys -- it attempted to cache all the data in RAM and did not free it even under high memory pressure (it's been a while, but iirc it had a hard-coded 1hr expiry). Is that behavior likely to still exist in the current design?

You seem to be assuming that I'm not going to (or shouldn't) use Kafka as a database for my time series data, when in fact that is how we use it.

I store and query messages in certain Kafka Topics - it's not all our data, but it is a non-trivial amount of it.