by jmort 2794 days ago
Can you say a bit more about "performant" or point me to some information? I haven't found any yet. I'm processing millions of protobufs per second and would love to get away from batch jobs to do some incredibly basic counting -- this seems like a fit conceptually... If it's a fit, any recommendations on the best way to get those protobufs off a Kafka stream and into PipelineDB would be great, too!

Performance depends heavily on the complexity of your continuous queries, which is why we don't really publish benchmarks. PipelineDB is different from more traditional systems in that not all writes are created equal, given that continuous queries are applied to them as they're received. This makes generic benchmarking less useful, so we always encourage users to roughly benchmark their own workloads to really understand performance.

That being said, millions of events per second should absolutely be doable, especially if your continuous queries are relatively straightforward as you've suggested. If the output of your continuous queries fits in memory, then it's extremely likely you'd be able to achieve the throughput you need relatively easily.

Many of our users use our Kafka connector [0] to consume messages into PipelineDB, although given that you're using protobufs I'm guessing your messages require a bit more processing/unpacking to get them into a format that can be written to PipelineDB (basically something you can INSERT or COPY into a stream). In that case what most users do is write a consumer that simply transforms messages into INSERT or COPY statements. These writes can be parallelized heavily and are primarily limited by CPU capacity.

Please feel free to reach out to me (I'm Derek) if you'd like to discuss your workload and use case further, or set up a proof-of-concept--we're always happy to help!

[0] https://github.com/pipelinedb/pipeline_kafka