Hacker News
by grammr 2982 days ago
Hi there! I'm one of the PipelineDB founders. This description is correct. The unique thing about PipelineDB is that it doesn't store granular data. Once all aggregates are incrementally updated, the raw input rows are discarded and only the aggregate output is stored.

This approach dramatically limits disk IO and long-term storage requirements, and enables super high performance in most cases on modest hardware.
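To make the "update the aggregate, then discard the row" idea concrete, here is a minimal sketch in plain Python. It is not PipelineDB's implementation or API. `ContinuousView` and `AggState` are hypothetical names, and the sketch assumes a grouped `count`/`sum`/`avg` whose state fits in memory. Each incoming row is folded into a small per-group state, and the row itself is never retained.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AggState:
    """Running aggregate state kept per group (hypothetical); raw rows are never stored."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        # Fold one input value into the running state.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class ContinuousView:
    """Hypothetical in-memory stand-in for a grouped count/sum/avg continuous aggregate."""

    def __init__(self) -> None:
        self.groups: dict[str, AggState] = defaultdict(AggState)

    def insert(self, key: str, value: float) -> None:
        # Update the aggregate; the raw (key, value) row is not kept anywhere.
        self.groups[key].update(value)


view = ContinuousView()
for key, value in [("a", 1.0), ("b", 10.0), ("a", 3.0)]:
    view.insert(key, value)

for key, state in sorted(view.groups.items()):
    print(key, state.count, state.total, state.mean)
# prints:
# a 2 4.0 2.0
# b 1 10.0 10.0
```

Storage here grows with the number of distinct groups, not with the number of rows ingested, which is the property the comment above attributes to the approach.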

PipelineDB has been used in production for nearly four years now and is used by Fortune 100 companies.


So once you ship it as an extension, is there any chance of mixing PipelineDB with Citus in one cluster?

My hunch is that it's possible, as long as the Citus coordinator does some additional computation for the final aggregate query.
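The "additional computation on the coordinator" usually means combining partial aggregate states from each worker before producing a final value. The sketch below illustrates that general two-phase pattern for `avg()` in plain Python. It is not Citus's or PipelineDB's actual code; `PartialAvg`, `worker_partial`, and `coordinator_avg` are hypothetical names, and the sketch assumes each shard can return a `(sum, count)` pair.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class PartialAvg:
    """Hypothetical partial state for avg(): mergeable because it keeps sum and count."""
    total: float
    count: int

    def combine(self, other: "PartialAvg") -> "PartialAvg":
        # Merging two partial states is just adding their components.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        return self.total / self.count if self.count else 0.0


def worker_partial(rows: list[float]) -> PartialAvg:
    # Each worker (shard) aggregates only its local rows.
    return PartialAvg(sum(rows), len(rows))


def coordinator_avg(partials: list[PartialAvg]) -> float:
    # The coordinator merges the partial states, then computes the final value.
    merged = reduce(PartialAvg.combine, partials, PartialAvg(0.0, 0))
    return merged.finalize()


shards = [[1.0, 2.0, 3.0], [10.0], []]
partials = [worker_partial(rows) for rows in shards]
print(coordinator_avg(partials))  # 16.0 / 4 = 4.0
```

The partial state has to be sum and count rather than each shard's average: averaging the per-shard averages (2.0 and 10.0) would give 6.0 instead of the correct 4.0.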

PPDB looks interesting, but we also need to keep the underlying raw data, and running multiple clusters requires a more complex pipeline.

We haven't looked too far into integrations with any existing systems at this point, but if there was significant user demand for it on both ends we'd definitely be open to it.

One thing I will mention here is that we do have plans to add support for persistent streams [0] after version 1.0.0 is released. We've learned a lot over the years about how our users/customers interact with streams in production and persistent streams will be built atop that foundation of understanding.

Please feel free to comment on that issue with your use case, requirements, etc. and we'll see what we can do!

[0] https://github.com/pipelinedb/pipelinedb/issues/1463

Persistent streams are interesting, but we have spent so many years refining our ETL and building it around Citus that it would be very complicated to separate the two. I will wait for the extension and do some testing.