by acconsta 3904 days ago
Yeah, but...

Need replication? Gotta write your own sharding logic or set up pg_shard.

Need aggregations? Gotta write your own logic. Will you do them on the fly? Use triggers? On demand?

Need to remove old data? Gotta set up a cron job. But wait, what if I want to age different series at different rates? Now you need a policy system. Sigh.

Not saying Postgres can't be the storage engine, but there's a lot of work to do on top of that.


I believe that what you're referring to as "a lot of work to do on top of that" is the correct solution, the ultimate OSS project.

The fallacy of the many time-series DBs out there is that they prematurely discarded the relational database as a viable storage option. They are now trapped solving the very hard problem of horizontally scalable distributed storage, which takes many years to solve, instead of focusing on the time-series aspect.

Sooner or later something along the lines of pg_shard will become standard in PostgreSQL and other databases, so you don't really need to write your own sharding logic; you just have to wait. Or you can write it if you want. You have options.

Aggregations are what GROUP BY is for. Removing old data is a non-issue if you're using a round-robin approach (see my blog link); it also makes aging different series at different rates easy.
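
To make the round-robin idea concrete, here is a rough sketch, not necessarily how the blog post does it. It assumes each series gets a fixed number of slots and that a point lands in slot (ts // step) % slots, so new writes overwrite the oldest data and no cleanup job is needed. The table, column, and function names are made up for illustration, and it uses SQLite from Python's standard library only so that it runs standalone.

```python
import sqlite3

def make_store():
    # One row per (series, slot); the primary key is what lets a new
    # write replace the old point occupying the same slot.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE points ("
        " series TEXT NOT NULL,"
        " slot INTEGER NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (series, slot))"
    )
    return conn

def write_point(conn, series, ts, value, step, slots):
    # Hypothetical slot rule: the position in the ring is derived from time.
    slot = (ts // step) % slots
    conn.execute(
        "INSERT OR REPLACE INTO points (series, slot, ts, value) "
        "VALUES (?, ?, ?, ?)",
        (series, slot, ts, value),
    )

def read_series(conn, series):
    return conn.execute(
        "SELECT ts, value FROM points WHERE series = ? ORDER BY ts",
        (series,),
    ).fetchall()

conn = make_store()
for ts in range(0, 600, 60):  # ten points, one per minute
    write_point(conn, "cpu", ts, float(ts), step=60, slots=3)
    write_point(conn, "disk", ts, float(ts), step=60, slots=5)
```

Here "cpu" keeps only its last three points and "disk" its last five, which is all that "aging at different rates" amounts to in this sketch: a different slot count (or step) per series.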

It's good to have competition in this area. Influx is giving me whiplash (it's on its third database engine in six months!)

The circular buffer approach is fine, but it does have the drawback of being unable to represent variable-length data (like key-value pairs). It's also harder to compress the data.

Can a GROUP BY do windowed aggregations? Like, take an average over ten-minute windows? My SQL knowledge is not great.
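
For what it's worth, a plain GROUP BY can handle fixed windows if you group on a bucket computed from the timestamp. A minimal sketch, assuming timestamps are stored as integer epoch seconds; the samples table and its columns are invented for the example, and it uses SQLite via Python so it runs as-is. In Postgres with a real timestamp column you would need a different bucket expression, probably something like floor(extract(epoch from ts) / 600).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts INTEGER NOT NULL, value REAL NOT NULL)")

# Made-up sample data: (epoch seconds, value).
rows = [(0, 1.0), (300, 3.0), (599, 5.0), (600, 10.0), (900, 20.0), (1200, 7.0)]
conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)

WINDOW = 600  # ten minutes, in seconds

# Integer division truncates each timestamp to the start of its window,
# so every row in the same ten-minute span ends up in the same group.
result = conn.execute(
    "SELECT (ts / ?) * ? AS window_start, AVG(value) AS avg_value "
    "FROM samples GROUP BY window_start ORDER BY window_start",
    (WINDOW, WINDOW),
).fetchall()

for window_start, avg_value in result:
    print(window_start, avg_value)
```

These are tumbling (non-overlapping) windows. A sliding average over the previous ten minutes at every point is a different thing and would call for window functions rather than GROUP BY.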