by lmm 819 days ago
Postgres is still single-node-first, and while Citus exists I'm skeptical that it can ever become as easy to administer as a true HA-first datastore. For me the reason to use something like Cassandra or Kafka was never "big data" per se, it was having true master-master fault tolerance out of the box in a way that worked with everything.

If you are going to go multi-node with Postgres, you need to start planning for Cassandra/Dynamo.

That is a BIG lift. Joins don't practically scale at Cassandra/Dynamo scale, because basically every row in the result set is subject to CAP uncertainty. "Big Data SQL" engines like Hive/Impala/Snowflake/Presto etc. are more like approximations at true scale.

A relational DBMS is sort of storage-focused in its design and evolution: you figure out the tables you need to store the data in a sensible way. Then you add views and indexes to optimize for view/retrieval.

Dynamo/Cassandra is different: you start from the views/retrieval. That's why it is bad to start with these models for an application, because you have not fully explored all your specific data structuring and access patterns/loads yet.

By the time Postgres hits the single-node limits, you should know what your highest-volume reads/writes are and how to structure a Cassandra/Dynamo table to specifically handle those reads/writes.
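To make the storage-first vs. retrieval-first contrast concrete, here is a minimal sketch in plain Python (no real database involved). The `users`/`orders` data and all function names are made up for illustration; the dict keyed by `user_id` only stands in for a Cassandra/Dynamo-style table partitioned by the key you read on.

```python
from collections import defaultdict

# Storage-first (relational style): normalized tables, joined at read time.
users = {1: {"name": "ada"}, 2: {"name": "bob"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 30},
    {"order_id": 11, "user_id": 2, "total": 15},
    {"order_id": 12, "user_id": 1, "total": 5},
]

def orders_for_user_join(user_id):
    """Answer the query at read time by scanning orders and joining users."""
    return [
        {"name": users[o["user_id"]]["name"], **o}
        for o in orders
        if o["user_id"] == user_id
    ]

# Retrieval-first (Cassandra/Dynamo style): one table per known access
# pattern, keyed by the partition key the read will use.
orders_by_user = defaultdict(list)

def write_order(order):
    """Denormalize at write time so the read is a single-key lookup."""
    row = {"name": users[order["user_id"]]["name"], **order}
    orders_by_user[order["user_id"]].append(row)

for o in orders:
    write_order(o)

def orders_for_user_lookup(user_id):
    """No join and no scan: fetch one partition by its key."""
    return orders_by_user.get(user_id, [])
```

The second layout only works because the access pattern ("all orders for a user") was known before the table was designed, which is the point about waiting until Postgres has shown you your real read/write patterns.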

> Postgres… Kafka… Cassandra

These are all wildly different products that should not be considered for the same purposes.

But that's the gist of the article here, right? That Postgres is taking over all db-like use cases. It doesn't claim that it can replace Kafka but https://www.amazingcto.com/postgres-for-everything/ certainly does. Of course it's not a full replacement, but it might be good enough.
If your premise is that Postgres is eating the datastore world, then you're talking about using it as a replacement for Kafka and Cassandra.

Frankly if you zoom out far enough they're all systems suitable for use as your primary online datastore that you build your application on (each with their own caveats of course). There are places where they compete.

Patroni has had native HA support for Citus horizontal clusters since v3, which means you can create an HA Citus cluster as simply as: https://pigsty.io/docs/pgsql/config/#citus-cluster
That looks like apples vs oranges.
You should look at CockroachDB: it tries to be PG compatible (not there yet), but is truly HA-first.
Often at 10x the latency and with significant query compatibility issues. That's a hard pill to swallow.
The latency increase is likely the price of distributed consistency regardless of the specific DB: doing consensus between nodes is much slower than dumping a block of data on local NVMe.
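A rough back-of-the-envelope model of that cost, as a sketch only: it assumes a leader-based majority quorum where the leader writes locally in parallel with replication, and the function name and all numbers are hypothetical, not measurements of any real database.

```python
def quorum_commit_ms(local_write_ms, follower_rtts_ms, follower_write_ms):
    """Estimate commit latency for a leader-based majority quorum.

    The leader counts as one vote, so with n = followers + 1 nodes it
    must wait for acknowledgements from n // 2 followers.
    """
    n = len(follower_rtts_ms) + 1
    acks_needed = n // 2
    if acks_needed == 0:
        # Single node: the commit is just the local durable write.
        return local_write_ms
    # Each follower acks after one network round trip plus its own write.
    follower_paths = sorted(rtt + follower_write_ms for rtt in follower_rtts_ms)
    # The commit completes when the slowest of the required acks arrives.
    return max(local_write_ms, follower_paths[acks_needed - 1])

# Hypothetical numbers: 0.1 ms local NVMe write, 1 ms and 2 ms follower RTTs.
single_node = quorum_commit_ms(0.1, [], 0.1)
three_node = quorum_commit_ms(0.1, [1.0, 2.0], 0.1)
```

With these made-up inputs the three-node commit is dominated by the network round trip rather than the disk write, which is the shape of the trade-off described above.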