by lmm
819 days ago
Postgres is still single-node-first, and while Citus exists, I'm skeptical that it can ever become as easy to administer as a true HA-first datastore. For me the reason to use something like Cassandra or Kafka was never "big data" per se; it was having true master-master fault tolerance out of the box in a way that worked with everything.
That is a BIG lift. Joins don't really scale in practice at Cassandra/Dynamo scale, because basically every row in the result set is subject to CAP uncertainty. "Big Data SQL" engines like Hive, Impala, Snowflake, and Presto are more like approximations at true scale.
A relational DBMS is sort of storage-focused in its design and evolution: you figure out the tables you need to store the data in a sensible way. Then you add views and indexes to optimize for viewing/retrieval.
Dynamo/Cassandra is different: you start from the views/retrieval. That's why it is bad to start with these models for an application, because you have not yet fully explored all your specific data structuring and access patterns/loads.
By the time Postgres hits the single-node limits, you should know what your highest-volume reads/writes are and how to structure a Cassandra/Dynamo table to specifically handle those reads/writes.
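To make the storage-first vs. query-first distinction concrete, here is a rough in-memory sketch in plain Python. It is not real Postgres or Cassandra code; the `users`/`orders` data, the `orders_by_user` table, and all the function names are made up for illustration. The first half stores normalized rows and joins at read time. The second half assumes you already know the hot read is "orders for a user" and builds one denormalized table keyed by that lookup.

```python
from collections import defaultdict

# Storage-first (relational style): normalized tables, joined at read time.
users = {1: {"name": "ada"}, 2: {"name": "lin"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 30},
    {"order_id": 11, "user_id": 2, "total": 15},
    {"order_id": 12, "user_id": 1, "total": 5},
]


def orders_with_names_join(user_id):
    """Read path does the work: scan orders and look up the user row."""
    return [
        (users[o["user_id"]]["name"], o["order_id"], o["total"])
        for o in orders
        if o["user_id"] == user_id
    ]


# Query-first (Cassandra/Dynamo style): one table per known read pattern,
# keyed by the value you will look up (here user_id acts as a partition key).
orders_by_user = defaultdict(list)


def write_order(user_id, order_id, total):
    """Write path does the work: the user name is copied into every row."""
    orders_by_user[user_id].append((users[user_id]["name"], order_id, total))


for o in orders:
    write_order(o["user_id"], o["order_id"], o["total"])


def orders_with_names_lookup(user_id):
    """Read path is a single key lookup, with no join."""
    return list(orders_by_user.get(user_id, []))
```

Both read functions return the same rows for a given user. The difference is that the second one only answers the one question it was built for, which is why you want to know your access patterns before committing to it.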