HN Mirror

by AtlasBarfed 866 days ago

If you are going to multi-node Postgres, you need to start planning for Cassandra/Dynamo.

That is a BIG lift. Joins don't practically scale at Cassandra/Dynamo scale, because basically every row in the result set is subject to CAP uncertainty. "Big Data SQL" engines like Hive/Impala/Snowflake/Presto are more like approximations at true scale.

Relational DBMS is sort of storage-focused in its design and evolution: you figure out the tables you need to store the data in a sensible way. Then you add views and indexes to optimize for view/retrieval.
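
To make the storage-first flow concrete, here is a rough sketch using Python's built-in sqlite3 as a stand-in for Postgres. The table, index, and view names (users, orders, orders_by_user, user_totals) are made up for illustration: the tables come first, and the index and view get bolted on afterwards for retrieval.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: normalized tables that store the data sensibly
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         total_cents INTEGER NOT NULL);

    -- Step 2: added later, once you see how the data is actually read
    CREATE INDEX orders_by_user ON orders(user_id);
    CREATE VIEW user_totals AS
        SELECT u.name AS name, SUM(o.total_cents) AS total_cents
        FROM users u JOIN orders o ON o.user_id = u.id
        GROUP BY u.id, u.name;
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 500), (2, 1, 250), (3, 2, 100)])

# The join happens at read time, through the view.
user_totals = dict(conn.execute("SELECT name, total_cents FROM user_totals"))
```

The schema did not have to anticipate the "total per user" read; it was layered on after the fact.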

Dynamo/Cassandra is different: you start from the views/retrieval. That's why it is bad to start with these models for an application, because you have not fully explored all your specific data structuring and access patterns/loads yet.

By the time Postgres hits the single-node limits, you should know what your highest-volume reads/writes are and how to structure a Cassandra/Dynamo table to specifically handle those reads/writes.
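
As a toy illustration of the query-first approach, the sketch below models one table built for a single known read ("latest N orders for a user") in plain Python. It does not use any real Cassandra or Dynamo client; the dict key plays the role of the partition key, the sorted timestamp plays the role of a clustering/sort key, and the names write_order and latest_orders are hypothetical.

```python
from bisect import insort
from collections import defaultdict

# One "table" per query. Partition key: user_id. Clustering key: timestamp.
# Rows within a partition are kept sorted at write time.
orders_by_user = defaultdict(list)

def write_order(user_id, ts, total_cents):
    """Insert a row into its partition, keeping it ordered by timestamp."""
    insort(orders_by_user[user_id], (ts, total_cents))

def latest_orders(user_id, limit):
    """Read one partition only: newest rows first, no join, no scan."""
    if limit <= 0:
        return []
    rows = orders_by_user.get(user_id, [])
    return rows[-limit:][::-1]

write_order("ann", 3, 300)
write_order("ann", 1, 100)
write_order("ann", 2, 200)
write_order("bob", 1, 50)

newest_two = latest_orders("ann", 2)
```

The write path does the sorting so the read touches a single partition. A different query (say, orders by day across all users) would need its own table, which is why you want to know your access patterns before going this route.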