Hacker News new | ask | show | jobs
by arcticfox 1975 days ago
5.7 TB for an OLTP database is small?! I must be living in a different world. Obviously I know you can go that big, but I thought the number of use-cases would be limited.
1 comments

Why does my browser routinely eat 8GB while it used to only require 32MB 25 years ago? because it can. Web services likewise come up with features and data to fill databases.

For $8/hr you can rent a DB with 500 GB of memory and 64 cores, complete with redundancy, automated backups, and failover. For the hourly rate of an oracle consultant you can rent a DB with 2TB of memory for the day.

Bear in mind that many of these workloads are trivially shardable (e.g. any table keyed off customer ID) and can be scaled across hundreds of DBs as required.

If data is sharable, it doesn't mean that it is trivial. In your example with shards by user, simple message sent between users in app becomes are very non trivial dance to be done reliably.
If you recognize that there isn't a good automated solution for the DB to smartly join messages than it becomes a fairly straightforward problem once again.

e.g. the simple solution is to denormalize the table and have each message keyed by recipient. In a dating app you'll roughly double your message count this way, and even in most messenger apps the proportion of messages sent person to person is likely the most significant.

A smarter solution is to key each conversation by a unique key in a sharded table and then store the set of "conversations" that a user is engaged in in the sharded users table. Fetching the messages for a user then becomes a simple 2 query process - fetch/filter the conversations, then fetch the messages. No duplication of messages, and likely just a few extra bytes per message for the key.

It would be great if the DB could manage the above application side sharded join internally, but we're unfortunately a few steps away from that today.

It doesn't matter how you arrange data, the moment you need to commit to 2 shards transactionally you are either having consistency trouble or performance trouble.

Both your schemas require writes to at least 2 shards transactionally.

Then you don't do chats on your primary RDBMS.