
by vidarh 4866 days ago

It is not horrible advice once your system hits the hard limits of your database system. Depending on your database system, you can hit those fairly quickly.

For example, it is often far cheaper to scale an application across many boxes by extracting data from your canonical database into a set of in-memory, read-only search structures, and then regularly building delta indexes of the changes and merging them in.
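A minimal Python sketch of that pattern, assuming a simple key/value dataset that fits in memory on each app box. `Snapshot`, `ReadReplica` and the `TOMBSTONE` deletion marker are hypothetical names chosen for illustration, and how the deltas are extracted from the canonical database is left out.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical sentinel marking a key deleted in a delta (assumption for this sketch).
TOMBSTONE = object()


@dataclass(frozen=True)
class Snapshot:
    """Read-only, sorted copy of rows extracted from the canonical database."""
    keys: tuple
    values: tuple

    @classmethod
    def build(cls, rows):
        items = sorted(rows.items())
        return cls(tuple(k for k, _ in items), tuple(v for _, v in items))

    def get(self, key):
        # Binary search over the sorted keys; no locking needed since it never changes.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


@dataclass
class ReadReplica:
    base: Snapshot
    deltas: list = field(default_factory=list)  # small change sets, oldest first

    def apply_delta(self, delta):
        self.deltas.append(dict(delta))

    def get(self, key):
        # The newest delta wins; fall back to the immutable base snapshot.
        for delta in reversed(self.deltas):
            if key in delta:
                value = delta[key]
                return None if value is TOMBSTONE else value
        return self.base.get(key)

    def merge(self):
        # Periodically fold all pending deltas into a fresh read-only snapshot.
        rows = dict(zip(self.base.keys, self.base.values))
        for delta in self.deltas:
            for key, value in delta.items():
                if value is TOMBSTONE:
                    rows.pop(key, None)
                else:
                    rows[key] = value
        self.base = Snapshot.build(rows)
        self.deltas = []
```

Reads never touch the database: they consult a handful of recent deltas and then the sorted snapshot, and `merge()` keeps the delta list short so lookups stay cheap.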

Similarly, it is often far cheaper to sort and group large datasets outside the database, because sorting and grouping are simple to parallelise: multiple machines each work on an in-memory subset, and a cheap merge combines the results at the end.
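A rough Python sketch of the split / work / merge shape described above. To keep it self-contained, the "workers" run sequentially in one process; in a real deployment each partition would be sorted or grouped on a separate machine. The function names are hypothetical.

```python
import heapq
from collections import Counter


def partition(records, n_workers):
    """Split a list of records into n_workers round-robin subsets, one per worker."""
    return [records[i::n_workers] for i in range(n_workers)]


def worker_sort(subset, key):
    # Each worker sorts only its own in-memory subset.
    return sorted(subset, key=key)


def worker_group(subset, key):
    # Each worker computes partial per-group counts for its subset.
    return Counter(key(r) for r in subset)


def distributed_sort(records, key, n_workers=4):
    runs = [worker_sort(s, key) for s in partition(records, n_workers)]
    # Cheap final step: a k-way merge of already-sorted runs.
    return list(heapq.merge(*runs, key=key))


def distributed_group_count(records, key, n_workers=4):
    partials = [worker_group(s, key) for s in partition(records, n_workers)]
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total
```

The expensive work (sorting, counting) is embarrassingly parallel; the final merge is linear in the output size for grouping and only adds a log factor in the number of runs for sorting.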

If your system can run at reasonable speed in your RDBMS, sure, do that rather than reinvent the wheel.

But when you find yourself maintaining complex trees of replicas, it is often worth testing whether you can do better with specialised middleware. Such middleware can selectively throw out guarantees that your RDBMS can't drop, because dropping them would violate what it is meant to provide, and it can otherwise exploit specialised characteristics of your data.

E.g., you don't see people running large search engines out of RDBMSs, for a simple reason. While many RDBMSs provide full-text search, you can do it far faster once you accept that your full-text index is "always" going to be catching up. So once you exceed the threshold where a single RDBMS no longer serves your needs (and often before that), you can save massive amounts of resources by building small, frequent deltas of changes, distributing them to however many app servers you need, and gradually merging the deltas into larger chunks to keep their number manageable.
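One way to sketch the "merge small deltas into larger chunks" idea is a simple tiered merge policy, shown here in Python as an illustration rather than the author's design. Each batch of new documents becomes a small inverted-index segment; whenever `fanout` segments accumulate at one tier, they are merged into a single segment at the next tier. The class and function names are hypothetical, tokenisation is naive whitespace splitting, and the index is append-only (no document updates or deletions).

```python
from collections import defaultdict


def build_segment(docs):
    """Build a small inverted index (term -> sorted doc ids) from a batch of new docs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_segments(segments):
    """Union the posting lists of several segments into one larger segment."""
    merged = defaultdict(set)
    for segment in segments:
        for term, ids in segment.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


class TieredIndex:
    """Hypothetical tiered merge policy: `fanout` segments at one tier become one at the next."""

    def __init__(self, fanout=4):
        self.fanout = fanout
        self.tiers = []  # tiers[i] holds the segments currently at level i

    def add_delta(self, docs):
        # Each small, frequent delta enters at the lowest tier.
        self._add(0, build_segment(docs))

    def _add(self, level, segment):
        while len(self.tiers) <= level:
            self.tiers.append([])
        self.tiers[level].append(segment)
        if len(self.tiers[level]) >= self.fanout:
            merged = merge_segments(self.tiers[level])
            self.tiers[level] = []
            self._add(level + 1, merged)  # may cascade into higher tiers

    def segment_count(self):
        return sum(len(tier) for tier in self.tiers)

    def search(self, term):
        # Queries consult every live segment and union the hits.
        hits = set()
        for tier in self.tiers:
            for segment in tier:
                hits.update(segment.get(term.lower(), ()))
        return sorted(hits)
```

With this policy each tier holds at most `fanout - 1` segments after merging, so the number of segments a query has to consult grows only logarithmically with the number of deltas applied.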

There are a lot of scenarios like that where moving the logic out of your data store makes sense.