by lmm 1801 days ago
If scalability is your concern then you can't use any of the supposedly core features of an RDBMS, since fundamentally there is no way to have a transaction across multiple nodes without solving a much bigger problem.

Validation is vital but the datastore is not the place to do it, because handling invalid data by dropping it on the floor is almost never the right behaviour.

There is no substitute for actually understanding your data model, but once you do, 99% of the time you'll find using an RDBMS comes with minimal benefits and significant costs.

3 comments

My experience is the exact opposite: once you understand your data model, 99% of the time you will find that NOT using an RDBMS comes with minimal benefits and significant costs.
It is extremely difficult to design a good RDBMS schema without understanding the data model, and once you do, it is documented in its entirety, with best-in-class tooling, for anyone else to come along, pick up and get up to speed with. Additionally, you don't have to forgo document storage: most if not all modern RDBMSs support json(b) types.
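To make the "schema as documentation" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and columns (users, events, kind, payload) are invented for illustration and are not from anyone's real system. SQLite stands in for a full RDBMS, and payload is plain TEXT holding JSON; in PostgreSQL a jsonb column would additionally let you query and index inside the document. Note that the invalid row is rejected with an error the application can handle, rather than being silently discarded.

```python
import json
import sqlite3

# Hypothetical schema, invented for illustration only.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    kind    TEXT NOT NULL CHECK (kind IN ('login', 'purchase')),
    payload TEXT NOT NULL  -- free-form JSON document stored as text
);
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)

    conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    conn.execute(
        "INSERT INTO events (user_id, kind, payload) VALUES (?, ?, ?)",
        (1, "purchase", json.dumps({"sku": "x-1", "qty": 2})),
    )
    conn.commit()

    # 'refund' is not an allowed kind, so the CHECK constraint raises.
    rejected = False
    try:
        conn.execute(
            "INSERT INTO events (user_id, kind, payload) VALUES (?, ?, ?)",
            (1, "refund", "{}"),
        )
    except sqlite3.IntegrityError:
        rejected = True

    # The document column round-trips alongside the relational columns.
    row = conn.execute("SELECT payload FROM events WHERE user_id = 1").fetchone()
    conn.close()
    return rejected, json.loads(row[0])
```

Calling demo() returns a flag saying whether the bad insert was rejected, plus the decoded JSON payload of the valid row.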
RDBMS tooling is a long way from best-in-class, and the data model is extremely awkward in a way that actually distorts your modelling (no collection columns, no sum types...). There's also no real support for keeping track of schema evolution. I agree that recording your schema model explicitly and keeping track of it is very important (using something like Avro's schema registry), but RDBMS tools are not actually that great at it, and using an RDBMS brings in a lot of other baggage.
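For readers unfamiliar with the "no sum types" complaint, a rough sketch of what it looks like in practice follows. The Card, BankTransfer and to_row names are hypothetical, and the single-table encoding (a tag column plus nullable columns) is just one common workaround, not the only one.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Card:
    last4: str


@dataclass(frozen=True)
class BankTransfer:
    iban: str


# In application code a payment is exactly one of these two shapes.
Payment = Union[Card, BankTransfer]


def to_row(payment: Payment) -> dict[str, Optional[str]]:
    """Flatten a Payment into a tag column plus one nullable column per variant."""
    if isinstance(payment, Card):
        return {"kind": "card", "last4": payment.last4, "iban": None}
    if isinstance(payment, BankTransfer):
        return {"kind": "bank_transfer", "last4": None, "iban": payment.iban}
    raise TypeError(f"unsupported payment type: {type(payment).__name__}")
```

The flattened row permits combinations the type never allowed (both columns set, or neither), so keeping them out takes extra CHECK constraints or a table per variant.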
The point is that you can go pretty far on a single cluster (GitLab example). That 99% figure is trivially wrong in that case.