Hacker News new | ask | show | jobs
by cphoover 2878 days ago
Scale is not just user load, but also scale of application complexity. In my experience when one db connection has access to every resource, in a complex application, this can lead to some really convoluted queries and make schema changes very difficult because of cross cutting dependencies built into these queries, triggers, procedures... etc. This is forgetting about the issues of deadlocks when you have 80 consuming services and applications you don't even know about are opening up all sorts of transactions. Even just splitting the DB into schemas for each resource domain and limiting access per service can help to avoid this.

Also performance is relative, I've worked on highly trafficked applications that had to support high throughput. I have also worked on applications backed by relational storage where data size and complexity has impacted performance.

3 comments

> "Scale is not just user load, but also scale of application complexity"

In my experience, when people use NoSQL because "the application is too complex for relational DBs" they tend to make a mess of it, NoSQL included. They usually end up reinventing the wheel and re-writing buggy versions of features a RDBMS would have given them natively.

Been there, done that, migrated everything back to Postgres and saw huge gains.
I don't think I've seen a deadlock in a long long time on most major DB platforms.

PG also lets you get very vague about it being an relational DB if you want.

And tbh, if the size of your table impacts performance, you either don't have a very good DBA or your DBA doesn't know what partitioning is, both good reasons to replace them.

Most modern DBs don't have any of these issues, PG can cleanly handle live schema changes since it packs those in transactions. Old transactions simply use the previous schema. MariaDB requires a bit more fiddling but Github figured it out.

And from experience, you're likely not going to hit the scale where you need multiple DB nodes for performance. In 10 out of 10 cases, a simple failover is what you need (but didn't invest in because MongoDB is cooler).

> when one db connection has access to every resource

So why not use db users to restrict each part to only be able to access the parts it should?

Sure that works... I think encapsulation through separate db schemas is generally sufficient. Most people don't start or end up here however. I'm not saying that RDBMS used correctly is a bad thing. I prefer multiple small postgres schemas per "data service" (what I'm calling a service that deals only with data persistence, and updating consumers about changes to data), each schema can correlate to a single resource, or smallest possible domain of the application. These services can publish notifications about updates that can be consumed by consuming downstream services.

It's my opinion micro-services, should do one thing and do them well, and the data storage that backs these services should only be concerned with the domain of that single-purpose service. It should be isolated from all other concerns.

Having a separate schema for "users" than for "messages" for example.

Where to draw those dividing lines is not always easy.