by danharaj 2879 days ago
> The reason "NoSQL" dbs got popular are because in my experience Monolithic large relational databases are hard to scale.

I've met a lot of people whomst thought they had to scale that big. Very few handled anything that couldn't run off a beefy postgres installation.

The purpose of a system is what it does. People don't use nosql to scale because they don't need to scale, so what does it do? People use nosql to not write schemas. That's what it's for, for the majority of users.

If I need a key value store, I use a key value store. There's no flashy paradigm there. If I need to put a container up on the interwebs, I do it. What's serverless? Nosql is an "idea", "paradigm", "revolution", or at least the branding of one. Just the same, serverless.

I will continue to ignore nosql and serverless.

The industry sure does change, but do you know how much of that is moving in a real direction and how much is a merry-go-round? Let's brand it "Carousel" and raise 10 million. And in 20 years we can talk about serverless being the new hotness, again.


> Very few handled anything that couldn't run off a beefy postgres installation.

My impression, from attempting to evangelize scaling "up" before scaling "out" (because it's both cheaper and much lower effort/labor/time) is that vanishingly few programmers have any idea what a "beefy" installation would even look like.

I routinely encounter implicit assumptions (partially driven, these days, anyway, by what VPS and cloud providers offer) that the "largest" servers are 2U (or 4U, if I'm lucky) and are I/O limited by the number of disks they can hold in their chassis.

Similarly, there seems to be a lack of awareness of just how big main memory can be on a single server, even before paying a price premium for higher-density modules.

Not knowing where the price-performance curve inflection points (for memory and/or CPU) happen to be also seems to be associated with not knowing where the price tops out. It's as if they fear the biggest server they can (and will be forced to) buy will cost a million bucks, rather than $100k.

Scale is not just user load, but also scale of application complexity. In my experience, when one db connection has access to every resource in a complex application, it can lead to some really convoluted queries and make schema changes very difficult because of the cross-cutting dependencies built into those queries, triggers, procedures, etc. And that's before you get to deadlocks, when you have 80 consuming services and applications you don't even know about opening up all sorts of transactions. Even just splitting the DB into schemas for each resource domain and limiting access per service can help to avoid this.

Also, performance is relative. I've worked on highly trafficked applications that had to support high throughput. I have also worked on applications backed by relational storage where data size and complexity have impacted performance.

> "Scale is not just user load, but also scale of application complexity"

In my experience, when people use NoSQL because "the application is too complex for relational DBs" they tend to make a mess of it, NoSQL included. They usually end up reinventing the wheel and rewriting buggy versions of features an RDBMS would have given them natively.

Been there, done that, migrated everything back to Postgres and saw huge gains.
I don't think I've seen a deadlock in a long long time on most major DB platforms.

PG also lets you get very vague about it being a relational DB if you want.

And tbh, if the size of your table impacts performance, you either don't have a very good DBA or your DBA doesn't know what partitioning is, both good reasons to replace them.

Most modern DBs don't have any of these issues. PG can cleanly handle live schema changes since it packs those in transactions. Old transactions simply use the previous schema. MariaDB requires a bit more fiddling but GitHub figured it out.

And from experience, you're likely not going to hit the scale where you need multiple DB nodes for performance. In 10 out of 10 cases, a simple failover is what you need (but didn't invest in because MongoDB is cooler).

> when one db connection has access to every resource

So why not use db users to restrict each part to only be able to access the parts it should?

Sure, that works... I think encapsulation through separate db schemas is generally sufficient. Most people don't start or end up here, however. I'm not saying that an RDBMS used correctly is a bad thing. I prefer multiple small postgres schemas per "data service" (what I'm calling a service that deals only with data persistence and with updating consumers about changes to data). Each schema can correspond to a single resource, or to the smallest possible domain of the application. These services can publish notifications about updates for downstream services to consume.

It's my opinion that micro-services should do one thing and do it well, and the data storage that backs these services should only be concerned with the domain of that single-purpose service. It should be isolated from all other concerns.

For example, having a separate schema for "users" from the one for "messages".

Where to draw those dividing lines is not always easy.
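
To make the schema-per-domain idea (and the "just use db users" suggestion above) a bit more concrete, here's a rough sketch of the PostgreSQL statements involved. It only builds the SQL as strings and runs nothing. The schema and role names ("users", "messages", and the "_service" suffix) are made up for illustration, and it assumes the names are trusted identifiers that need no quoting.

```python
# Sketch only: builds PostgreSQL statements as strings; nothing is executed.
# The schema and role names ("users", "messages", "*_service") are hypothetical.

def schema_isolation_sql(domain: str) -> list[str]:
    """Return statements that give one service role access to one schema only."""
    role = f"{domain}_service"
    table_privs = "SELECT, INSERT, UPDATE, DELETE"
    return [
        f"CREATE SCHEMA {domain};",
        f"CREATE ROLE {role} LOGIN;",
        # Let the role resolve objects inside its own schema.
        f"GRANT USAGE ON SCHEMA {domain} TO {role};",
        # Privileges on tables that already exist in the schema.
        f"GRANT {table_privs} ON ALL TABLES IN SCHEMA {domain} TO {role};",
        # Same privileges on tables created later by whoever runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {domain} "
        f"GRANT {table_privs} ON TABLES TO {role};",
    ]


if __name__ == "__main__":
    for domain in ("users", "messages"):
        print("\n".join(schema_isolation_sql(domain)))
        print()
```

Since users_service is never granted USAGE on the messages schema, it can't query across that boundary, which is the kind of cross-cutting dependency being discussed here.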

Very much this. Sooooo many times I hear the cry of "does it scale?" To which I reply, "Does it need to?!"

At my last company we had a developer question scalability constantly despite the fact that the average customer of an instance of our product had about 200 users.

I like to add, "does it need to beyond what's delivered by Moore's Law?" (which I use as a metaphor for all increases in computing performance, including I/O, which has, of course, increased at a much slower, but far from zero, pace).

If your CPU utilization from user growth is doubling every 2 years, but so is CPU capacity, then don't worry about it.
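
A back-of-the-envelope version of that arithmetic, as a sketch: the 25% starting utilization and the doubling periods are made-up inputs, and both load and capacity are assumed to grow as smooth exponentials.

```python
def utilization(years, start=0.25, load_doubling=2.0, capacity_doubling=2.0):
    """Fraction of CPU in use after `years`, given doubling periods in years.

    All defaults are illustrative assumptions, not measurements.
    """
    load = 2 ** (years / load_doubling)          # growth factor of demand
    capacity = 2 ** (years / capacity_doubling)  # growth factor of hardware
    return start * load / capacity


if __name__ == "__main__":
    # Load and capacity both double every 2 years: utilization stays flat.
    print(utilization(10))
    # Load doubles every year, capacity every 2 years: utilization climbs.
    print(utilization(4, load_doubling=1.0))
```

When the two doubling periods match, the ratio cancels and utilization never moves. It only becomes a problem when load doubles faster than capacity does.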

> Very few handled anything that couldn't run off a beefy postgres installation.

Beefy postgres would get you to 99.9% availability at best, with pretty bad tail latency and would cost quite a bit to operate. As it turns out, very few can actually live with that. And even infamous MongoDB can do better at this than PostgreSQL. Ignorance simply makes your business less competitive.

> Beefy postgres would get you to 99.9% availability at best

This is just false. Shrug.
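
For anyone following along, this is what the availability figures being argued about work out to as a yearly downtime budget. It's plain arithmetic and assumes a 365-day year. It says nothing about what any particular database actually achieves.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_percent):
    """Hours of allowed downtime per year at a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

So 99.9% is a budget of roughly 8.76 hours of downtime per year, and each extra nine cuts that by a factor of ten.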