by brightball 1238 days ago

Developers not knowing something they need to know isn't a case for getting rid of the thing, it's a case for insisting they learn it. That it's been abstracted away by ORMs is a testament to those ORM layers and the consistency of the standard...not an indictment of SQL itself.

Let's talk real use cases for a second.

All of the performance benefits of NoSQL come from key-value retrieval and the ability to shard data due to the built-in lack of joins.

There's nothing stopping anyone from doing that in a relational database. It's a common pattern. CitusDB even made a PostgreSQL extension that makes this model even easier to use with joins and all the other relational goodies. If you want to set up PostgreSQL with an id field and a JSONB field, you've got the entire NoSQL scaling use case covered without all of the limitations of using a NoSQL only database.
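To make that pattern concrete, here's a rough sketch of an id + document table used as a sharded key-value store. It's only an illustration under some loud assumptions: SQLite (via Python's standard library) stands in for PostgreSQL so it runs anywhere, a TEXT column holding JSON stands in for JSONB, and the names (`docs`, `NUM_SHARDS`, `shard_for`, `put`, `get`) are made up for the example.

```python
import hashlib
import json
import sqlite3

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def make_shard() -> sqlite3.Connection:
    """Create one in-memory database holding an id + document table."""
    conn = sqlite3.connect(":memory:")
    # In PostgreSQL the body column would be JSONB; TEXT is a stand-in here.
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> sqlite3.Connection:
    """Route a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % NUM_SHARDS]


def put(key: str, doc: dict) -> None:
    """Insert or overwrite the document stored under key."""
    conn = shard_for(key)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (key, json.dumps(doc)),
        )


def get(key: str):
    """Fetch a document by key, or None if it does not exist."""
    row = shard_for(key).execute(
        "SELECT body FROM docs WHERE id = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


put("user:42", {"name": "Ada", "tags": ["admin"]})
print(get("user:42"))
```

Every read and write touches exactly one shard and never joins across them, which is the same property that lets key-value stores scale out. The difference is that each shard is still a relational database when you need one.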

The schema at different layers thing doesn't _really_ work either. If you put the schema in your application code, then you mandate an API layer in addition to your database layer. With schema enforced in the database, multiple code bases can connect to the same database. With the schema in the application layer, now everything has to connect to that application. So not only are you working with the NoSQL database AND defining the schema in the application, but now you also have to define an API if you have a reason to split out some portion of application logic to a more specialized language.
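A small sketch of what "schema enforced in the database" buys you: the constraints live with the data, so any client, in any language, gets the same rejection for a bad write. Again this assumes SQLite as a runnable stand-in for a full SQL database, and the `orders` table, its columns, and `try_insert` are hypothetical names for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id             INTEGER PRIMARY KEY,
        customer_email TEXT    NOT NULL,
        quantity       INTEGER NOT NULL CHECK (quantity > 0)
    )
    """
)


def try_insert(email, quantity) -> bool:
    """Attempt a write; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_email, quantity) VALUES (?, ?)",
                (email, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL and CHECK violations alike.
        return False


accepted = try_insert("a@example.com", 2)        # valid row
rejected_null = try_insert(None, 1)              # violates NOT NULL
rejected_check = try_insert("b@example.com", 0)  # violates CHECK
print(accepted, rejected_null, rejected_check)
```

None of the validation lives in `try_insert` itself. A second code base connecting to the same database would hit the same constraints without going through this application.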

The layers of problems introduced because NoSQL is used on a project are excessive. There's no question that storing document data makes a lot of sense in some circumstances, but you're always better off just storing the portion you need in an appropriate column type (JSONB) rather than forcing your entire system into the issues that come with NoSQL only.

Given the JSON capabilities of modern SQL databases there's almost no reason to start any project with a NoSQL only option.

In the Prime Day article, for example, they're singing the praises of their own SQL offerings too.

> Amazon Aurora – On Prime Day, 5,326 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 288 billion transactions, stored 1,849 terabytes of data, and transferred 749 terabytes of data.

> Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.

They don't give an apples-to-apples comparison of the numbers here. We don't have a cost comparison of infrastructure for DynamoDB vs Aurora. We don't get to see if the "288 billion transactions" are database writes, while for DynamoDB we only see requests. We don't get to see whether there have to be so many more requests to Dynamo because you can't get the data as easily in a single query. We don't get to see if it's being used as a cache layer or for session management, which are both great use cases for it.

I mean, Instagram scaled just fine with only PostgreSQL. SQL scales just fine. Some of the bevy of options and tools that you get to work with the data don't scale as well as others, but that's not a case for getting rid of those tools...it's just a case for caching when needed.