|
But you just asserted the same point I was making: no one knows it anymore! That makes the skill valuable in the short term, but in the long term you've got the next generation of programmers entering jobs, being given the opportunity to build and change systems, and what are they going to pick? Maybe not SQL. It's the same argument, just at a grander scale, as for the generation before us and the COBOL mainframe programmers of old; database technologies just tend to move slower because data is very sticky.
You might not see the use cases, but Apple does. Their Cassandra usage sustains millions of QPS across over 300,000 instances. That's a scale SQL can't even begin to dream of. Discord does; they moved to Scylla [2]. Amazon does [3]. BlackRock Aladdin depends on Cassandra, and manages over $21T in assets [4].
Which kind of circles around to the fact that, while you say "a lack of SQL knowledge is what causes unwarranted griping about SQL", sure, I can accept that, but I'd conversely argue that a lack of Solving Real Problems is what causes unwarranted griping about NoSQL. Schematization isn't a real problem; we have a dozen ways of solving that at different layers. Imperfect ACID compliance isn't a real problem; it's a pseudo-academic expression of the guarantees that SQL offers, which then gets unfairly transplanted onto NoSQL databases as if those issues can't or wouldn't be addressed elsewhere.
Even original article statements like data being inherently relational: data isn't anything, it's just data; it only appears relational because that's how we've been trained to think about data after 60 years of SQL monoculture. If you start thinking about data end-to-end, how it's displayed to users, on pages, components, screens, feeds, it really doesn't look relational at all. Life isn't a graph; the graph is an expression of human pattern-matching on things that could reasonably be related, but in reality aren't. Life is a series of snapshots.
This moment, this page, this feed, this component, in this moment, for this request, for this session; that is the end-to-end view of data: documents, not a web. But SQL is fine at that! In fact, it's one of the best options at medium scales: the middle of the bell curve where 80% of people live. It may not always be, but it's there today. It's that bottom 10% and top 10% where the solutions that work for most don't work as well, and I think NoSQL gets a lot of Majority Hate despite being an extremely strong solution at those scales. Know your tools, and be willing to replace them when the circumstances change.
[1] https://twitter.com/erickramirezau/status/157806381149547724...
[2] https://www.scylladb.com/press-release/discord-chooses-scyll...
[3] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...
[4] https://www.youtube.com/watch?v=322GytEo_fE |
Let's talk real use cases for a second.
All of the performance benefits of NoSQL come from key-value retrieval and the ability to shard data, which is possible because of the built-in lack of joins.
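To make the key-value point concrete, here's a rough sketch of hash-based sharding. Everything in it (`shard_for`, `ShardedStore`, the in-memory dicts standing in for separate database nodes) is made up for illustration and isn't how any particular NoSQL product does it. The idea is only that when every read and write is addressed by a single key, the key alone decides which node to talk to, so no cross-node join is ever needed.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    salted per process for strings and would route keys differently
    on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store; each dict stands in for one node."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: dict) -> None:
        # The key alone picks the shard, so a write touches one node.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Same routing on read; returns None for a missing key.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Usage would look like `store = ShardedStore(4)`, then `store.put("user:1", {...})` and `store.get("user:1")`. Note that plain modulo routing reshuffles most keys when the shard count changes; real systems typically use something like consistent hashing to avoid that.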
There's nothing stopping anyone from doing that in a relational database. It's a common pattern. CitusDB even made a PostgreSQL extension that makes this model even easier to use with joins and all the other relational goodies. If you want to set up PostgreSQL with an id field and a JSONB field, you've got the entire NoSQL scaling use case covered without all of the limitations of using a NoSQL-only database.
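A minimal sketch of that id-plus-document table, with some loud assumptions: it uses Python's built-in `sqlite3` and a TEXT column holding JSON as a stand-in so it runs without a server, whereas in PostgreSQL the `body` column would be JSONB. The table names (`docs`, `orders`) and helper functions are hypothetical. The point is that lookups by id behave like a key-value store, and you can still join against ordinary relational tables.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a document table and a plain table."""
    conn = sqlite3.connect(":memory:")
    # Key-value style table: an id plus an opaque JSON document.
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Ordinary relational table that refers to documents by id.
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL REFERENCES docs(id),"
        " total_cents INTEGER NOT NULL)"
    )
    return conn


def put_doc(conn: sqlite3.Connection, doc_id: str, doc: dict) -> None:
    """Upsert a document by key."""
    conn.execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(doc)),
    )


def get_doc(conn: sqlite3.Connection, doc_id: str):
    """Fetch a document by key; returns None if the key is absent."""
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None


def spend_by_user(conn: sqlite3.Connection) -> list:
    """Join the document table against a relational table."""
    rows = conn.execute(
        "SELECT d.id, d.body, SUM(o.total_cents)"
        " FROM docs d JOIN orders o ON o.user_id = d.id"
        " GROUP BY d.id, d.body ORDER BY d.id"
    ).fetchall()
    return [(doc_id, json.loads(body)["name"], total) for doc_id, body, total in rows]
```

The first two helpers are the whole "NoSQL" surface; `spend_by_user` is the part you'd have to rebuild in application code on a store without joins.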
The schema-at-different-layers thing doesn't _really_ work either. If you put the schema in your application code, then you mandate an API layer in addition to your database layer. With the schema enforced in the database, multiple code bases can connect to the same database. With the schema in the application layer, everything has to connect to that application. So not only are you working with the NoSQL database AND defining the schema in the application, but now you also have to define an API if you have a reason to split out some portion of application logic into a more specialized language.
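Here's a small sketch of what "schema enforced in the database" buys you. Same caveats as any toy example: it uses Python's `sqlite3` as a stand-in, and the `accounts` table and `try_insert` helper are hypothetical. Any client that connects, in any language, hits the same NOT NULL, UNIQUE, and CHECK rules without going through a shared application layer.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database whose table carries its own rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL UNIQUE,"
        " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
    )
    return conn


def try_insert(conn: sqlite3.Connection, email, balance_cents) -> bool:
    """Attempt an insert; return False if the database rejects the row.

    This function does no validation of its own. The constraints live in
    the table definition, so every writer gets the same enforcement.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (email, balance_cents) VALUES (?, ?)",
                (email, balance_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A valid row goes in, while a negative balance, a missing email, or a duplicate email is rejected by the database itself. With the rules in application code instead, a second code base would have to either reimplement them or call the first application's API.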
The layers of problems introduced because NoSQL is used on a project are excessive. There's no question that storing document data makes a lot of sense in some circumstances, but you're always better off just storing the portion you need in an appropriate column type (JSONB) rather than forcing your entire system into the issues that come with going NoSQL-only.
Given the JSON capabilities of modern SQL databases, there's almost no reason to start any project with a NoSQL-only option.
In the Prime Day article, for example, they're singing the praises of their own SQL offerings too.
> Amazon Aurora – On Prime Day, 5,326 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 288 billion transactions, stored 1,849 terabytes of data, and transferred 749 terabytes of data.
> Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.
They don't give an apples-to-apples comparison of the numbers here. We don't have a cost comparison of infrastructure for DynamoDB vs Aurora. We don't get to see whether the "288 billion transactions" are database writes, while for DynamoDB we only see requests. We don't get to see whether there have to be so many more requests to Dynamo because you can't get the data as easily in a single query. We don't get to see if it's being used as a cache layer or for session management, which are both great use cases for it.
I mean, Instagram scaled just fine with only PostgreSQL. SQL scales just fine. Some of the bevy of options and tools that you get for working with the data don't scale as well as others, but that's not a case for getting rid of those tools... it's just a case for caching when needed.