Seems like all databases are moving towards the middle. Postgres has JSON support; MongoDB has transactions and also a columnar extension for OLAP-type data. NoSQL seems almost meaningless as a term now. It feels like a move towards a winner-takes-all multi-model database that can work with most types of data fairly well. Postgres, with all of its specialized extensions, seems like it will be the most popular choice. The convenience of not having to manage multiple databases is hard to beat unless a specialized system performs dramatically better, and Postgres with these extensions can probably be "good enough" for a lot of companies.
Reminds me of how industries typically start out dominated by vertically integrated companies, move to specialized horizontal companies, then generally move back to vertical integration for efficiency. The car industry started this way with Ford, moved away from it, and now Tesla is doing it again. There are lots of other examples in other industries.
The pendulum swing is common in any system, and is a really effective mechanism for evaluation.
You almost always want somewhere in the middle, but it’s often much easier to move back after a large jump in one direction than to push towards the middle.
For documents, it made access fast since there are no joins or similar operations that require paging in data from all over. The problems ended up being updates and compaction.
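A rough Python sketch of that read-versus-update trade-off, using plain dicts as stand-ins for tables and documents. All names here (`customers`, `orders`, `order_docs`, and the helper functions) are hypothetical, and it only illustrates the update cost of denormalized copies, not storage-level compaction:

```python
# Normalized layout: reading one order touches three separate "tables".
customers = {1: {"name": "Ada"}}
orders = {10: {"customer_id": 1}}
order_items = [
    {"order_id": 10, "sku": "A1", "qty": 2},
    {"order_id": 10, "sku": "B7", "qty": 1},
]


def read_order_normalized(order_id):
    """Assemble an order by joining across the three structures."""
    order = orders[order_id]
    return {
        "customer": customers[order["customer_id"]]["name"],
        "items": [
            {"sku": i["sku"], "qty": i["qty"]}
            for i in order_items
            if i["order_id"] == order_id
        ],
    }


# Document layout: everything needed is embedded in one record.
order_docs = {
    10: {
        "customer": "Ada",
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    }
}


def read_order_document(order_id):
    """A single lookup, no join."""
    return order_docs[order_id]


def rename_customer_in_documents(old_name, new_name):
    """The flip side: every document embedding the name must be rewritten."""
    touched = 0
    for doc in order_docs.values():
        if doc["customer"] == old_name:
            doc["customer"] = new_name
            touched += 1
    return touched
```

In the normalized layout a rename is one write to `customers`; in the document layout it fans out to every order that embeds the name.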
My memory is that the problem was ACID. The document stores didn’t promise to be reliable because apparently that didn’t scale.
And there was a very well known cartoon video discussion about it with “web scale” and “just write to dev null” and other classics that became memes :)
Did you ever read Pat Helland's article, "Life Beyond Distributed Transactions: An apostate’s opinion" https://dl.acm.org/doi/10.1145/3012426.3025012? "This article explores and names some of the practical approaches used in the implementation of large-scale mission-critical applications in a world that rejects distributed transactions."
Admittedly I live in a world where big distributed transactions are a given and work fine, and SQL speeds us up rather than slowing us down. I’m guessing SQL and ACID scaled after all?
Yes and no. Distributed transactions and two-phase commit have been superseded by things like Paxos and Raft, with a variety of consistency models, so the implementation is drastically different.
Document stores often are reliable and more fault tolerant. But yes, they trade away ACID guarantees.
There are some applications that require high throughput (usually for writes) but can tolerate relaxed read consistency.
A couple of examples:
- consumer-facing comment systems, where it is OK for other users to miss your comment for 30 seconds
- time-series logging, where you read infrequently but write heavily in a denormalized format, so joins aren't as critical
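A toy Python model of what those two examples are trading, assuming a hypothetical in-memory `LaggingReplicaStore` (not any real database's API): writes are acknowledged as soon as they are appended, and readers are served from a replica that only catches up when `sync()` runs.

```python
from dataclasses import dataclass, field


@dataclass
class LaggingReplicaStore:
    """Toy model: fast appends to a primary log, reads from a stale replica."""

    primary: list = field(default_factory=list)  # acknowledged writes
    replica: list = field(default_factory=list)  # what readers currently see

    def write(self, comment: str) -> None:
        # Acknowledge immediately; no coordination with the replica.
        self.primary.append(comment)

    def read(self) -> list:
        # Readers may miss recent writes until the next sync.
        return list(self.replica)

    def sync(self) -> None:
        # Stand-in for asynchronous replication catching up.
        self.replica = list(self.primary)


store = LaggingReplicaStore()
store.write("first!")
stale = store.read()  # replica has not caught up yet
store.sync()
fresh = store.read()  # the comment is visible after replication
```

The write path never waits on readers, which is where the throughput comes from; the cost is the window in which `read()` returns an older view.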