by Jweb_Guru
3648 days ago
When it comes to data validation, none of what you just said applies, because you need to perform your work inside a transaction. You can validate the data outside your database and then confirm that nothing has changed (optimistic concurrency control), but you can do that just as easily inside the database, with lower latency and greater throughput, because many databases have OCC built in. (Be careful, though: in situations with lots of contention, OCC can lead to many more aborts than other concurrency control mechanisms.) If you can afford to relax consistency due to aspects of your data model, you can use a database with a relaxed consistency model and, again, get far better performance than an ad-hoc solution in your application. It's hugely unclear to me why you think skirting transactional requirements by performing work in your application is less complex than using a NoSQL database (or a database that uses MVCC or can otherwise provide long-lived read snapshots).
Frankly, I also disagree that for most websites the bottleneck is the database. For many websites, database latency and throughput constraints never become the dominant factor in end-to-end requests. Requests have to get through many layers before they reach the database in the first place, there is inefficient code elsewhere in the stack, and the number of requests per second is relatively low (commodity relational databases on commodity hardware can easily handle many thousands per second, and IIRC Google Search only had to handle 40k rps from real clients, according to a recent press release).
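To make the "validate outside, then confirm that nothing's changed" pattern concrete, here is a minimal sketch of application-side optimistic concurrency control in Python with the standard-library `sqlite3` module. The `accounts` table, its `version` column, and the helper names `make_db`, `read_account`, and `withdraw` are all made up for illustration; this is one common way to do it, not how any particular database implements OCC internally.

```python
import sqlite3


def make_db():
    # In-memory database with one hypothetical account row.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
    conn.commit()
    return conn


def read_account(conn, account_id):
    # Take a snapshot of the row, including the version it was read at.
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()


def withdraw(conn, account_id, amount, snapshot):
    balance, version = snapshot
    # Validation happens in the application, against the snapshot.
    if amount > balance:
        raise ValueError("insufficient funds")
    # The write only applies if the row is still at the version we read.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance - amount, account_id, version),
    )
    conn.commit()
    # rowcount == 0 means someone else changed the row: abort, let the caller retry.
    return cur.rowcount == 1
```

If two callers both read the row at version 0, the first `withdraw` succeeds and bumps the version, and the second returns `False` and has to re-read and re-validate. That retry is the "abort" mentioned above, and with many writers fighting over the same row those retries pile up.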
Now it is easy to screw up an application. It is easy to screw up a database design. It is easy to screw up queries and query plans. But all of those are fixable in relatively straightforward ways. And once you do that, you will wind up with database throughput as your bottleneck.
As for NoSQL, the problem is this. Moving to that architecture requires taking an up-front hit on transactional complexity, usually requires several times the hardware (data needs to be stored multiple times to survive hardware failure), puts a lot of stress on your network and your latency, and is really easy to screw up. Just using a popular out-of-the-box solution is not enough - see https://aphyr.com/tags/jepsen for a list of real failure modes in systems that will look perfectly fine in testing.
It is a necessary challenge to accept if you want to go beyond a certain scale. But you should not accept that challenge unless you have good reason to do so.