| HN Mirror


by foota 806 days ago

It's often said that NewSQL databases offer better scalability than standard SQL databases, but I think it's sometimes underappreciated that some (not all) of them also have much simpler consistency models.

I'd speculate this is because Postgres and friends try to eke out every bit of single-node performance (which helps with single-row throughput and overall throughput, where they are obviously much better than NewSQL), whereas the scalability of NewSQL databases might allow them to prefer easy consistency over single-node performance.

Possibly this is also just the passage of time benefiting newer systems.

2 comments

wbl 805 days ago

Read Committed is explicitly asking for hard mode. If you want a simple life, stick with Serializable, as always. It took years before people found anomalies in Repeatable Read in Postgres. This stuff is hard even for world-class researchers.


paulryanrogers 805 days ago

How does one get serializable in a multi-writer system without a lot more locking and having to retry at app layer?


Diggsey 805 days ago

You always need app-level retries for SERIALIZABLE isolation level. You don't need any explicit locking - the database should handle that for you (and in the case of PostgreSQL, locking is not the only tool it uses for avoiding serialization anomalies).

The strategy I use is to keep transactions as small as possible and to build retry functionality into the transaction abstraction, so the business logic doesn't really need to worry about it. I also explicitly use read-only transactions where possible.
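A minimal sketch of what retries built into a transaction abstraction could look like, not any particular library's API. Everything named here is an assumption for illustration: `SerializationFailure` stands in for whatever error your driver raises for SQLSTATE 40001, `begin` is a hypothetical factory returning an object with `commit()` and `rollback()`, and `work` is the business logic.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's SQLSTATE 40001 error."""


def run_transaction(begin, work, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run work(tx) inside a transaction, retrying on serialization failure.

    `begin` is a hypothetical factory that starts a SERIALIZABLE transaction
    and returns an object exposing commit() and rollback().
    """
    for attempt in range(1, max_attempts + 1):
        tx = begin()
        try:
            result = work(tx)
            # The failure may only surface at commit time, so commit
            # has to sit inside the retried region as well.
            tx.commit()
            return result
        except SerializationFailure:
            tx.rollback()
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter before re-running from scratch.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())
        except Exception:
            # Any other error is not retried: roll back and propagate.
            tx.rollback()
            raise
```

Because `work` may run several times, it should only touch the database through `tx` and avoid external side effects (sending email, calling other services) until the wrapper has returned.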


mjb 805 days ago

Even more generally, distributed systems can find simpler solutions to things like "raise the throughput ceiling", "handle disk failure", and "handle power failure" than single-box systems. This is for the simple reason that they have more options: beyond the constraints of a box, resource allocation is more flexible, failures are less correlated, etc. That allows modern distributed databases to simply avoid some of the super hard problems that prior databases had to solve. Efficiency is still important, but the thing to optimize is mean system efficiency, not the peak performance of a handful of super hot boxes.

There's also the fact that decades of DB research have brought techniques and approaches that beat old ones, and retrofitting existing systems with them can be hard (e.g. see the efforts to remove some of the sharp edges of PG's MVCC behavior and how hard they've turned out to be).
