by ComputerGuru 566 days ago
A consistent log plus an atomic compare operation is sufficient for a distributed database, but its performance will be extremely questionable. CAS is always the slow step, and in this case it is pathologically slow. The magic is to do whatever you can to avoid it until absolutely necessary. The availability of consistent, ordered, synchronized timestamps across all nodes is something most distributed databases require as a prerequisite. How you handle violations of that (and to what degree of accuracy you can rely on it) makes a considerable difference.
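To make the "avoid CAS until you have to" point concrete, here is a minimal sketch of appending to a log through a single compare-and-swap, with writes batched so that a whole batch costs one CAS rather than one per entry. Everything here is hypothetical: `VersionedRegister` is an in-memory stand-in for whatever store provides the CAS, and `append_batch` is just an illustrative name.

```python
import threading


class VersionedRegister:
    """In-memory stand-in (an assumption) for a store with CAS on one key."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.entries = ()
        self.cas_calls = 0  # counts how often the expensive operation runs

    def read(self):
        with self._lock:
            return self.version, self.entries

    def compare_and_swap(self, expected_version, new_entries):
        """Replace the log only if nobody else has written since we read it."""
        with self._lock:
            self.cas_calls += 1
            if self.version != expected_version:
                return False
            self.entries = new_entries
            self.version += 1
            return True


def append_batch(register, batch, max_retries=10):
    """Append a whole batch with one CAS; re-read and retry if we lost a race."""
    for _ in range(max_retries):
        version, entries = register.read()
        if register.compare_and_swap(version, entries + tuple(batch)):
            return version + 1
    raise RuntimeError("too much contention on the log tail")
```

Appending three entries one at a time would cost at least three CAS round trips; buffering them locally and calling `append_batch` once costs one, at the price of those entries not being durable until the batch lands.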

Depending on how you structure the underlying pages, you’ll get to decide how availability at the log level translates to availability in your user/app-facing interface and whether you will end up sacrificing consistency, availability, or partition tolerance.

Basically, S3 with its recent consistency guarantees and all-new CAS support is sufficient in and of itself. But for anything other than the most basic use case (least amount of data, lowest frequency of writes, etc.) you’ll need a considerable amount of magic to make it usable.
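As a rough illustration of the "most basic" case, here is a sketch of a log where each entry is its own object and a writer claims the next sequence number with a create-only-if-absent write. This assumes the CAS support can be used as a put-if-absent primitive; `ObjectStore` is an in-memory stand-in, not the S3 API, and the key layout is made up for the example.

```python
class ObjectStore:
    """In-memory stand-in (an assumption) for a bucket with put-if-absent."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, body):
        """Create the object only if the key is unused; report whether we won."""
        if key in self._objects:
            return False
        self._objects[key] = body
        return True

    def get(self, key):
        return self._objects.get(key)


def append(store, record, start=0, prefix="log/"):
    """Claim the first free slot at or after `start`; return its sequence number."""
    seq = start
    # Each failed attempt would be a full network round trip against a real store.
    while not store.put_if_absent(f"{prefix}{seq:012d}", record):
        seq += 1
    return seq
```

With one slow writer this is fine. With many writers, or a stale idea of where the tail is, every append turns into a string of failed conditional writes, which is where the extra magic comes in.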

The most straightforward approach would be to use the existing whole of another database but swap out the backend and then tweak the frontend accordingly. SQLite lets you use custom VFS providers (already used to provide fairly efficient SQLite over HTTP without serving the entirety of the database, but previously not for writes) and with Postgres you can use foreign data wrappers. But in both cases you’ll basically have to take out a lock to support writes, either on a page or on a row: page locks risk lots of contention, while row locks introduce a ton of locking and network latency overhead.
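The page-versus-row trade-off can be shown with a toy lock table. This is only a sketch: `LockTable` and its methods are hypothetical, it runs in one process, and in the setup described above each `try_lock` would be a network round trip to wherever the locks live.

```python
class LockTable:
    """Toy lock manager; rows_per_page=None means row-level locking."""

    def __init__(self, rows_per_page=None):
        self.rows_per_page = rows_per_page
        self.held = {}  # lock key -> owner

    def _key(self, row_id):
        if self.rows_per_page is None:
            return ("row", row_id)
        return ("page", row_id // self.rows_per_page)

    def try_lock(self, owner, row_id):
        """Take the lock covering row_id, or fail if another owner holds it."""
        holder = self.held.setdefault(self._key(row_id), owner)
        return holder == owner

    def release_all(self, owner):
        self.held = {k: o for k, o in self.held.items() if o != owner}


page_locks = LockTable(rows_per_page=100)
row_locks = LockTable()

# Two writers touching different rows that happen to share a page.
page_results = (page_locks.try_lock("w1", 5), page_locks.try_lock("w2", 7))
row_results = (row_locks.try_lock("w1", 5), row_locks.try_lock("w2", 7))
```

Under page locking the second writer is blocked even though the rows differ (contention); under row locking both proceed, but the table now tracks one lock per row touched, and a write spanning many rows pays for many lock acquisitions.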