Hacker News
by jiggawatts 786 days ago
This is a generic problem across a vast range of technologies: how to handle scale-out replication, with its replication delays, while still giving clients a consistent “streaming” experience.

The main issue is that most query languages, like SQL, predate HTTP and haven’t been updated with HTTP concepts such as ETag headers and cookies.

Every database request should include a transaction consistency header containing a vector clock of the logical replication status. Every response should include cache-control headers and an updated vector clock. This token should then be encrypted and bubbled up through the HTTP pipeline to the end clients.

E.g.: a client should be able to request a “snapshot no older than T minutes” for a report run and get back a cookie that keeps subsequent queries consistent with that snapshot. Most databases can do this… for a single TCP socket connection only. They’ll lose track of the context if the socket is closed or if the queries come from multiple servers in a web farm.
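A minimal sketch of such a snapshot cookie, assuming a single scalar commit timestamp per replica rather than a full vector clock. The names `issue_snapshot_cookie`, `read_snapshot_cookie` and `SECRET` are hypothetical. The comment proposes encrypting the token; this sketch swaps in HMAC signing from the Python standard library, which stops a client from forging or altering the snapshot but does not hide the timestamp.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret shared by every web-farm node. HMAC signing
# stands in for the encryption proposed above: clients cannot forge or alter the
# snapshot, but they can still read the timestamp inside the cookie.
SECRET = b"demo-secret-key"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_snapshot_cookie(replica_applied_ts: float, max_staleness_s: float, now: float) -> str:
    """Pin the replica's current snapshot, if it is fresh enough for the request."""
    if now - replica_applied_ts > max_staleness_s:
        raise ValueError("replica is staler than the client allows")
    payload = json.dumps({"snapshot_ts": replica_applied_ts}).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def read_snapshot_cookie(cookie: str) -> float:
    """Recover the pinned snapshot timestamp; any node holding SECRET can do this."""
    encoded, sig = cookie.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode())
    if not hmac.compare_digest(sig, _sign(payload)):
        raise ValueError("cookie signature mismatch")
    return json.loads(payload)["snapshot_ts"]


# Usage: a replica 60 s behind satisfies a "no older than 5 minutes" report request,
# and later queries in the same report read at exactly the same snapshot.
cookie = issue_snapshot_cookie(replica_applied_ts=1_000.0, max_staleness_s=300.0, now=1_060.0)
assert read_snapshot_cookie(cookie) == 1_000.0
```

Because the cookie carries all the state, any server in the farm can pick up the next query; the replica serving it still has to retain that snapshot version.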

With an approach like this, any cache or any replica could be utilised for any query in a safe and consistent manner.
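A rough sketch of why the vector clock makes this safe, assuming each replica or cache reports the last log position it has applied from each replication source. A request can be served by any node whose clock is at least the client's clock in every component. `merge`, `dominates` and `pick_replica` are illustrative names, not an existing API.

```python
from typing import Dict, Optional

# Maps each (hypothetical) replication source, e.g. a primary shard, to the
# last log position applied from it.
VectorClock = Dict[str, int]


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock a client holds after seeing both a and b."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def dominates(replica: VectorClock, required: VectorClock) -> bool:
    """True if the replica has applied everything the client has already observed."""
    return all(replica.get(k, 0) >= pos for k, pos in required.items())


def pick_replica(replicas: Dict[str, VectorClock], required: VectorClock) -> Optional[str]:
    """Return any replica or cache that is safe to serve this client, or None."""
    # Sorted only to make the choice deterministic; a real router might prefer
    # the nearest or least-loaded candidate instead.
    for name, clock in sorted(replicas.items()):
        if dominates(clock, required):
            return name
    return None


replicas = {
    "cache-1": {"orders": 40, "users": 12},
    "replica-a": {"orders": 55, "users": 12},
}
client_clock = {"orders": 50}
assert pick_replica(replicas, client_clock) == "replica-a"  # cache-1 is behind
# After the response, the client folds the server's clock into its own token.
client_clock = merge(client_clock, replicas["replica-a"])
assert client_clock == {"orders": 55, "users": 12}
```

Returning `None` means no node has caught up yet; the caller could wait or fall back to the primary.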

Essentially, this would be fixing an impedance mismatch between a stateful connection-oriented system and a stateless request-response system.


I don't think that works. First off, the server doesn't know when the client goes away, so it has to hold the snapshot forever. Second, you double your replication latency and become extremely sensitive to stragglers, unless you take the risk of hitting an unavailable cache.

The database engine would have to be designed for this, and/or the clients could request the level of consistency they require. Most apps only care about “this age or newer”, except when paging through a data set, where consistency matters.

E.g.: indexes can include the timestamp, and queries can then implicitly filter out rows newer than their snapshot. Physically, this can be implemented with tiered indexes where the topmost layer is in-memory only and queries for snapshots older than what it contains are rejected. The on-disk indexes then don’t need old row versions or timestamps.
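A toy sketch of that tiered index, assuming integer commit timestamps that arrive in increasing order and ignoring deletes. `TieredIndex` and its methods are hypothetical. Only the in-memory layer keeps versions; `flush` folds everything up to a new watermark into an unversioned on-disk map, after which snapshots older than the watermark are rejected.

```python
from typing import Dict, List, Optional, Tuple


class TieredIndex:
    """Hypothetical two-tier index: versioned in-memory layer over a timestamp-free disk layer."""

    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}  # value as of low_watermark, no versions kept
        self.memory: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(commit_ts, value)], ascending
        self.low_watermark = 0  # oldest snapshot the index can still serve

    def write(self, key: str, value: str, commit_ts: int) -> None:
        # Assumes commits arrive in increasing timestamp order, above the watermark.
        assert commit_ts > self.low_watermark
        self.memory.setdefault(key, []).append((commit_ts, value))

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        if snapshot_ts < self.low_watermark:
            # Versions this old were collapsed when they were flushed to disk.
            raise LookupError("snapshot older than the in-memory layer")
        # Newest in-memory version visible at snapshot_ts; newer rows are skipped.
        for ts, value in reversed(self.memory.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return self.disk.get(key)

    def flush(self, new_watermark: int) -> None:
        """Fold every version at or below new_watermark into the disk layer."""
        for key, versions in list(self.memory.items()):
            old = [v for ts, v in versions if ts <= new_watermark]
            if old:
                self.disk[key] = old[-1]  # only the latest old version survives
            remaining = [(ts, v) for ts, v in versions if ts > new_watermark]
            if remaining:
                self.memory[key] = remaining
            else:
                del self.memory[key]
        self.low_watermark = max(self.low_watermark, new_watermark)


idx = TieredIndex()
idx.write("row1", "v1", commit_ts=10)
idx.write("row1", "v2", commit_ts=20)
assert idx.read("row1", snapshot_ts=15) == "v1"  # v2 is newer than the snapshot
idx.flush(new_watermark=15)
assert idx.read("row1", snapshot_ts=25) == "v2"
# idx.read("row1", snapshot_ts=12) would now raise LookupError.
```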

Nice, very clean.

Never thought of bubbling the DB transaction ID up to the cookie layer.