| > It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

There's a very solid solution to this that isn't as widely known as it should be.

Read-after-write consistency is extremely important. If a user makes an edit to their content and then can't see that edit on the next page they load, they will assume things are broken and that the site has lost their content. This is really bad!

The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit. The Fly replay header is perfect for this. Here's what to do:

Any time a user performs a write (which should involve a POST request), set a cookie with a very short expiry - 5s perhaps, though monitor your worst-case replica lag to pick the right value. I have trust issues with clocks in users' browsers, so I like to do this by making the value of the cookie the server time at which it should expire.

In your application's top-level middleware, look for that cookie. If a user has it and the cutoff time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.

This guarantees that users who have just performed a write won't see stale data from a lagging replica, provided the window you chose is longer than the replica lag. And the implementation is a dozen or so lines of code.

Obviously this won't work for every product - if you're building a chat app where every active user writes to the database every few seconds, implementing this will send almost all of your traffic to your leader, leaving your replicas with not much to do. But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are causing writes at any one time, I would expect this to be extremely effective.

Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/ |
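
For anyone who wants to see the shape of it, here is a rough sketch of the cookie-plus-replay idea as a plain WSGI middleware in Python. Treat it as an illustration under assumptions rather than a reference implementation: the cookie name `read_primary_until`, the class name, the 5 second window and the 409 status on the replay response are all choices made up for this example, and the region names are passed in rather than read from the environment. The only Fly-specific part is the `fly-replay: region=...` response header.

```python
import time
from http.cookies import CookieError, SimpleCookie

COOKIE_NAME = "read_primary_until"  # hypothetical name, chosen for this sketch
STICKY_SECONDS = 5                  # tune against your worst-case replica lag


class ReadYourWritesMiddleware:
    """After a POST, pin that user's requests to the lead region for a few seconds."""

    def __init__(self, app, current_region, primary_region, now=time.time):
        self.app = app
        self.current_region = current_region
        self.primary_region = primary_region
        self.now = now  # injectable clock so the cutoff logic is easy to test

    def _cutoff(self, environ):
        # The cookie value is a server-side epoch timestamp, so the decision
        # never depends on the clock in the user's browser.
        try:
            morsel = SimpleCookie(environ.get("HTTP_COOKIE", "")).get(COOKIE_NAME)
            return float(morsel.value) if morsel else 0.0
        except (CookieError, ValueError):
            return 0.0

    def __call__(self, environ, start_response):
        on_replica = self.current_region != self.primary_region
        if on_replica and self._cutoff(environ) > self.now():
            # Recent write: ask the Fly proxy to replay this request in the
            # lead region. The status code is an arbitrary choice here.
            headers = [("fly-replay", f"region={self.primary_region}"),
                       ("Content-Length", "0")]
            start_response("409 Conflict", headers)
            return [b""]

        if environ.get("REQUEST_METHOD") != "POST":
            return self.app(environ, start_response)

        def start_with_cookie(status, headers, exc_info=None):
            # Stamp every write response with the server time at which
            # replica reads become safe again.
            cutoff = int(self.now() + STICKY_SECONDS)
            cookie = f"{COOKIE_NAME}={cutoff}; Max-Age={STICKY_SECONDS}; Path=/; HttpOnly"
            return start_response(status, list(headers) + [("Set-Cookie", cookie)], exc_info)

        return self.app(environ, start_with_cookie)
```

To wire it up you would wrap your WSGI app, for example `ReadYourWritesMiddleware(app, os.environ["FLY_REGION"], os.environ["PRIMARY_REGION"])`, assuming those two environment variables are set for your deployment. The sketch only covers the read side: it assumes writes already reach the lead region by some other route.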
Chris McCord describes how Elixir does that with PostgreSQL here: https://news.ycombinator.com/item?id=31434094
Wikipedia implements this trick on top of PHP and MySQL global transaction IDs (GTIDs) so it definitely scales!