Hacker News
by simonw 1498 days ago
> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

There's a very solid solution to this that isn't as widely known as it should be.

Read-after-write consistency is extremely important. If a user edits their content and then can't see that edit on the next page they load, they will assume things are broken and that the site has lost their content. This is really bad!

The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit.

The Fly replay header is perfect for this. Here's what to do:

Any time a user performs a write (which should involve a POST request), set a cookie with a very short expiry time: 5s perhaps, though monitor your worst-case replica lag to pick the right value.

I have trust issues with clocks in users' browsers, so I like to do this by making the cookie's value the server time at which it should expire.

In your application's top-level middleware, look for that cookie. If a user has it and the cutoff time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.

This guarantees that users who have just performed a write won't see stale data from a lagging replica. And the implementation is a dozen or so lines of code.
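
Here's a minimal sketch of those steps as framework-agnostic Python. The cookie name `recent_write_until`, the 5-second window and the helper names are made up for illustration. The `fly-replay: region=...` response header is how I understand Fly's replay mechanism to work, so check it against the post linked below before relying on it.

```python
import time

COOKIE_NAME = "recent_write_until"  # hypothetical cookie name
WINDOW_SECONDS = 5                  # tune against worst-case replica lag


def write_cookie_value(now=None):
    """Value to store in the cookie after a successful write (e.g. a POST)."""
    now = time.time() if now is None else now
    # Store the server-side expiry time so we never rely on the browser clock.
    return str(int(now) + WINDOW_SECONDS)


def replay_headers(cookies, current_region, primary_region, now=None):
    """Return the response headers to send if this request should be replayed."""
    now = time.time() if now is None else now
    if current_region == primary_region:
        return {}  # already talking to the lead database
    try:
        expires_at = int(cookies.get(COOKIE_NAME, ""))
    except ValueError:
        return {}  # missing or malformed cookie: serve from the replica
    # Ignore values further ahead than we would ever set ourselves.
    if now < expires_at <= now + WINDOW_SECONDS:
        return {"fly-replay": f"region={primary_region}"}
    return {}
```

In the middleware you would call `replay_headers(...)` before doing any work and, if it returns a non-empty dict, respond immediately with just those headers. Where the two region names come from depends on your deployment; I'm assuming they are available as configuration.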

Obviously this won't work for every product. If you're building a chat app where every active user writes to the database every few seconds, implementing this will send almost all of your traffic to the lead database, leaving your replicas with not much to do.

But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are causing writes at any one time, I would expect this to be extremely effective.

Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/


There's another, more sophisticated trick that works for some databases: tracking a global transaction counter of some sort, persisting that in a cookie when a user makes a write and redirecting the user to the lead database if the replica they are talking to hasn't made it to that point yet.
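
For PostgreSQL, one candidate for that counter is the write-ahead log position (LSN). A rough sketch, assuming you read the primary's position with `pg_current_wal_lsn()` right after the write and the replica's with `pg_last_wal_replay_lsn()` on each read. The function names below are hypothetical, and the core of it is just parsing and comparing two LSNs:

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '16/B374D848' to an integer."""
    high, low = text.split("/")
    # An LSN is two hexadecimal numbers: the high and low 32 bits of a 64-bit value.
    return (int(high, 16) << 32) | int(low, 16)


def replica_is_caught_up(cookie_lsn, replica_lsn):
    """True if the replica has replayed at least up to the user's last write."""
    return parse_lsn(replica_lsn) >= parse_lsn(cookie_lsn)
```

Comparing the raw strings would be wrong (`'F/FFFFFFFF'` sorts after `'10/0'`), which is why the values are parsed first. If `replica_is_caught_up` returns False, the request gets redirected to the lead database.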

Chris McCord describes how Elixir does that with PostgreSQL here: https://news.ycombinator.com/item?id=31434094

Wikipedia implements this trick on top of PHP and MySQL global transaction IDs (GTIDs) so it definitely scales!

Actually the way Wikipedia works is slightly different: they don't redirect to a lead database, they instead call this MySQL function to wait on the replica for it to catch up:

    SELECT WAIT_FOR_EXECUTED_GTID_SET($gtidArg, $timeout)
https://github.com/wikimedia/mediawiki/blob/434c333d9b2be817...
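
A hedged sketch of the same wait-on-the-replica idea from Python, assuming a DB-API cursor from a MySQL driver that uses `%s` placeholders (PyMySQL and mysqlclient both do). As I read the MySQL docs, `WAIT_FOR_EXECUTED_GTID_SET` returns 0 once the GTIDs have been applied and 1 if the timeout is hit. The function name and defaults here are my own, not MediaWiki's.

```python
def wait_for_replica(cursor, gtid_set, timeout_seconds=1):
    """Block until the replica has applied gtid_set; return False on timeout."""
    cursor.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)",
        (gtid_set, timeout_seconds),
    )
    (status,) = cursor.fetchone()
    return status == 0  # 0 means caught up, 1 means timed out
```

If it returns False you still have to decide what to do: read from the lead database instead, or accept possibly stale data.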

I wonder if there's a PostgreSQL equivalent of this?

Looks like someone proposed a WAIT FOR feature for PostgreSQL a couple of years ago: https://www.postgresql.org/message-id/flat/69a363498b76cd079...
(Disclaimer: Not an expert.. just sharing something I read somewhere)

I think FoundationDB does something really interesting with this problem. When you make changes, you do it via a transaction. But all the client reads are using the previous version, until the transaction changes have propagated across the nodes, then the new value is returned.

This is a ton of effort to save the RTT of sending all the requests to a central server. And it all goes out the window the second you need to call an external API in the processing of your requests. And to get what benefit there may be you need to, more or less, pay for a server in every big city. IMHO, outside of gaming there's no real need for what fly.io does.

For something like this to be useful I think the code would need to be running on the user's network. That would drop server ping to sub 1 ms and open up a whole lot of interesting possibilities. But I don't see what changing server ping from 80 ms to 15ms gets me.

This trick isn't just about geographic distribution - it's most commonly used for classic horizontal scaling, where you use multiple read-replicas to handle more traffic.
If you're getting 80ms response times for user requests, consistently, then it doesn't change much.
80ms is the network latency, not the response time. That's the number fly.io can change, and the realistic best case is going from 80ms (a server on the other side of the US) to 10-15ish ms (staying within a city).
I'm just saying, if your application is already fast for your users --- anything in the ballpark of 80ms is fast enough --- geographically distributing it might not make a big difference. I'm agreeing with the comment (or at least, its subtext).
If all of your users are in the US you won't gain much from geographical distribution. Where this gets really interesting is when you have users all around the world.