Hacker News
by jitl 4 days ago
you can home tenants in a data center close to them, run a copy of your app in each region including the datastore. keep a central db for accounts, billing, etc but user content is easy enough to shard regionally.
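
A minimal sketch of that split, assuming a per-tenant "home region" field and made-up endpoint names (`REGION_ENDPOINTS`, `CENTRAL_ENDPOINT`, and `endpoint_for` are all hypothetical, not anything from a real product). Account and billing traffic goes to the central service; everything else goes to the tenant's home region:

```python
from dataclasses import dataclass

# Hypothetical region names and endpoints, for illustration only.
REGION_ENDPOINTS = {
    "us-east": "https://us-east.example.com",
    "eu-west": "https://eu-west.example.com",
}
CENTRAL_ENDPOINT = "https://central.example.com"  # accounts, billing, etc.
CENTRAL_RESOURCES = {"accounts", "billing"}


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    home_region: str  # assumed to be chosen once, when the tenant is created


def endpoint_for(tenant: Tenant, resource: str) -> str:
    """Pick the backend that owns this resource for this tenant."""
    if resource in CENTRAL_RESOURCES:
        return CENTRAL_ENDPOINT
    # User content is sharded by region, so route to the tenant's home.
    return REGION_ENDPOINTS[tenant.home_region]
```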

taken to the extreme, Cloudflare Durable Objects & Workers let you place data very close to a tenant automatically; but you lose total write throughput on top of SQLite.

This breaks down when someone goes on holiday to Greece for a week, and the RTT over the airbnb wifi is 5 seconds.

Optimistic updates on the frontend are probably simpler too.

oh for sure, you start with a client-side cache & optimistic updates. but you need a low latency / regional backend for multiplayer to feel good. I didn't realize who I was replying to; aaron is probably one of the few people who thinks about sync engines more than me. anyways, we do both at Notion, and of course we did the local-cache-first client way before we did multi-region.
But this is kind of meaningless unless the tenants themselves are in one geo. Take Linear as an example: this strategy works as long as the company that uses Linear is all colocated in one area. As soon as you have remote people it falls apart.
Not necessarily. You can do async replication either at the app level or DB level to other regions

Each individual user is fast due to close geo and everyone else has a small (potentially trivial) lag to see writes.

Not sure if such an architecture is worth the complexity, but it's definitely possible.

Actually such architectures are quite old. Back when I worked at Kmart, they had a store server in the office of every store. The store server would asynchronously sync back to corporate (afaik an overnight cron but I think it could be triggered on command). That was the geographically close "edge" server and the store was the tenant. Most ops were quick. For cross tenant queries, clients maintained a list of store numbers and locations. They did some bit twiddling with the store number to calculate a deterministic IP which went to the store server for that store (tenant discovery). With the server IP they could run remote queries directly at the cost of much higher latency since you had to go back through the corporate S2S VPN to headquarters then to the target store.
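
To make the tenant discovery step concrete, here is a rough sketch of deriving an address from a store number. I don't know the real layout Kmart used; the `10.<high byte>.<low byte>.10` scheme and the `store_server_ip` name are assumptions purely for illustration:

```python
import ipaddress


def store_server_ip(store_number: int) -> ipaddress.IPv4Address:
    """Map a store number to its store server address.

    Hypothetical layout: 10.<high byte>.<low byte>.10, so every store
    gets its own /24 and the server always sits at host .10.
    """
    if not 0 <= store_number <= 0xFFFF:
        raise ValueError("store number must fit in 16 bits")
    hi = (store_number >> 8) & 0xFF  # upper 8 bits of the store number
    lo = store_number & 0xFF         # lower 8 bits of the store number
    return ipaddress.IPv4Address((10 << 24) | (hi << 16) | (lo << 8) | 10)
```

With this made-up layout, store 4021 maps to 10.15.181.10. No lookup service is needed: any client that knows the store number can compute where to send the query.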

As for cross geo, you can have writes always be instantly acknowledged at the closest geo location and immediately available to nearby clients while they get asynchronously replicated in the background. Really you'd only see marginally higher latency when two people are working at the same time in different geographies. That's partially mitigated by time zones.
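
A toy, in-process sketch of that flow, assuming last-writer-wins conflict resolution with a (counter, region name) version. The `Region` class and its methods are hypothetical, and a real system would replicate over the network and need a more careful conflict policy:

```python
from collections import deque


class Region:
    """One regional replica: acks writes locally, replicates later."""

    def __init__(self, name):
        self.name = name
        self.data = {}         # key -> (version, value)
        self.outbox = deque()  # writes not yet shipped to peers
        self.clock = 0         # logical counter used for versions
        self.peers = []

    def write(self, key, value):
        # Apply locally and return right away; nearby clients see it now.
        self.clock += 1
        version = (self.clock, self.name)
        self.data[key] = (version, value)
        self.outbox.append((key, version, value))
        return version

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply_remote(self, key, version, value):
        # Last writer wins: keep whichever version compares higher.
        self.clock = max(self.clock, version[0])
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def flush(self):
        # Stand-in for the background replication job.
        while self.outbox:
            item = self.outbox.popleft()
            for peer in self.peers:
                peer.apply_remote(*item)
```

Until `flush()` runs, a write is visible only in the region that accepted it. That window is the lag remote users see.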

What you are describing is exactly what sync engines do. You can have replicas on the server or replicas on the clients. The tradeoffs are the same except the client-based replicas can be in memory, accessed synchronously directly on the ui thread. No server latency at all.
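
For the client-replica case, a bare-bones sketch of an in-memory store with optimistic writes. `OptimisticStore` is a made-up name and this is far simpler than a real sync engine (no rebasing, no ordering guarantees), but it shows why reads never wait on the server:

```python
class OptimisticStore:
    """In-memory client replica: reads are synchronous, writes are optimistic."""

    def __init__(self):
        self.confirmed = {}  # state the server has acknowledged
        self.pending = []    # (mutation_id, key, value) awaiting an ack
        self._next_id = 0

    def set(self, key, value):
        # Record the write locally; the UI can render it immediately.
        self._next_id += 1
        self.pending.append((self._next_id, key, value))
        return self._next_id

    def get(self, key):
        # The newest pending write shadows the confirmed value.
        for _, k, v in reversed(self.pending):
            if k == key:
                return v
        return self.confirmed.get(key)

    def ack(self, mutation_id):
        # Server accepted the write: promote it to confirmed state.
        for i, (mid, k, v) in enumerate(self.pending):
            if mid == mutation_id:
                self.confirmed[k] = v
                del self.pending[i]
                return

    def reject(self, mutation_id):
        # Server refused the write: drop it, reads fall back to confirmed.
        self.pending = [m for m in self.pending if m[0] != mutation_id]
```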
But it does mean you gracefully degrade so the majority of the company sees the target latency <100ms and the rest of the company sees "not geo-optimized" latency.
Only in the case where there is such a majority of the company that is tightly geolocated.

Again, AWS latency us-west-1 to us-east-1 is 70ms. That's absolute best case for one round-trip that does absolutely no work. And it's ignoring the case of anyone outside of continental US.

Add in actual server-side work, db interactions, and contention - and you're quickly looking at hundreds of ms.

If your users are truly broadly geodistributed, there's no avoiding hundreds of milliseconds of latency if you want strong consistency. You're fighting the speed of light. But with some effort you can move the source of truth closer to the majority of users without meaningfully regressing performance for the users who aren't tightly geolocated, so you can treat it as a fairly pure optimization.
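
Back-of-the-envelope version of the speed-of-light point, assuming light in fiber travels at roughly 2/3 of c (about 200,000 km/s) and using approximate coordinates for San Francisco and Ashburn, VA as stand-ins for us-west-1 (N. California) and us-east-1 (N. Virginia). The function names are mine:

```python
import math

C_FIBER_KM_PER_S = 200_000.0  # approximation: about 2/3 the speed of light

# Approximate (lat, lon) stand-ins for the two AWS regions.
SAN_FRANCISCO = (37.77, -122.42)
ASHBURN_VA = (39.04, -77.49)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over a straight fiber run."""
    return 2 * distance_km / C_FIBER_KM_PER_S * 1000


if __name__ == "__main__":
    d = great_circle_km(*SAN_FRANCISCO, *ASHBURN_VA)
    print(f"{d:.0f} km, floor of {min_rtt_ms(d):.1f} ms per round trip")
```

The floor comes out on the order of 40 ms before any routing detours, queuing, or server work, so the 70ms figure quoted above is already within a factor of two of physics. Anything that needs several cross-region round trips lands in the hundreds of ms.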