Hacker News new | ask | show | jobs
by nickjj 1502 days ago
> Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.

Doesn't this take away a lot of the benefits of global distribution?

For example if you pay Fly hundreds of dollars a month to distribute your small app in a few datacenters around the globe but your primary DB is in California then everyone from the EU is going to have about 150-200ms round trip latency every time you write to your DB because you can't get around the limitations of the speed of light.

Now we're back to non-distributed latency times every time you want to write to the DB which is quite often in a lot of types of apps. If you want to cache mostly static read-only pages at the CDN level you can do this with a number of services.

Fly has about 20 datacenters, hosting a small'ish web app that's distributed across them will be over $200 / month without counting extra storage or bandwidth just for the web app portion. Their pg pricing isn't clear but a fairly small cluster is $33.40 / month for 2GB of memory and 40GB of storage. Based on their pricing page it sounds like that's the cost for 1 datacenter, so if you wanted read-replicas in a bunch of other places it adds up. Before you know it you might be at $500 / month to host something that will have similar latency on DB writes as a $20 / month DigitalOcean server that you self manage, Fly also charges you $2 / month per Let's Encrypt wildcard cert where as that's free from Let's Encrypt directly.

4 comments

You don’t need to route every write to primary though, but only those writes that have dependencies on other writes. Things like telemetry can be written in edge instances. Depends on your application of course, but in many cases that should be only a tiny fraction of all requests needing redirects to primary.

And why would you get 20 instances, all around the world right out of the gate? 6-7 probably do the job quite well, but maybe you don’t even need that many. Depending on where most of your customers are, you could get good results with 3-4 for most users.

> You don’t need to route every write to primary though, but only those writes that have dependencies on other writes.

Thanks, can you give an example of how that works? Did you write your own fork of Postgres or are you using a third party solution like BDR?

Also do you have a few use cases where you'd want writes being dependent on another write?

> 6-7 probably do the job quite well

You could, let's call it 5.

For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?

I was thinking maybe 5 background workers wasn't necessary but frameworks like Rails will put a bunch of things through a background worker where you would want low latency even if they're happening in the background because it's not only things like sending an email where it doesn't matter if it's delayed for 2 seconds behind the scenes. It's performing various Hotwire Turbo actions which render templates and modify records where you'd want to see those things reflected in the web UI as soon as possible.

> Thanks, can you give an example of how that works?

I just noticed I formulated it wrong, my apologies. What I meant is that the replicating regions don’t need to wait for the primary writes to go through before they respond to clients. They will still be read-only Postgres replicas, and info could be shuttled to primary in a fire-and-forget manner, if that’s an option.

Whenever an instance notices that it‘s not primary, but it is currently dealing with a critical write, it can refuse to handle the request, and return a 409 with the fly-replay header that specifies the primary region. Their infra will replay the original request in the specified region.

> Did you write your own fork of Postgres or are you using a third party solution like BDR?

When using fly.io, the best option would probably be to use their postgres cluster service which supports read-only replicas (can take a few seconds for updates to reach replicas): https://fly.io/docs/getting-started/multi-region-databases/

> For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?

Maybe. A few thoughts:

- Why would you need 5 web workers, would one running on primary not be ideal? If you need so much compute for background work, then that’s not fly‘s fault, I guess.

- Not sure the Postgres read replicas would need to be as powerful as primary

- Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

> Why would you need 5 web workers, would one running on primary not be ideal?

It's not ideal due to some frameworks using background jobs to handle pushing events through to your web UI, such as broadcasting changes over websockets with Hotwire Turbo.

The UI would update when that job completes and if you only have 1 worker then it's back to waiting 100-350ms to reach the primary worker to see UI changes based on your location which loses the appeal of global distribution. You might as well consider running everything on 1 DigitalOcean server for 15x less at this point and bypass the idea of global distribution if your goal was to reduce latency for your visitors.

> Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

A number of web frameworks let you use Redis as a session, cache and job queue back-end with no alternatives (or having to make pretty big compromises to use a SQL DB as an alternative). Also, Rails depends on Redis for Action Cable, swapping that for SQLite isn't an option.

For low-latency workers like that it might make sense to just run them on the same instance as the web servers.
Does Fly let you run multiple commands in separate Docker images? That's usually the pattern on how to run a web app + worker with Docker, as opposed to creating an init system in Docker and running (2) processes in 1 container (this goes against best practices). The Fly docs only mention the approach of using an init system inside of your image and also tries to talk you into running a separate VM[0] to keep your web app + worker isolated.

In either case I think the price still doubles because both your web app and worker need memory for a bunch of common set ups like Rails + Sidekiq, Flask / Django + Celery, etc..

[0]: https://fly.io/docs/app-guides/multiple-processes/

It's interesting that their bash init uses fg %1. That may return only on the first process changing state, rather than either process exiting. It should probably use this instead:

  #!/usr/bin/env bash
  /app/server &
  /app/server -bar &
  wait -f -n -p app ; rc=$?
  printf "%s: Application '%s' exited: status '%i'\n" "$0" "$app" "$rc"
  exit $rc
That looks a million times better than the horrible hack I wrote. Do you want credit for it when I fix the doc?
It sounds like you're asking if we offer some alternative between running multiple processes in a VM, and running multiple VMs for multiple processes. What's the third option you're looking for? Are you asking if you can run Docker inside a VM, and parcel that single VM out that way? You've got root in a full-fledged Linux VM, so you can do that.
> Are you asking if you can run Docker inside a VM, and parcel that single VM out that way? You've got root in a full-fledged Linux VM, so you can do that.

On a single server VPS I'd use Docker Compose and up the project to run multiple containers.

On a multi-server set up I'd use Kubernetes and set up a deployment for each long running container.

On Heroku I'd use a Procfile to spin up web / workers as needed.

The Fly docs say if you have 1 Docker image you need to run an init system in the Docker image and manage that in your image, it also suggests not using 2 processes in 1 VM and recommends spinning up 1 VM per process.

I suppose I was looking for an easy solution to run multiple processes in 1 VM (in this case multiple Docker containers). The other 3 solutions are IMO easy because once you learn how they work you depend on the happy path of those tools using the built in mechanisms they support. In the Fly case, not even the docs cover how to do it other than rolling your own init system in Docker.

If you have root, can I run docker-compose up in a Fly VM? Will it respect things like graceful timeouts out of the box? Does it support everything Docker Compose supports in the context of that single VM?

Not sure how Ruby works, but can you not run the workers and the web server in the same process? In our Node.js apps, this is as simple as importing a function and calling it.
Most of the popular background workers in Ruby run as a separate process (Sidekiq, Resque, GoodJob). The same goes for using Celery with Python. I'm not sure about PHP but Laravel's docs mention running a separate command for the worker so I'm guessing that's also a 2nd process.

It's common to separate them due to either language limitations or to let you individually scale your workers vs your web apps since in a lot of cases you might be doing a lot of computationally intensive work in the workers and need more of them vs your web apps. Not just more in number of replicas but potentially a different class of compute resources too. Your wep apps might be humming along with a consistent memory / CPU usage but your workers might need double or triple the memory and better cpus.

Why are people tripping over $2/mo ? I don’t get this tight-ass mentality. It’s a rounding error.
It's not the $2/mo at face value. For me it's the idea of them pushing to make paying for SSL certificates the norm again after Let's Encrypt has put in a huge amount of effort to change that field. Not that there's anything wrong with charging for things but charging for a free service rubs me in a weird way, especially since they're using Let's Encrypt.

Using the rounding error logic, how do you feel about companies adding $1.99 "convenience fees" or "administrative fees"?

why spend money when you don’t have to?
One reason is that many people use that as a baseline for how they I'd multi-tenacy. It could be they just proxy resources from their customer down to the infrastructure.
And in today's world all data needs to be federated between national borders anyway. Try doing business in China if the user data isn't stored in China. Or Russia. Or the EU. Modern designs need the data layer to be forked between regions, not replicated, with merges between the forks.

On top of that, most replication systems are brittle and create logistical and administrative headaches. If you can get by with just rsync, do.

Yes, there are hundreds of different ways you could accomplish this. Fly.io is a convenient and easy to use one.