Hacker News
by anon3949494 1502 days ago
After all the chatter this week, I've come to the conclusion that Heroku froze at the perfect time for my 4-person company. All of these so-called "features" are exactly what we don't want or need.

1. Multi-region deployment only works if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

2. File persistence is fine but not typically necessary. S3 works just fine.

It's easy to forget that most companies are a handful of people or just solo devs. At the same time, most money comes from the enterprise, so products that reach sufficient traction tend to shift their focus to serving the needs of these larger clients.

I'm really glad Heroku froze when it did. Markets always demand growth at all costs, and I find it incredibly refreshing that Heroku ended up staying in its lane. IMO it was and remains the best PaaS for indie devs and small teams.

10 comments

> Multi-region deployment only work if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.

https://fly.io/docs/getting-started/multi-region-databases/

You do still need to consider the possibility of read replicas being behind the primary when designing your application. If your design considers that from day 1, I think it takes less away from solving your business problems.

Alternatively, you can also just ignore all the multi-region stuff and deploy to one place, as if it was old-school Heroku :-)

> Guess what? fly.io offers a turnkey distributed/replicated Postgres for just this reason. You use an HTTP header to route writes to the region hosting your primary.

Doesn't this take away a lot of the benefits of global distribution?

For example if you pay Fly hundreds of dollars a month to distribute your small app in a few datacenters around the globe but your primary DB is in California then everyone from the EU is going to have about 150-200ms round trip latency every time you write to your DB because you can't get around the limitations of the speed of light.
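
To put a rough number on the speed-of-light point above, here is a minimal back-of-the-envelope sketch. The city coordinates and the 200,000 km/s fiber speed (about two thirds of c) are my own round-figure assumptions, not numbers from this thread, and real routes are longer than the great-circle path, which is why observed latency lands well above this floor.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fiber travels at roughly two thirds of c.
FIBER_SPEED_KM_PER_S = 200_000.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over a straight fiber run."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Approximate coordinates, chosen only for illustration.
SAN_FRANCISCO = (37.77, -122.42)
FRANKFURT = (50.11, 8.68)

distance = great_circle_km(*SAN_FRANCISCO, *FRANKFURT)
floor_ms = min_round_trip_ms(distance)
print(f"{distance:.0f} km, at least {floor_ms:.0f} ms round trip")
```

The physical floor comes out at roughly 90 ms, so 150-200 ms in practice is plausible once routing detours and processing hops are added.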

Now we're back to non-distributed latency times every time you want to write to the DB which is quite often in a lot of types of apps. If you want to cache mostly static read-only pages at the CDN level you can do this with a number of services.

Fly has about 20 datacenters, and hosting a smallish web app that's distributed across them will be over $200 / month without counting extra storage or bandwidth, just for the web app portion. Their pg pricing isn't clear, but a fairly small cluster is $33.40 / month for 2GB of memory and 40GB of storage. Based on their pricing page it sounds like that's the cost for 1 datacenter, so if you wanted read replicas in a bunch of other places it adds up. Before you know it you might be at $500 / month to host something that will have similar latency on DB writes as a $20 / month DigitalOcean server that you self-manage. Fly also charges you $2 / month per Let's Encrypt wildcard cert, whereas that's free from Let's Encrypt directly.

You don’t need to route every write to primary though, but only those writes that have dependencies on other writes. Things like telemetry can be written in edge instances. Depends on your application of course, but in many cases that should be only a tiny fraction of all requests needing redirects to primary.

And why would you get 20 instances, all around the world right out of the gate? 6-7 probably do the job quite well, but maybe you don’t even need that many. Depending on where most of your customers are, you could get good results with 3-4 for most users.

> You don’t need to route every write to primary though, but only those writes that have dependencies on other writes.

Thanks, can you give an example of how that works? Did you write your own fork of Postgres or are you using a third party solution like BDR?

Also do you have a few use cases where you'd want writes being dependent on another write?

> 6-7 probably do the job quite well

You could, let's call it 5.

For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?
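
Spelling out that arithmetic as a quick sketch, using only the estimates quoted in this thread (they are the commenter's guesses, not Fly's published prices) and the $20 / month single-server figure mentioned earlier:

```python
# Monthly estimates for a 5-region setup, as guessed in the comment above.
monthly_estimate = {
    "web app": 50,
    "background workers": 50,
    "postgres": 160,
    "redis": 50,
}

fly_total = sum(monthly_estimate.values())

# The self-managed single-server figure quoted earlier in the thread.
single_server = 20
ratio = fly_total / single_server

print(f"estimated total: ${fly_total} / month, {ratio:.1f}x the single server")
```

So the "$300+" estimate is $310 with these inputs, about 15 times the single-server price.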

I was thinking maybe 5 background workers wasn't necessary but frameworks like Rails will put a bunch of things through a background worker where you would want low latency even if they're happening in the background because it's not only things like sending an email where it doesn't matter if it's delayed for 2 seconds behind the scenes. It's performing various Hotwire Turbo actions which render templates and modify records where you'd want to see those things reflected in the web UI as soon as possible.

> Thanks, can you give an example of how that works?

I just noticed I formulated it wrong, my apologies. What I meant is that the replicating regions don’t need to wait for the primary writes to go through before they respond to clients. They will still be read-only Postgres replicas, and info could be shuttled to primary in a fire-and-forget manner, if that’s an option.
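
A minimal sketch of what "fire-and-forget" could look like inside an edge instance, assuming a hypothetical `send_to_primary` callable that does the slow cross-region write. The class and method names are made up for illustration, and a real setup would need to decide what to do about dropped events.

```python
import queue
import threading


class TelemetryForwarder:
    """Accepts events instantly and ships them to the primary in the background."""

    def __init__(self, send_to_primary):
        self._queue = queue.Queue()
        self._send = send_to_primary  # hypothetical slow cross-region call
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event):
        # Returns immediately; the client never waits on the primary region.
        self._queue.put(event)

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # fire-and-forget: a failed send is simply dropped
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until everything queued so far has been attempted."""
        self._queue.join()


received = []
forwarder = TelemetryForwarder(received.append)
forwarder.record({"event": "page_view", "region": "fra"})
forwarder.flush()
print(received)
```

This only suits writes nobody reads back right away, which is the telemetry case mentioned above.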

Whenever an instance notices that it's not primary, but it is currently dealing with a critical write, it can refuse to handle the request, and return a 409 with the fly-replay header that specifies the primary region. Their infra will replay the original request in the specified region.
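
Roughly, the replay decision could be sketched like this. It is framework-agnostic: the `Response` class, the function name, and the region codes are invented for illustration, and the `region=<code>` header value is my understanding of Fly's docs, so check them before relying on it.

```python
from dataclasses import dataclass, field

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def handle_request(method, current_region, primary_region, handler):
    """Run the handler locally, or ask the platform to replay in the primary region."""
    is_write = method.upper() in WRITE_METHODS
    if is_write and current_region != primary_region:
        # Refuse locally; the proxy is expected to replay the request elsewhere.
        return Response(409, {"fly-replay": f"region={primary_region}"})
    return handler()


def ok():
    return Response(200, body="done")


print(handle_request("GET", "fra", "sjc", ok).status)
print(handle_request("POST", "fra", "sjc", ok).headers)
print(handle_request("POST", "sjc", "sjc", ok).status)
```

Treating every non-GET as a critical write is the bluntest possible rule. The point made above is that only writes depending on other writes need this treatment.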

> Did you write your own fork of Postgres or are you using a third party solution like BDR?

When using fly.io, the best option would probably be to use their postgres cluster service which supports read-only replicas (can take a few seconds for updates to reach replicas): https://fly.io/docs/getting-started/multi-region-databases/

> For a 2gb set up would that be about $50 for the web app, $50 for the background workers, $160ish for postgres and then $50 for Redis? We're still at $300+?

Maybe. A few thoughts:

- Why would you need 5 web workers, would one running on primary not be ideal? If you need so much compute for background work, then that’s not Fly’s fault, I guess.

- Not sure the Postgres read replicas would need to be as powerful as primary

- Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

> Why would you need 5 web workers, would one running on primary not be ideal?

It's not ideal due to some frameworks using background jobs to handle pushing events through to your web UI, such as broadcasting changes over websockets with Hotwire Turbo.

The UI would update when that job completes and if you only have 1 worker then it's back to waiting 100-350ms to reach the primary worker to see UI changes based on your location which loses the appeal of global distribution. You might as well consider running everything on 1 DigitalOcean server for 15x less at this point and bypass the idea of global distribution if your goal was to reduce latency for your visitors.

> Crazy idea: Use SQLite (replicated with Litestream) instead of Redis and save 50 bucks

A number of web frameworks let you use Redis as a session, cache and job queue back-end with no alternatives (or having to make pretty big compromises to use a SQL DB as an alternative). Also, Rails depends on Redis for Action Cable, swapping that for SQLite isn't an option.

For low-latency workers like that it might make sense to just run them on the same instance as the web servers.
Does Fly let you run multiple commands in separate Docker images? That's usually the pattern on how to run a web app + worker with Docker, as opposed to creating an init system in Docker and running 2 processes in 1 container (this goes against best practices). The Fly docs only mention the approach of using an init system inside of your image and also try to talk you into running a separate VM[0] to keep your web app + worker isolated.

In either case I think the price still doubles because both your web app and worker need memory for a bunch of common setups like Rails + Sidekiq, Flask / Django + Celery, etc.

[0]: https://fly.io/docs/app-guides/multiple-processes/

Why are people tripping over $2/mo ? I don’t get this tight-ass mentality. It’s a rounding error.
It's not the $2/mo at face value. For me it's the idea of them pushing to make paying for SSL certificates the norm again after Let's Encrypt has put in a huge amount of effort to change that field. Not that there's anything wrong with charging for things but charging for a free service rubs me in a weird way, especially since they're using Let's Encrypt.

Using the rounding error logic, how do you feel about companies adding $1.99 "convenience fees" or "administrative fees"?

why spend money when you don’t have to?
One reason is that many people use that as a baseline for how they ID multi-tenancy. It could be they just proxy resources from their customer down to the infrastructure.
And in today's world all data needs to be federated between national borders anyway. Try doing business in China if the user data isn't stored in China. Or Russia. Or the EU. Modern designs need the data layer to be forked between regions, not replicated, with merges between the forks.

On top of that, most replication systems are brittle and create logistical and administrative headaches. If you can get by with just rsync, do.

Yes, there are hundreds of different ways you could accomplish this. Fly.io is a convenient and easy to use one.
Hot take: if people spent half the energy doing multi-region that they today spend screwing around with Kubernetes, they’d be a hell of a lot more reliable.
I think people misconstrue the benefits of k8s to be related to reliability or similar. Ultimately it's about the API and the consistency and productivity it offers.

For larger teams having a well defined API that delineates applications from infrastructure that doesn't require extreme specialist knowledge (it still requires some specialist knowledge but vastly less than direct manipulation of resources via something like Terraform) is a massive productivity boost.

Of course none of that matters if you have 4 developers like OP but for folks like myself that routinely end up at places with 300+ engineers then it's a huge deal.

> I think people misconstrue the benefits of k8s to be related to reliability or similar. Ultimately it's about the API and the consistency and productivity it offers

I think this is the first time I've heard somebody say one of the benefits of kubernetes was productivity.

Really? I think it's a pretty obvious benefit. If you bundle something into a container, you can probably run it in kubernetes. This uniformity makes it incredibly easy to deploy and scale new applications.
Yeah, if you study it instead of copy-pasting stuff from the internet, I find k8s the best thing for my small projects. I only have to set up a simple Dockerfile and Helm chart and I can run a new service in my cluster on DO, for which they offer a free control plane, and not be billed for a completely new app and have to set up all my deps and env vars in a clunky UI. I can set up scaling and ingress easily, the Datadog agent is going to pick it up automatically, I can have services communicating via private DNS, etc.

I am not an ops guy.

+1 this is what I do as well. If you have any semblance of uniformity in your project folder structure, you can even automate the build/deploy process with a simple shell script/bash function.
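
A minimal sketch of that kind of automation, written as a Python helper rather than a bash function. It assumes every project folder has a Dockerfile at its root and a Helm chart in a `chart/` subfolder; the registry name, folder layout, and `image.tag` value key are hypothetical conventions, not anything stated in this thread. It only builds the commands, so it is safe to run as a dry run.

```python
import shlex
from pathlib import PurePosixPath


def deploy_commands(project_dir, registry, tag):
    """Return the build, push and deploy commands for one project folder."""
    project = PurePosixPath(project_dir)
    name = project.name  # release and image name come from the folder name
    image = f"{registry}/{name}:{tag}"
    return [
        ["docker", "build", "-t", image, str(project)],
        ["docker", "push", image],
        # Assumes the chart reads the image tag from a value called image.tag.
        ["helm", "upgrade", "--install", name, str(project / "chart"),
         "--set", f"image.tag={tag}"],
    ]


for cmd in deploy_commands("services/billing", "registry.example.com/acme", "v1"):
    print(shlex.join(cmd))
```

To actually deploy, each command list could be passed to `subprocess.run(cmd, check=True)`.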

Of course, this quickly stops working once your small projects grow to have multiple collaborators, a staging environment, etc. - but at that point you're running a proper business

I think what I've heard is the kubernetes end result is very often a massively overcomplicated infrastructure that nobody understands, that's a constant source of headaches and lost time due to leaky abstractions.

Disclaimer: I've never actually used it myself. That's mostly just what I've read and heard from people who use kubernetes.

Basically depends on expertise. The parent commenter probably comes from a team of good well paid ops engineers who understand and set up k8s well. In any other org it’s the show you describe.
It's like what Hedberg said about rice: k8s is great if you're really hungry and want to host two thousand of something.
The same is true of ECS, but with a much simpler API, much tighter integration with load balancers, a no-charge control plane, and not having to upgrade every cluster every 3-6 months.
I’ve had 2 services running flawlessly in ecs for over a year (with load balancing) without having to touch them. Took me all of 15m to set them up. It’s quite good.
We're running Nomad, but that's just a detail. The great thing for both development teams and the ops team is that the container orchestration starts working as a contract between the teams. This allows building more of a framework in which we can provide some initial starting point for standard tasks, like connections to infrastructure components, migrations in various languages, automated deployments, rollbacks and so on for teams out of the box. With this, product teams can go from an idea to deployed and running code very quickly and confidently.
As someone that was a very active Heroku user for years and then worked there for years: I wouldn't trust it as my host. There are nowhere near enough people maintaining it in order to have confidence it'll run without reliability or security issues. They aren't exactly in a position to retain or attract talent either.

I thought Cedar was going to fall over years ago but ironically I think people migrating off the platform are helping it stay alive.

I’m always confused why edge services are always selling points given point 1. The most basic of backend services won’t be able to completely utilize edge services.
It’s a tremendous latency speed up for read heavy apps that can tolerate eventually consistent read replicas. Any app using a popular SQL RDBMS likely falls into this category at scale. Any app using a Redis cache likely falls into this category at scale.

Also any app that has global clients and terminates ssl likely benefits from edge compute.

Yep, for anyone confused on how this works:

You'd still be sending writes to a single region (leader). If the leader is located across the world from the request's origin, there will be a significant latency. Not to mention you need to wait for that write to replicate across the world before it becomes generally available.

This is the distribute-your-Rails-app-without-making-any-code-changes version of that story. It works great for apps that are 51% or more read heavy. You drop our library in, add a region, and off you go. The library takes care of eventual consistency issues.

HTTP requests that write to the DB are basically the same speed as "Heroku, but in one place". If you're building infrastructure for all the full stack devs you can target, this is a good way to do it.

Distributing write heavy work loads is an application architecture problem. You can do it with something like CockroachDB, but you have to model your data specifically to solve that problem. We have maybe 5 customers who've made that leap.

In our experience, people get a huge boost from read replicas without needing to change their app (or learn to model data for geo-distribution).

It's also trivial to serve read requests from a caching layer or via a CDN. At any sufficient scale, you're probably going to need a CDN anyway, whether your database is replicated or not. You don't want every read to hit your database.
I don't think this is that trivial. I've never seen it done correctly. It typically manifests itself as not being able to read your own writes, and I see this all the time (often from companies that have blog posts about how smart their caching algorithm is). For example, you add something to a list, then you're redirected to the list, and it's not there. Then you press refresh and it is.

I guess that's acceptable because people don't really look for the feedback; why do users add the same thing to the list twice, why does everyone hit the refresh button after adding an item to the list, etc. It's because the bug happens after the user is committed to using your service (contract, cost of switching too high; so you don't see adding the cache layer correspond to "churn"), and that it's annoying but not annoying enough to file a support ticket (so you don't see adding the cache layer correspond to increased support burden).

All I can say is, be careful. I wouldn't annoy my users to save a small amount of money. That the industry as a whole is oblivious to quality doesn't mean that it's okay for you to be oblivious about quality.

(Corollary: relaxing the transactional isolation level on your database to increase performance is very hard to reason about. Do some tests and your eyes will pop out of your head.)

> It typically manifests itself as not being able to read your own writes

Multi-region databases with read replicas face the same issue
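
One common way to dodge the "add to list, redirect, item missing" bug with either a cache or a lagging replica is to remember the version of the user's last write and only read from the replica once it has caught up. A minimal in-memory sketch of that idea follows; all class names are invented for illustration, and a real database would use something like a replication position instead of this toy counter.

```python
class Primary:
    def __init__(self):
        self.version = 0
        self.rows = []

    def append(self, item):
        self.rows.append(item)
        self.version += 1
        return self.version


class Replica:
    def __init__(self):
        self.version = 0
        self.rows = []

    def catch_up(self, primary):
        """Simulates replication finally arriving."""
        self.rows = list(primary.rows)
        self.version = primary.version


class Session:
    """Tracks the newest version this user has written."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.last_written_version = 0

    def add_item(self, item):
        self.last_written_version = self.primary.append(item)

    def read_source(self):
        # Use the replica only if it already contains this user's own writes.
        if self.replica.version >= self.last_written_version:
            return "replica"
        return "primary"

    def list_items(self):
        source = self.replica if self.read_source() == "replica" else self.primary
        return list(source.rows)


primary, replica = Primary(), Replica()
session = Session(primary, replica)
session.add_item("milk")
print(session.read_source(), session.list_items())
replica.catch_up(primary)
print(session.read_source(), session.list_items())
```

The cost is that reads right after a write pay the trip to the primary, which is the latency trade-off discussed throughout this thread.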

Database read requests are not the same as readonly HTTP requests. I am much happier having all requests hit my app process than I am trying to do the CDN dance.

Right now your choices are: run a database in one region and:

1. Use the weird HTTP header based cache API with a boring CDN

2. Write a second, JS based app with Workers or Deno Deploy that can do more sophisticated data caching

3. Just put your database close to users. You can use us for this, or you can use something like Cloudflare Workers and their databases.
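
For option 1 in the list above, the "HTTP header based cache API" mostly comes down to what `Cache-Control` value the app sends. A minimal sketch, where the `/blog/` prefix, the 60-second lifetime, and the logged-in flag are made-up rules for illustration:

```python
def cache_headers(method, path, logged_in):
    """Decide whether a shared cache such as a CDN may store this response."""
    cacheable = (
        method.upper() == "GET"
        and not logged_in
        and path.startswith("/blog/")  # hypothetical set of public pages
    )
    if cacheable:
        # s-maxage applies to shared caches only; 60 seconds is an arbitrary choice.
        return {"Cache-Control": "public, s-maxage=60"}
    # Everything else must skip the CDN so users always see their own writes.
    return {"Cache-Control": "private, no-store"}


print(cache_headers("GET", "/blog/hello", logged_in=False))
print(cache_headers("GET", "/lists/42", logged_in=True))
```

Getting the cacheable condition wrong is exactly how the stale-list bug described earlier in the thread shows up.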

My hot take is: if something like Fly.io had existed in 1998, most developers wouldn't bother with a CDN.

Weirdly, most Heroku developers already don't bother with a CDN. It's an extra layer that's not always worth it.

> It's easy to forget that most companies are a handful of people or just solo devs.

I have the same complaint all the way down to simple sysadmin tasks. Ex: MS365 has a lot of churn on features and changes. It’s like they think everyone has a team of admins for it when in reality a lot of small businesses would be satisfied with a simple, email only product they can manage without help.

I strongly agree with your last paragraph. I used Heroku for my wedding website and I would 100% use it again on a project site.

In about 15 minutes I was able to take my site from localhost to a custom domain with SSL with just a little more than a git push. I can't think of many solutions that are simpler than that.

This is literally Netlify's core offering.
Vercel and GitHub Pages would be better IMO if it's a static site.
If it’s a static site then just dump it in an s3 bucket and be on your way.
Static websites have their own issues but an S3 bucket is probably the worst hosting mechanism for them these days. The other services mentioned are much nicer and easier to deal with.
> 2. File persistence is fine but not typically necessary. S3 works just fine.

I'm so glad you pointed this out. Cloud-native development is an important factor in newly architected systems. Defaulting to an S3 API for persistent I/O brings loads of benefits over using traditional file I/O, and brings significant new design considerations. Until a majority of software developers learn exactly how and why to use these new designs, we'll be stuck with outmoded platforms catering to old designs.

> Multi-region deployment only work if your database is globally distributed too. However, making your database globally distributed creates a set of new problems, most of which take time away from your core business.

I have used multi-region for every production database I've deployed in the last ~8 years, and it took < 10 seconds of extra time. It's a core feature of services like RDS on AWS.

There is a benefit if you're multi-region (but not global) because individual regions go down all the time.

It costs more every month, but if you have a B2B business, it's worth the extra cost.

For RDS, are you talking about multi-region, or multi-AZ? I know the latter is easy, but I don't think the first is, though maybe Aurora makes it easier.
Oh, that's handy. Thanks for sharing!
We’re not “froze”. The last week has been exhausting fud.
That’s a lie. The official freeze for new features started in 2017 and customers can see for themselves by reading the changelog.
Even small companies should be multi-region, if they care about uptime.
No, they shouldn't. In many instances it's cheaper to tolerate downtime than to pay to avoid it, especially when there's no SLA involved.
Most of the time, if Heroku is having downtime, then Amazon is having downtime, and then half the internet is down. Let customers know Amazon is down. Sit back and relax.
Uptime isn't an axiom. Most software isn't mission critical and most users won't notice if it's down for 30 minutes once or twice a month, and for everything else we have SLAs to manage professional expectations.
Wow, that's a horrible way of thinking about the user experience. And honestly, I'm not surprised. That's why companies that really care about the user experience will always steal market share from those that don't.
It’s actually small companies that care about user experience that will often make these trade-offs. Less time managing multi-cloud deployments means more time spent building our core product and talking to users.
On the one hand, yeah it sucks. On the other hand, my local ice-cream shop was closed for 30 minutes last week because the owner was doing something and the staff member who was rostered on was out sick. If your online business is at the same level of profit and necessity as an ice-cream shop, it can probably close for 30 minutes once or twice a year.
Very few companies have uptime requirements so critical they can justify this. Small companies with limited ops resources may struggle to make a multi-region setup work more reliably than a single-region one.
Often for small companies with limited resources, the act of trying to make something multi region has the effect of making the overall system less correct and less reliable than just running in a single region.