Hacker News new | ask | show | jobs
by pdimitar 9 days ago
Web / API services during bursts. Or just when you _really_ don't want to scale horizontally.

Elixir / Golang can do this very well. And they do. I have supervised, led and authored such projects that are in production to this day.

Rust too but it's lower-level and you kind of have to hand-roll OTP which of course will always fail.

1 comments

from experience, during bursts it's never actual web/api server that is bogged down, it's the downstream io bottlenecks.

if your accepting layer is abstracted away and implemented correctly, there is very little performance difference between different concurrency approaches and all you're exposed to as developer is implementation of your handler functions.

Not the case; good abstractions are valuable, but the performance differences between runtimes are very real.

Take the example of some simple HTTP<->blob store service gets slammed with millions of requests when someone using the API does a backfill via some framework on their end that aggressively scales request volume up and out.

Something like, say, async Python/starlette with a coroutine per request is gonna perform slightly worse than Erlang, which in turn is gonna perform much worse than Go.

You're right that those differences are sometimes marginal when the latency of whatever IO the backend's doing dominates the equation. However, in my experience huge volume surges show issues with the runtime (the thing managing/launching multiplexed request handler routines) or the ecosystem (the backend IO libraries' ability to work with the runtime's IO multiplexing and make things like request coalescing easy or automatic) more often than you'd think.

It really takes surprisingly little volume to cripple a return-hello-world Phoenix app that indirects the "hello world" behind way too much middleware and message passing; it takes even less to kick over, say, a Gunicorn instance returning "hello world" at the bottom of the Django middleware stack. Golang with Gin, on the other hand, is surprisingly hard to cripple in the same way. And I say that as someone who likes Elixir and Python a lot more than I like Go!

Thank you. As a guy who made a career out of Elixir (and begins to regret it recently but oh well) I agree that Elixir's throughput is not amazing. However, it can get very far and we should always optimize for the most common usages.

I've personally rewritten one hobby and one professional projects from Elixir to Golang and loved the result; as you said, extremely difficult to bring down a Golang service to its knees.

One clarification: Phoenix server behind Caddy/nginx fairs better btw. But, details. Your point stands.

I am yet to see a Rust web/API service I wrote to _ever_ buckle under pressure and just crash. It was either an application bug (like the famous Cloudflare's `.unwrap()` error from the last weeks/months) or the Linux OOM killer. Literally never crashed. But I did witness it brutally murder a MySQL cluster because it couldn't serve it fast enough. That was both fun and terrifying to watch on the dashboards.

> I did witness it brutally murder a MySQL cluster because it couldn't serve it fast enough. That was both fun and terrifying to watch on the dashboards.

Haha yep. In my experience, everyone running CGI/process-per-request application servers is bullish on switching to a concurrent or cooperative runtime...until they realize they just removed the primary ratelimiter on downstream DB/service accesses.

The converse war stories are also amusing: people rewrite their whole app in a concurrent/asynchronous framework and nothing changes, because the DB driver is still farming out all queries to a tiny fixed-size threadpool of connections that was the bottleneck all along.

Oh yeah, definitely. If your DB server (or any storage backend) cannot have like 200+ connections alive at all times then it's absolutely pointless rewriting your app in Elixir or Golang. You'll just serve DB timeouts in your responses.
> You're right that those differences are sometimes marginal when the latency of whatever IO the backend's doing dominates the equation. However, in my experience huge volume surges show issues with the runtime (the thing managing/launching multiplexed request handler routines) or the ecosystem (the backend IO libraries' ability to work with the runtime's IO multiplexing and make things like request coalescing easy or automatic) more often than you'd think.

fair enough, although at this point we start talking about LB in front of the thing, consumption mechanics, autoscaling signals

i will still maintain that my simple advice for a dev worrying about scale, is that they should focus their efforts on ensuring downstream IO doesn't get overwhelmed (db read replicas, caching, etc) before optimizing runtime performance or autoscaling out unnecessarily.

> focus their efforts on ensuring downstream IO doesn't get overwhelmed (db read replicas, caching, etc) before optimizing runtime performance or autoscaling out unnecessarily.

All good advice, but the choice of runtime can affect the point at which autoscaling and load balancing even need to enter the conversation at all. Optimizing, say, a mostly in-memory cache service and writing it in Golang may yield results like "we can run a single instance of this and serve three orders of magnitude of business growth; slap it behind a DNSRR or a k8s NodePort for update/replacement/fast failover if it crashes, no complex load balancer needed", where writing the same thing in, say, PHP might require discussing orchestration/load balancing/memory/worker process recycling/autoscaling early on in the service's lifetime. Being able to skip those conversations (entirely or for a long time) is a very significant business benefit.

(Upvoted for a really relevant and valuable refinement to the thread.)

Admittedly that layer is almost always abstracted i.e. AWS / GCP and various other smaller hosted solutions that handle a good chunk of load balancing for us. In that landscape BEAM VM's strengths shine even brighter. I've seen firsthand that you can in fact bring a BEAM VM to its knees if you expose it just like that to the net. It's not pretty. Golang fares a touch better and Rust seems almost immune (provided one does not screw up their caching layer and don't do elementary N+1 query mistakes).