Hacker News
by barrkel 1867 days ago
I think trying to scale out compute + mutable state across multiple CPUs on a single box has somewhat fallen by the wayside outside of specialized applications like databases and distributed query engines, as a local maximum avoided in favour of scalability via more boxes.

There are several forces behind this. Applications are more likely to be web-based or services - i.e. inherently distributed - and less likely to be desktop or traditional client/server, where almost all compute happened on a single server. For distributed services, maximizing statelessness and avoiding mutable shared memory is key to solving a lot of problems: scaling out (no shared memory), redundancy (keep a second copy of the application logic running somewhere, no sync required), recovery and idempotence (if something fails, try again until it succeeds - repeated attempts are safe).
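
To make the "try again until it succeeds" point concrete, here is a minimal Python sketch. Everything in it (the `Ledger` class, the `retry` helper, the request ID scheme) is made up for illustration and not taken from any particular framework. The change is applied, the reply is then "lost", and the caller retries with the same request ID without double-applying.

```python
class Ledger:
    """Hypothetical service whose replies get lost a few times before one arrives."""

    def __init__(self, lost_replies):
        self.lost_replies = lost_replies
        self.seen = {}      # request_id -> result of the first successful apply
        self.balance = 0

    def deposit(self, request_id, amount):
        # Apply the change only the first time this request ID is seen.
        if request_id not in self.seen:
            self.balance += amount
            self.seen[request_id] = self.balance
        # Simulate the reply being lost after the change was applied.
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise ConnectionError("reply lost in transit")
        return self.seen[request_id]


def retry(call, attempts=5):
    """Call `call` until it returns, giving up after `attempts` failures."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except ConnectionError as exc:
            last_error = exc
    raise last_error


ledger = Ledger(lost_replies=2)
result = retry(lambda: ledger.deposit("req-1", 100))
print(result, ledger.balance)
```

Without the `seen` check, the two lost replies would have left the balance at 300 instead of 100, which is why blind retries are only safe when the operation is idempotent.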

Reliable persistent queues are part of that. They let you bring services up and down and switch over without downtime, or restart after a failure and resume where they left off.
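
A rough sketch of the "resume where they left off" idea, assuming a toy file-backed queue: an append-only log plus a committed consumer offset. `FileQueue` and its file names are hypothetical, and a real queue would also fsync, handle concurrent consumers, and so on.

```python
import json
import os
import tempfile


class FileQueue:
    """Hypothetical append-only log with a committed consumer offset, both on disk."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "queue.log")
        self.offset_path = os.path.join(directory, "consumer.offset")

    def publish(self, message):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(message) + "\n")

    def committed(self):
        """Return the index of the next unprocessed message (0 if none committed)."""
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read())
        except FileNotFoundError:
            return 0

    def consume(self, handler, limit=None):
        start = self.committed()
        with open(self.log_path, encoding="utf-8") as log:
            lines = log.readlines()
        done = 0
        for index in range(start, len(lines)):
            if limit is not None and done >= limit:
                break
            handler(json.loads(lines[index]))
            # Commit only after the handler returns: at-least-once delivery.
            with open(self.offset_path, "w", encoding="utf-8") as f:
                f.write(str(index + 1))
            done += 1


workdir = tempfile.mkdtemp()
queue = FileQueue(workdir)
for n in range(5):
    queue.publish({"job": n})

first_run, second_run = [], []
FileQueue(workdir).consume(first_run.append, limit=2)  # stands in for a crash after two jobs
FileQueue(workdir).consume(second_run.append)          # a fresh consumer resumes at job 2
```

Because the offset is committed after the handler runs, a crash between the two steps redelivers one message on restart, so the handler should be idempotent.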

The problems of shared mutable state are best kept in specialized applications: databases, queuing systems, distributed caches, consistent key-value stores. Keeping state consistent in a distributed system is a genuinely hard problem, and STM isn't much help, except perhaps as an implementation detail in some of those specialized applications.

1 comments

For what it's worth, scaling mutable shared state across multiple CPUs on a single box has fallen by the wayside for databases too. Thread-per-core style software architecture has become idiomatic, in part because it scales up efficiently on machines with a very large number of cores. Nothing about this precludes scale-out; it just allows you to serve 10-100x the load on a single box compared with the more typical naive design.
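
For anyone unfamiliar with the thread-per-core shape, here is a loose Python sketch of the ownership model only. The `Shard` class and `increment` function are invented for illustration. Real implementations pin one thread to each core and avoid cross-thread queues on the hot path, and Python's GIL means this shows the structure rather than the performance.

```python
import queue
import threading
import zlib


class Shard(threading.Thread):
    """One worker that exclusively owns its slice of the state."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.counts = {}  # touched only by this thread, so no lock is needed

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown signal
                break
            key, reply = item
            self.counts[key] = self.counts.get(key, 0) + 1
            reply.put(self.counts[key])


# Four shards as a stand-in for "one per core".
shards = [Shard() for _ in range(4)]
for shard in shards:
    shard.start()


def increment(key):
    """Route the request to the shard that owns `key` and wait for its reply."""
    owner = shards[zlib.crc32(key.encode()) % len(shards)]
    reply = queue.Queue(maxsize=1)
    owner.inbox.put((key, reply))
    return reply.get()


results = [increment("user-42") for _ in range(3)]

for shard in shards:
    shard.inbox.put(None)
for shard in shards:
    shard.join()
```

The point is that each key has exactly one owner, so the state is mutable but never shared. The same routing function works whether the shards are threads on one box or processes on several.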

Web applications aren't special in this regard. We've just normalized being incredibly wasteful of computing resources when designing them, spinning up huge clusters to serve a workload that could be easily satisfied with a single machine (or a few machines for availability).

Premature scale-out, because we've forgotten how to scale up, is the root of much evil.

This has been my experience as well. I briefly shifted from embedded systems programming to the web services world. I found that the scale-up approach has been so maligned that it is considered OK to spin up several hundred EC2 instances running a (poorly written) Java application instead of a single multi-core instance running Erlang that performed much better. Some frameworks even ran web services in PHP and spent all their effort on keeping requests from reaching the PHP code, since it was doing something like 10 reqs/sec.
Not a Java fan, but a single multi-core instance being more performant than a fleet of VMs is likely language-independent.