by jasonwatkinspdx 1872 days ago
Well, I'd say it's a bit muddled. Pure STM systems have seen limited success, mostly on the JVM, particularly Clojure. But as a technology it hasn't spread widely in the industry, despite the fundamental advantage of composability. I personally would attribute this to a sort of gulf created by two common approaches:

1. It's become very common to structure software as a stateless service cluster backed by a stateful database technology. Most of the databases used in this approach would not be described as STM, but the net effect to the application is similar, even if involving more layers and complexity.

2. People have gotten pretty comfortable with simple mutex patterns connected by queues of various sorts. This is a sort of worse-is-better situation: in a toy test or microbenchmark, a mutex-protected whatever is far simpler and faster than STM. However, a system composed of many of these mutex-protected islands proves a whole different beast indeed. STM has largely been criticized from the perspective of the former rather than the latter.
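To make the "islands" point concrete, here is a minimal Python sketch (the `Account` class and `transfer` function are hypothetical names for illustration, not from any library). Each account is trivially safe on its own, but an operation spanning two of them needs both locks held at once, plus a global lock-ordering convention that every caller has to know about:

```python
import threading


class Account:
    """One mutex-protected island: some state plus the lock guarding it."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

    def deposit(self, amount):
        with self.lock:
            self.balance += amount

    def withdraw(self, amount):
        with self.lock:
            if self.balance < amount:
                raise ValueError("insufficient funds")
            self.balance -= amount


def transfer(src, dst, amount):
    """Move money between two islands atomically.

    Calling src.withdraw() and then dst.deposit() would expose an
    intermediate state to other threads, so both locks are held together.
    Locks are taken in a fixed order (by name, an assumption of this
    sketch) so that opposite-direction transfers cannot deadlock.
    """
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would hang
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock, second.lock:
        if src.balance < amount:
            raise ValueError("insufficient funds")
        src.balance -= amount
        dst.balance += amount
```

Note that `transfer` cannot reuse `withdraw` and `deposit`; it has to reach into both objects and re-implement their logic under its own locking scheme. That is the composability cost the microbenchmark never shows.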

There are many people who have made the observation that transactional concurrency control, state-of-the-art garbage collection, and even file systems have been converging on similar features. This is being driven by common constraints from both sides: what the hardware expects and what humans expect. In particular with persistent memory, I think you'll see all three of these unify into a single design, because systems that attempt to solve these problems separately will be a much poorer match for the hardware.


I think trying to scale out compute + mutable state across multiple CPUs on a single box has somewhat fallen by the wayside outside of specialized applications like databases and distributed query engines, as a local maximum avoided in favour of scalability via more boxes.

There are several forces behind this. Applications are more likely to be web-based or services - i.e. inherently distributed - and less likely to be desktop or traditional client/server, where almost all compute happened on a single server. As distributed services, maximizing statelessness and avoiding mutable shared memory is key to solving a lot of problems: scaling out (no shared memory), redundancy (keep a second copy of the application logic running somewhere, no sync required), recovery and idempotence (if something fails, try again until it succeeds - repeated attempts are safe).
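The "try again until it succeeds" part only works if repeats are harmless. A minimal Python sketch of one common way to get that, an idempotency key checked by the receiver (`PaymentService`, `charge`, and `call_with_retry` are all hypothetical names made up for this example):

```python
import time


class PaymentService:
    """Hypothetical downstream service that deduplicates by request key."""

    def __init__(self, fail_first=2):
        self.applied = {}  # idempotency key -> result already produced
        self.total = 0
        self._failures_left = fail_first

    def charge(self, key, amount):
        if key in self.applied:
            # A repeat of a request that was already applied: return the
            # recorded result instead of charging a second time.
            return self.applied[key]
        self.total += amount
        self.applied[key] = self.total
        if self._failures_left > 0:
            # Simulate the response getting lost after the work was done.
            self._failures_left -= 1
            raise ConnectionError("response lost")
        return self.applied[key]


def call_with_retry(fn, attempts=5, delay=0.0):
    """Call fn until it returns, retrying on ConnectionError."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error
```

With the key in place the caller needs no knowledge of whether a failed attempt actually took effect; without it, the retry after a lost response would charge twice.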

Reliable persistent queues are part of that. They let you bring services up and down and switch over without down time, or restart after a failure and resume where they left off.

The problems of shared mutable state are best kept in specialized applications: databases, queuing systems, distributed caches, consistent key-value stores. Keeping state consistent in a distributed system is a genuinely hard problem, and STM isn't much help, except perhaps as an implementation detail in some of those specialized applications.

For what it's worth, scaling mutable shared state across multiple CPUs on a single box has fallen by the wayside for databases too. Thread-per-core style software architecture has become idiomatic, in part because it efficiently scales up on machines with a very large number of cores. Nothing about this precludes scale-out; it just allows you to serve 10-100x the load on a single box compared to the more typical naive design.
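A rough sketch of the ownership structure behind thread-per-core, in Python for brevity. This is an illustration only: `ShardedStore` is a made-up name, CPython's GIL means it won't actually scale across cores, and a real implementation would pin one worker to each core. The point is that each shard's state is owned by exactly one thread and reached only through its inbox, so no locks guard the data itself:

```python
import queue
import threading


class ShardedStore:
    """Counters partitioned across workers; each worker owns its shard."""

    def __init__(self, num_shards):
        self.inboxes = [queue.Queue() for _ in range(num_shards)]
        self.workers = [
            threading.Thread(target=self._run, args=(inbox,), daemon=True)
            for inbox in self.inboxes
        ]
        for worker in self.workers:
            worker.start()

    @staticmethod
    def _run(inbox):
        state = {}  # private to this worker, never shared, so never locked
        while True:
            msg = inbox.get()
            if msg is None:  # shutdown signal
                break
            op, key, reply = msg
            if op == "incr":
                state[key] = state.get(key, 0) + 1
            reply.put(state.get(key, 0))

    def _request(self, op, key):
        # Route by key hash so a given key always lands on the same worker.
        inbox = self.inboxes[hash(key) % len(self.inboxes)]
        reply = queue.Queue(maxsize=1)
        inbox.put((op, key, reply))
        return reply.get()

    def incr(self, key):
        return self._request("incr", key)

    def get(self, key):
        return self._request("get", key)

    def close(self):
        for inbox in self.inboxes:
            inbox.put(None)
        for worker in self.workers:
            worker.join()
```

The same key-hash routing is what makes the later move to multiple boxes straightforward: the shard boundary is already explicit.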

Web applications aren't special in this regard. We've just normalized being incredibly wasteful of computing resources when designing them, spinning up huge clusters to serve a workload that could be easily satisfied with a single machine (or a few machines for availability).

Premature scale-out, because we've forgotten how to scale-up, is the root of much evil.

This has been my experience as well. I shifted from embedded systems programming to the web services world briefly. I found that the scale-up approach has been so maligned that it is considered ok to spin up several hundred EC2 instances running a (poorly written) Java application instead of a single multi-core instance running Erlang that performed much better. Some frameworks even ran web services in PHP and spent all their effort on keeping requests from reaching the PHP code, since it was doing like 10 reqs/sec.
Not a Java fan, but the single multi-core instance being more performant than a fleet of VMs is likely language independent.
SPJ of Haskell fame once discussed perceived STM failure in .Net: https://haskell-cafe.haskell.narkive.com/t6LSdcoE/is-there-a...

Basically, poor STM performance in .Net comes down to too much mutation and the logging that mutation induces. This is why Clojure's STM is much more widely used.

I have particularly deep experience in various Haskell projects and I must insist that no relatively (1) big Haskell project (2) with a hint of concurrency and (3) developed by several people can go without use of STM.

I can use MVars myself for problem space discovery. When there are at least two of us, I will use TVar and insist others should too.

I think (1) is really about where you land it.

I am not free to name the product, but I worked on a product that has done billions of dollars in revenue and that built a bunch of its state management checkpointing on a custom STM implementation for C (yes, C). It made so, so many things straightforward in terms of corner cases; we also extended transactionality to sync over the wire in some cases.

I also think STM-adjacent designs are gaining some traction - for example, FoundationDB looks an awful lot like an STM system from the point of view of applications, much more than it looks like a traditional database.
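For anyone who hasn't seen the shape being described, here is a toy Python sketch of the optimistic read/validate/commit loop with automatic retry. Everything in it (`TVar`, `Txn`, `atomically`, the single global commit lock) is a simplification invented for illustration; it is not FoundationDB's API, Haskell's STM, or Clojure's, all of which are far more sophisticated:

```python
import threading


class Conflict(Exception):
    """Raised when a value this transaction read has since changed."""


class TVar:
    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()  # one global lock: simple, not scalable


class Txn:
    def __init__(self):
        self.reads = {}   # TVar -> version observed
        self.writes = {}  # TVar -> pending value, invisible until commit

    def _validate(self):
        # Caller must hold _commit_lock.
        for tvar, version in self.reads.items():
            if tvar.version != version:
                raise Conflict()

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            self._validate()  # keep every read consistent with earlier ones
            self.reads.setdefault(tvar, tvar.version)
            return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            self._validate()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(fn):
    """Run fn(txn) and commit, retrying from scratch on conflict."""
    while True:
        txn = Txn()
        try:
            result = fn(txn)
            txn.commit()
            return result
        except Conflict:
            continue


# Small operations compose into bigger ones without knowing about each other.
def withdraw(txn, account, amount):
    balance = txn.read(account)
    if balance < amount:
        raise ValueError("insufficient funds")
    txn.write(account, balance - amount)


def deposit(txn, account, amount):
    txn.write(account, txn.read(account) + amount)


def transfer(txn, src, dst, amount):
    withdraw(txn, src, amount)
    deposit(txn, dst, amount)
```

Usage is `atomically(lambda txn: transfer(txn, a, b, 10))`. The application code only ever sees "run this function as a transaction and retry if it conflicts", which is the resemblance being pointed at, and `transfer` is built from `withdraw` and `deposit` with no lock ordering to get wrong.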