by aredington 4845 days ago
Clojure's immutable data structures allow you to scale out across many cores without needing as rigorous locking semantics as a destructively modified version of the same work would need; copy on write presents no contention until you want the new value to be the canonical value.
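Here is a minimal Java sketch of the copy-on-write idea, showing where the contention actually lives. Some assumptions: the class name `CowList` is hypothetical, and it does a full O(n) copy per write, whereas Clojure's persistent vectors share structure between versions and copy far less. Readers call `snapshot()` with no lock. Writers contend only at the `compareAndSet` that publishes a new version as the canonical value, which is roughly the retry pattern Clojure's `swap!` on an atom follows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical copy-on-write holder: readers take the current immutable snapshot
 * without locking; writers build a private modified copy and only race when publishing it.
 */
final class CowList<T> {
    private final AtomicReference<List<T>> current =
            new AtomicReference<>(Collections.<T>emptyList());

    /** Lock-free read: a published list is never mutated afterwards. */
    List<T> snapshot() {
        return current.get();
    }

    /** Copy, modify the private copy, then try to make it the canonical value. */
    void append(T value) {
        while (true) {
            List<T> before = current.get();
            List<T> copy = new ArrayList<>(before);   // O(n) copy: this is the write-side cost
            copy.add(value);
            List<T> after = Collections.unmodifiableList(copy);
            if (current.compareAndSet(before, after)) {
                return;                               // published; no reader was ever blocked
            }
            // Another writer published first: loop and redo the copy from the newer version.
        }
    }
}
```

If two writers race, the loser simply redoes its copy against the newer version; readers holding an older snapshot are never disturbed.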

Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

Single-threaded execution is a computational dead end. If you want to go faster, you have to parallelize, be it on a single system or on a cloud service. Clojure's persistent data structures ease this. That the persistent data structures also have canonical serializations ALSO eases this.


Give me a break, we're talking about vectors here.

I like Clojure, spent some time messing around with it a couple years ago and will one day actually use it for something, probably involving complex configuration where code-as-data really shines along with concurrency/performance.

But if you're talking about working with a ho-hum vector of 100-10k entries, a linear scan over a mutable, contiguous array will typically be faster than the most clever multithreaded code you can come up with, and take up less of the CPU while it's working. 10 cores are a Bad Idea for that kind of work.
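To make the comparison concrete, here is a hedged Java sketch of both approaches (class and method names are hypothetical): a plain loop over a contiguous `int[]` and the same reduction through a parallel `IntStream`. Both return the same sum. Whether the parallel version is ever faster for 100 to 10k elements depends on the hardware and the work done per element, so it is something to measure, for example with JMH, rather than assume.

```java
import java.util.stream.IntStream;

final class ScanDemo {
    /** Plain linear scan over a contiguous int[]; sequential access suits caches and prefetchers. */
    static long sumSequential(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Same reduction via a parallel stream, which splits the work across the common ForkJoinPool. */
    static long sumParallel(int[] values) {
        return IntStream.of(values).parallel().asLongStream().sum();
    }
}
```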

Amdahl's law tells us we should look at larger units of concurrency in our architecture rather than getting all excited about some auto-parallelized map function. At that point, it starts being important how fast the (single-threaded!) individual tasks run.
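Amdahl's law puts a number on this. If a fraction p of the run time parallelizes perfectly across n cores and the rest stays serial, the overall speedup is S(n) = 1 / ((1 - p) + p / n). A small Java helper, where the name `Amdahl.speedup` is just for illustration:

```java
final class Amdahl {
    /**
     * Amdahl's law: overall speedup when a fraction p of the run time
     * parallelizes perfectly across n cores and the remaining (1 - p) stays serial.
     */
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and cores >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }
}
```

With p = 0.9, 10 cores give about 5.3x, and no number of cores can exceed 1 / (1 - p) = 10x. That ceiling is why the speed of the serial, single-threaded portion keeps mattering.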

Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once.

Break into blocks < CPU cache size, perform multiple stages on each block.

Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.
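A minimal Java sketch of the block-oriented approach follows. It assumes three simple element-wise stages and a hypothetical block size of 4096 doubles (32 KiB); the right size depends on the target machine's caches. The naive version streams the whole array through memory once per stage. The blocked version runs every stage on one cache-sized block before moving on, so the later stages read data that is still cached.

```java
final class BlockedStages {
    // Hypothetical block size: 4096 doubles = 32 KiB. Tune to the actual cache sizes.
    static final int BLOCK = 4096;

    /** Naive: each stage sweeps the entire array, so large inputs fall out of cache between stages. */
    static void stagesNaive(double[] data, double scale, double offset) {
        for (int i = 0; i < data.length; i++) data[i] *= scale;                        // stage 1
        for (int i = 0; i < data.length; i++) data[i] += offset;                       // stage 2
        for (int i = 0; i < data.length; i++) data[i] = Math.sqrt(Math.abs(data[i]));  // stage 3
    }

    /** Blocked: apply all three stages to one block while it is cache-resident, then move on. */
    static void stagesBlocked(double[] data, double scale, double offset) {
        for (int start = 0; start < data.length; start += BLOCK) {
            int end = Math.min(start + BLOCK, data.length);   // last block may be shorter
            for (int i = start; i < end; i++) data[i] *= scale;
            for (int i = start; i < end; i++) data[i] += offset;
            for (int i = start; i < end; i++) data[i] = Math.sqrt(Math.abs(data[i]));
        }
    }
}
```

Both versions produce identical results. For purely element-wise stages like these, fusing the stages into one loop would also work; blocking pays off more when a stage needs to see a whole block at a time.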

Do immutable data structures really let you scale out so easily? I thought that was something of a myth...

"so easily" is poorly defined, but when you are talking about collection data subject to concurrent modification you have a few options for correctness:

1. Read lock the data for as long as a thread is using it. This ensures that it is not destructively modified while the thread iterates over it. This is a terrible option if you perform any kind of blocking operation while holding the read lock. Once a thread holds the lock, it runs quickly without any kind of penalty, but acquiring the lock might have taken an extremely long time, especially if some OTHER thread was blocking with a read lock held.

2. Read lock only long enough to copy the data, then work with your copy in isolation. You incur the cost of the linear copy, which might or might not be small compared to the work you actually want to do; if the two are close, your run time just went up 2x.

- Brief caveat: Any explicit locking scheme exposes you to the risk of deadlock. This is why complex multithreaded applications end up defining explicit lock orderings, and it is part of what can make large codebases difficult to work in.

3. Don't read lock, hope for the best. This can work better if you have a means to detect concurrent modification and restart. You might even get the correct behavior for 99% of cases.

4. Work with immutable data structures that cannot be destructively modified out from under you. Immutable data is computationally cheaper to read in the face of parallelism than every other option. It is more expensive to write to. What do your read and write loads look like?

- Also, please keep in mind that while Clojure provides immutable persistent data structures out of the box, and its reader generates them for all the canonical serializations, it does not take away Java's destructively modifiable types.
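The four options can be sketched in Java roughly as follows. This is an illustration under assumptions, not a recipe: the shared data is a fixed-size `int[]`, the class names are hypothetical, `StampedLock` covers options 1 to 3 (its optimistic mode supplies the "detect concurrent modification and restart" check), and option 4 publishes cloned arrays through an `AtomicReference` in place of a real persistent data structure.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.StampedLock;

/** Options 1-3: a mutable int[] whose writers take the exclusive lock. */
final class LockedArray {
    private final int[] data;
    private final StampedLock lock = new StampedLock();

    LockedArray(int size) { data = new int[size]; }

    void set(int index, int value) {
        long stamp = lock.writeLock();
        try { data[index] = value; } finally { lock.unlockWrite(stamp); }
    }

    /** Option 1: hold the read lock for the entire scan; writers wait until it ends. */
    long sumHoldingReadLock() {
        long stamp = lock.readLock();
        try { return sum(data); } finally { lock.unlockRead(stamp); }
    }

    /** Option 2: hold the read lock only while copying, then work on the private copy. */
    long sumOfCopy() {
        int[] copy;
        long stamp = lock.readLock();
        try { copy = data.clone(); } finally { lock.unlockRead(stamp); }
        return sum(copy);   // the O(n) copy is paid on every read
    }

    /** Option 3: scan without locking, detect an intervening write, and redo the scan under a lock. */
    long sumOptimistic() {
        long stamp = lock.tryOptimisticRead();   // 0 if a writer currently holds the lock
        long total = sum(data);
        if (lock.validate(stamp)) {
            return total;                        // no write lock was taken during the scan
        }
        stamp = lock.readLock();                 // fallback: a consistent scan under the read lock
        try { return sum(data); } finally { lock.unlockRead(stamp); }
    }

    static long sum(int[] values) {
        long total = 0;
        for (int v : values) total += v;
        return total;
    }
}

/** Option 4: published arrays are never mutated, so readers need no lock at all. */
final class ImmutableArray {
    private final AtomicReference<int[]> current;

    ImmutableArray(int size) { current = new AtomicReference<>(new int[size]); }

    /** Writes copy the whole array and retry if another writer published first. */
    void set(int index, int value) {
        int[] before;
        int[] after;
        do {
            before = current.get();
            after = before.clone();
            after[index] = value;
        } while (!current.compareAndSet(before, after));
    }

    long sum() { return LockedArray.sum(current.get()); }
}
```

Only option 4 lets a reader run without ever touching a lock; the price moves to `set`, which copies the whole array on every write.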

> Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

And this is, in my opinion, the best possible argument in favor of using Clojure. Off the top of my head, I remember Rich Hickey saying something about how he developed Clojure due to his irritation at working on a huge project that wrote code to handle thousands upon thousands of interconnected nodes. That makes perfect sense.

However... writing a web app with Clojure, at least to me, doesn't.

> However... writing a web app with Clojure, at least to me, doesn't.

Why not? I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism? I mean, sure at small loads you don't need it, but what about scaling up?

Also, I find using compojure for web app development to be an absolute dream. I might need to get out more (my day job is java web apps), but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM.

> I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism?

For the same reason you don't parallelize every loop you write in Java by default. It's overkill. You can successfully argue that multiple service calls across a network should be made in parallel, but that has relatively little to do with a desire to serve many clients and a lot more to do with responsiveness (a noble goal).
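A hedged Java sketch of that fan-out case: three calls issued concurrently with `CompletableFuture`, so page latency tracks the slowest call rather than the sum of all three. The service names and `callService` are hypothetical stand-ins that sleep to simulate network latency.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FanOut {
    /** Hypothetical remote call: sleeps to simulate network latency, then returns a tag. */
    static String callService(String name, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name + ":ok";
    }

    /** Start all three calls at once, then combine the results when each completes. */
    static String renderPage(ExecutorService pool) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> callService("user", 300), pool);
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(() -> callService("orders", 300), pool);
        CompletableFuture<String> recs = CompletableFuture.supplyAsync(() -> callService("recs", 300), pool);
        return user.thenCombine(orders, (u, o) -> u + "," + o)
                   .thenCombine(recs, (uo, r) -> uo + "," + r)
                   .join();   // blocks until all three have finished
    }
}
```

Called sequentially, the three 300 ms calls would take at least 900 ms; issued together on a pool with three threads, they take roughly 300 ms.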

> but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM

HTML and mixed services > (any combination of Java technologies you can dream up to serve web apps)