| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Arelius 2053 days ago

I think you have a good point about sorts, in that case mutation is almost always faster, but it seems to me that a large contributor, is that often much data doesn't need to be moved. So clearly no work is better than some work.

Having said that, I hardly see a sort being the majority of time spent in real-world code.

> Because thread-barriers are no-ops on a SIMD computer

My understanding is that only holds true across the SIMD lane, and the algorithm doesn't hold up so generally to general purpose concurrency. But agreed, a sort is something I'd likely try to optimize on a single core, and express program concurrency outside that specific operation.

> What algorithms are you thinking of where copying is faster than mutating? At best, a copy ties mutation in speed.

My main point is it's actually the case that a copy ties mutation in speed quite often if you structure the flow of your data correctly. If you're operating on a value, writing it somewhere else is often close to free.

Having said that as an example, I've seen filtering a list run faster doing a move, highly dependent on the details of the cache of course.

> that can beat the Knuth Shuffle's O(n) computation complexity.

I haven't considered shuffles specifically. But getting into the rut of complexity evaluation alone dismisses important aspects of the machine and the data you are actually operating on, and can lead you very far from the fastest solution for the problem you are actually trying to solve.

2 comments

dragontamer 2053 days ago

> My main point is it's actually the case that a copy ties mutation in speed quite often if you structure the flow of your data correctly. If you're operating on a value, writing it somewhere else is often close to free.

So, here's something. I think that a copying-based methodology can be fast, but more importantly, a copying-based methodology can be easier to parallelize.

If a mutating state is too hard to parallelize, I think rewriting things to be copy-based, and then using that copy-based methodology as a basis for parallel code, is a good idea.

What's difficult is that "mutating state" is a local maximum so to speak: probably the fastest that any single-thread can get to. From this perspective, finding parallelism from a "higher level" can be more useful than trying to parallelize a well-optimized single-threaded program.

----------

Case in point: Multithreaded producer-consumer queues are difficult to write and even define. But multithreaded producer-producer queues (and consumer-consumer queues) are very easy: just an obvious "atomic_add(tail, 1)" (Producer) and "atomic_subtract(tail, 1)" (for consumers) kind of thing.

I call it a "producer-producer" queue, because you need assurances that no one is consuming from the queue for it to work. But if you synchronize your threads to all copy to a queue (and no one consumes from it), its really fast, and actually very parallelized. Ditto for the reverse (the consume phase).

dragontamer 2053 days ago

> I think you have a good point about sorts, in that case mutation is almost always faster, but it seems to me that a large contributor, is that often much data doesn't need to be moved. So clearly no work is better than some work.

I disagree. Even for a simple situation:

[9 1 2 3 4 5 6 7 8] -> [1 2 3 4 5 6 7 8 9]

All 9 elements have to move to a different memory location!! In fact, assuming a randomly shuffled array with all different values, I'd assume that the most common situation is all elements being forced to move somewhere else. But I don't feel like doing the rigorous math to prove it. I'll so something simpler:

If there are N locations in an array, the probability that a random element is in the right spot is 1/N (I think anyway). With N elements, that would be an average of 1-element in the correct spot (assuming uniform random arrays). There is only one-sorted array, but factorial(n-1) array without any element in the correct spot.

> I haven't considered shuffles specifically. But getting into the rut of complexity evaluation alone dismisses important aspects of the machine and the data you are actually operating on, and can lead you very far from the fastest solution for the problem you are actually trying to solve.

Sure. But Knuth shuffle uses nothing aside from "Swap" instructions and a 32-bit (or 64-bit) RNG. Its really simple and fast. That's why that algorithm has stood the test of ~40 or 50 years of use, its probably the optimal algorithm for shuffling on a single-thread.