Hacker News new | ask | show | jobs
by dragontamer 2053 days ago
> Really? I mean like pretty much every instruction has separate source and destination addresses/registers, memory caches obviously complicate things, but it seems that processing data from one location to another is the most efficient operation.

I can't think of any algorithm where copying is faster than mutating.

* Chess Bitboards: Make / Unmake is a methodology for a reason.

* Mergesort: In-place mergesort is far faster than copy-merge sort. Its much harder to write in-place mergesort, but it really makes a big difference.

* Quicksort: In-place quicksort is faster than copy quicksort. Every "pure functional" recursive / copy quicksort has been... subpar... in my experience.

* C++11 created a "destructive move", the entire R-value references system, because of the hugely more efficient concept of destructive / mutating moves. Destroying the old data while copying the data to a new location (aka: std::move) is far more efficient than making a safe copy (aka: copy-constructor).

-------

Side note: The SIMD "Mergepath" Mergesort is absolutely brilliant. It uses absolutely no mutexes (and only uses thread-barriers) to guarantee that all writes are independent and correct. Furthermore, "Mergepath" mergesort parallelizes maximally (one-element per processor), while retaining O(n * log(n)) sort speeds.

Because thread-barriers are no-ops on a SIMD computer (like AVX512), the thread-barriers in the MergePath concept are maximally efficient. :-)

-------

What algorithms are you thinking of where copying is faster than mutating? At best, a copy ties mutation in speed. In a huge number of applications, mutation (aka: std::move) just faster.

But std::move cannot be done across multiple threads: that's innately thread-unsafe and must be synchronized (at least for typical objects).

---------------

Or maybe the above is too abstract? Lets go down to a very simple, but specific, algorithm. The Fisher-Yates (aka: Knuth) Shuffle.

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

As far as I know, there is no copy-based algorithm that can beat the Knuth Shuffle's O(n) computation complexity. It is extremely difficult to write a O(n) speed shuffle without mutating the array in place.

The fastest "copy" methodologies for the Shuffle problem I've seen involved generating a random number for each location, then copy-sorting the results: O (n * log(n)), which is asymptotically more complex than the Fisher Yates shuffle (and therefore: work-inefficient. If you parallelize to O(n*log(n)), there will be a size where the Fisher Yates shuffle is faster).

2 comments

I think you have a good point about sorts, in that case mutation is almost always faster, but it seems to me that a large contributor, is that often much data doesn't need to be moved. So clearly no work is better than some work.

Having said that, I hardly see a sort being the majority of time spent in real-world code.

> Because thread-barriers are no-ops on a SIMD computer

My understanding is that only holds true across the SIMD lane, and the algorithm doesn't hold up so generally to general purpose concurrency. But agreed, a sort is something I'd likely try to optimize on a single core, and express program concurrency outside that specific operation.

> What algorithms are you thinking of where copying is faster than mutating? At best, a copy ties mutation in speed.

My main point is it's actually the case that a copy ties mutation in speed quite often if you structure the flow of your data correctly. If you're operating on a value, writing it somewhere else is often close to free.

Having said that as an example, I've seen filtering a list run faster doing a move, highly dependent on the details of the cache of course.

> that can beat the Knuth Shuffle's O(n) computation complexity.

I haven't considered shuffles specifically. But getting into the rut of complexity evaluation alone dismisses important aspects of the machine and the data you are actually operating on, and can lead you very far from the fastest solution for the problem you are actually trying to solve.

> My main point is it's actually the case that a copy ties mutation in speed quite often if you structure the flow of your data correctly. If you're operating on a value, writing it somewhere else is often close to free.

So, here's something. I think that a copying-based methodology can be fast, but more importantly, a copying-based methodology can be easier to parallelize.

If a mutating state is too hard to parallelize, I think rewriting things to be copy-based, and then using that copy-based methodology as a basis for parallel code, is a good idea.

What's difficult is that "mutating state" is a local maximum so to speak: probably the fastest that any single-thread can get to. From this perspective, finding parallelism from a "higher level" can be more useful than trying to parallelize a well-optimized single-threaded program.

----------

Case in point: Multithreaded producer-consumer queues are difficult to write and even define. But multithreaded producer-producer queues (and consumer-consumer queues) are very easy: just an obvious "atomic_add(tail, 1)" (Producer) and "atomic_subtract(tail, 1)" (for consumers) kind of thing.

I call it a "producer-producer" queue, because you need assurances that no one is consuming from the queue for it to work. But if you synchronize your threads to all copy to a queue (and no one consumes from it), its really fast, and actually very parallelized. Ditto for the reverse (the consume phase).

> I think you have a good point about sorts, in that case mutation is almost always faster, but it seems to me that a large contributor, is that often much data doesn't need to be moved. So clearly no work is better than some work.

I disagree. Even for a simple situation:

[9 1 2 3 4 5 6 7 8] -> [1 2 3 4 5 6 7 8 9]

All 9 elements have to move to a different memory location!! In fact, assuming a randomly shuffled array with all different values, I'd assume that the most common situation is all elements being forced to move somewhere else. But I don't feel like doing the rigorous math to prove it. I'll so something simpler:

If there are N locations in an array, the probability that a random element is in the right spot is 1/N (I think anyway). With N elements, that would be an average of 1-element in the correct spot (assuming uniform random arrays). There is only one-sorted array, but factorial(n-1) array without any element in the correct spot.

> I haven't considered shuffles specifically. But getting into the rut of complexity evaluation alone dismisses important aspects of the machine and the data you are actually operating on, and can lead you very far from the fastest solution for the problem you are actually trying to solve.

Sure. But Knuth shuffle uses nothing aside from "Swap" instructions and a 32-bit (or 64-bit) RNG. Its really simple and fast. That's why that algorithm has stood the test of ~40 or 50 years of use, its probably the optimal algorithm for shuffling on a single-thread.

Going off-topic, but note: C++11 moves are non-destructive.

    unique_ptr<Blah> foo = make_unique(...);
    unique_ptr<Blah> bar = std::move(foo); 
    // foo is now effectively destroyed as part of the move;

    foo->something(); // Will error, probably seg-fault. Foo is now "empty"
The value foo used to point at is preserved. But foo itself was destroyed from the std::move(foo); The foo->something() is almost certainly a null-pointer exception or seg-fault in any sane implementation of C++11.

---------

Example 2:

    int main(){
      vector<int> foo = {1, 2, 3, 4, 5};
      vector<int> bar = std::move(foo);

      cout << foo.size() << endl;
      cout << bar.size() << endl;

      return 0;
    }
Notice: foo.size() is 0. The foo-vector was DESTROYED by the std::move(foo);.
unique_ptr's destructor happens to do nothing in a moved-from state. The fact that some types behave this way does not mean moves are destructive. They're still non-destructive. Indeed you're still supposed to call the destructor after a move, and the destructor very well _could_ do something in general. It's just that it doesn't happen to make any difference for typical implementations of some types like unique_ptr.

In fact there are people who have wanted destructive moves in C++, because they found the current non-destructive moves inadequate, but I don't believe the feature has ever been added. Look up destructive moves to find discussions on the issue.

I guess I'm being imprecise with terminology then. It seems like Rust has a concept called destructive move, which is not what I'm talking about.

In any case, the "foo" vector, and "foo" unique_ptr are _mutilated_ as part of the std::move operation. (Since the word "destroyed" seems to have been taken by the Rust community, I can't use that word anymore... hopefully no other community has taken the word "mutilated")

I think "moved-from" already captures what you are trying to say.

The theoretical destructive move is understood not leave an object behind at all, i.e. nothing to call a destructor on.