Hacker News
by jfoks 15 days ago
It's not so simple that you can just declare what is surprising. Surprise depends on context, and not everyone will have the same context as you. You say you would expect the term 'sort' to mean a stable sort, and I would expect it to always mean in-place sorting, others may expect it to use the absolute fastest way to get something sorted... Different users will have different priorities and therefore expectations.

Sort stability does not matter when the sorting key is the only thing in the data being sorted. E.g. when I sort my M&Ms by color, I never keep them in the same order, because it doesn't matter. A red M&M is a red M&M. Nobody expects their red M&Ms to remain in the same order after sorting. I do tend to expect my M&M sorting to happen in-place. I expect not to need extra candy-holding bowls for temporary storage that I then have to clean up afterwards. But I'll optionally grab additional bowls when I'm in a time crunch, if it speeds things up...

Now, if we're sorting all the cars in a parking lot by color, it may matter more that the red cars stay in the order they started in. For example, if they were already sorted by brand, it /can/ be useful for that to be preserved. But it's not guaranteed to be important or useful. Maybe the rich owner just wants to torch all their red cars together. I typically won't have access to additional temporary parking lots used only during sorting, or maybe the owner is coming with a flamethrower at 1PM and it has to be done as fast as possible, whatever the cost. There is a tradeoff: rent additional parking lots during sorting, take more time and do an in-place stable sort, or jumble up the car brands.
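For anyone who wants the parking lot in code, here is a rough C++ sketch of the two options. The `Car` struct and both helper names are made up for illustration; only `std::sort` and `std::stable_sort` come from the standard library.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record for the parking-lot example.
struct Car {
    std::string brand;
    std::string color;
};

// Sort by color only. std::stable_sort guarantees that cars of the same
// color keep their earlier relative order (e.g. a previous brand ordering).
// It is allowed to allocate a temporary buffer to do that.
inline void sort_by_color_stable(std::vector<Car>& lot) {
    std::stable_sort(lot.begin(), lot.end(),
                     [](const Car& a, const Car& b) { return a.color < b.color; });
}

// Same key, but std::sort makes no promise about the relative order of
// cars whose colors compare equal.
inline void sort_by_color_unstable(std::vector<Car>& lot) {
    std::sort(lot.begin(), lot.end(),
              [](const Car& a, const Car& b) { return a.color < b.color; });
}
```

If the lot was sorted by brand beforehand, the first helper leaves each color group still in brand order; the second may or may not, depending on the implementation and the input.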

So what I want is control. That's all. Whether the ambiguous term 'sort' is stable or unstable, in-place or not, is just semantics. The only way to get clarity is either to rely on prior agreement or to not use ambiguous names. Maybe a language should ban 'sort', and only allow 'foo_sort_bar' names with stability or memory usage postfixes or prefixes to 'inform' the developer. Neither choice is ideal or will satisfy everyone. It's like being a DJ at a high school party.

I'm not saying that the STL is great in practice, since it appears to optimize for usage flexibility with defined algorithmic and memory complexity at the big-O level, and mostly disregards actual real-life metrics. Arguing, however, that a language or library is better because an objectively ambiguous choice was made differently than your expectations is like arguing for fundamental superiority of either endianness over the other.

1 comments

> It's not so simple that you can just declare what is surprising.

On the contrary, of course I can tell you that I was surprised and I'm far from alone. The fact you immediately grasped for "real world" comparisons ought to tell you that you're not thinking about this correctly because these are software sorts and so have very different affordances than the real world.

The claim that you wanted control doesn't make sense in the context of C++. There are in place stable sorts - the bubble sort you may have seen in class years ago is one, but C++ doesn't promise one in its standard library. However it does provide an unstable sort, which it just names "sort" and that's what I'm pointing at as a problem.
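To make "in-place and stable" concrete, here is a minimal sketch of that classroom bubble sort. The names `bubble_sort` and `sort_pairs_by_key` are hypothetical, not anything the standard library provides, and at O(n^2) comparisons nobody should mistake this for a practical recommendation.

```cpp
#include <algorithm>  // std::iter_swap
#include <iterator>   // std::next
#include <utility>
#include <vector>

// Bubble sort: no allocation, and stable because two adjacent elements are
// swapped only when the right one is strictly less than the left one.
template <typename ForwardIt, typename Compare>
void bubble_sort(ForwardIt first, ForwardIt last, Compare comp) {
    if (first == last) return;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (ForwardIt it = first; std::next(it) != last; ++it) {
            ForwardIt nxt = std::next(it);
            if (comp(*nxt, *it)) {  // equal elements are never swapped
                std::iter_swap(it, nxt);
                swapped = true;
            }
        }
    }
}

// Usage sketch: order (key, tag) pairs by key only; tags that share a key
// stay in their original relative order.
inline void sort_pairs_by_key(std::vector<std::pair<int, char>>& v) {
    bubble_sort(v.begin(), v.end(),
                [](const std::pair<int, char>& a, const std::pair<int, char>& b) {
                    return a.first < b.first;
                });
}
```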

As to the "absolute fastest" you're in the wrong place if you've used a generic comparison sort expecting the "absolute fastest". For the machine integers it's usually not even the correct category of sort for "absolute fastest". But the C++ standard library is the wrong place to look even if you did need a generic comparison sort, because so much crap C++ exists and maintainers are scared to change anything for fear of what may happen.
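As an illustration of "not even the correct category", here is a small sketch of a non-comparison sort for 32-bit unsigned integers: an LSD radix sort, one byte per pass. The function name is hypothetical, and whether it actually beats a tuned comparison sort depends on input size and hardware, so treat it as a sketch rather than a benchmark claim.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned integers, 8 bits per pass.
// It never compares two elements with each other. For fixed-width keys it
// does O(n) work, at the cost of an O(n) scratch buffer.
inline void radix_sort_u32(std::vector<std::uint32_t>& v) {
    std::vector<std::uint32_t> scratch(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // count[b + 1] holds how many keys have byte value b in this pass.
        std::array<std::size_t, 257> count{};
        for (std::uint32_t x : v) ++count[((x >> shift) & 0xFFu) + 1];
        // Prefix sums turn the counts into the start offset of each bucket.
        for (std::size_t b = 0; b < 256; ++b) count[b + 1] += count[b];
        // Scatter in input order, so each pass is stable (LSD relies on that).
        for (std::uint32_t x : v) scratch[count[(x >> shift) & 0xFFu]++] = x;
        v.swap(scratch);
    }
    // Four passes means an even number of swaps, so the result ends up in v.
}
```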

Did you know libc++ didn't even have a guaranteed O(N log N) sort until the Joe Biden presidency? The introsort paper was written last century and the C++ standard itself did finally incorporate this basic requirement in 2011, but it took another decade for the Clang team to fix this.

Ok, I'll keep it short: I'm far from alone being surprised that a sort allocates temporary memory...

C++ is used by a lot of different people with a lot of different backgrounds, and... expectations...

My point is that "sort" is ambiguous, and having expectations about an ambiguous term and arguing that a certain one is better is like arguing that little endian is better or worse than big endian.

> Ok, I'll keep it short: I'm far from alone being surprised that a sort allocates temporary memory...

In a sense I'm sure this is true. C++ programmers routinely report being astonished about all sorts of properties of the language they have previously insisted they know well and who could blame them (for the former, at least).

Again, this is not symmetrical. LE and BE are symmetrical: if you have to pick one, there isn't a "safe default" that isn't surprising to people who expected the other one†. Sort stability isn't like that: every stable sort also meets the criteria for an unstable sort, so a caller who only needed an unstable sort is never broken by getting a stable one. Likewise every in-place sort meets the criteria for an allocating sort.

C++ chooses to offer an unstable sort just named "sort". It doesn't offer a stable in-place sort at all, but it does offer a stable allocating sort and names that stable_sort.

† But what you can do, where it matters, is explicitly offer both the LE and BE options, with whichever one is native on your target silently being the fast one. Users can write whichever they meant and their program works, rather than "Oops, by default on this platform it's the opposite byte order, there's a special conversion function to run". Needless to say C++ doesn't do this either.
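A hand-rolled sketch of what "explicitly offer both" could look like for a 32-bit value. All four function names are hypothetical and not part of the C++ standard library. They are written with shifts, so they give the same bytes on any host; the assumption is that a compiler can reduce the native-order pair to a plain load or store.

```cpp
#include <array>
#include <cstdint>

// Encode as little-endian: least significant byte first.
inline std::array<std::uint8_t, 4> to_le_bytes(std::uint32_t x) {
    return {{static_cast<std::uint8_t>(x), static_cast<std::uint8_t>(x >> 8),
             static_cast<std::uint8_t>(x >> 16), static_cast<std::uint8_t>(x >> 24)}};
}

// Encode as big-endian: most significant byte first.
inline std::array<std::uint8_t, 4> to_be_bytes(std::uint32_t x) {
    return {{static_cast<std::uint8_t>(x >> 24), static_cast<std::uint8_t>(x >> 16),
             static_cast<std::uint8_t>(x >> 8), static_cast<std::uint8_t>(x)}};
}

// Decode four little-endian bytes back into a value.
inline std::uint32_t from_le_bytes(const std::array<std::uint8_t, 4>& b) {
    return std::uint32_t{b[0]} | (std::uint32_t{b[1]} << 8) |
           (std::uint32_t{b[2]} << 16) | (std::uint32_t{b[3]} << 24);
}

// Decode four big-endian bytes back into a value.
inline std::uint32_t from_be_bytes(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}
```

The caller states the byte order they mean at the call site, and the program behaves the same whichever order the target uses natively.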