And then 3 years later we have [0] and 6 years after that we have [1]. I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact. Whether it's a bug in the actual algorithm or a "Comparison method violates its general contract!" exception that happens once in a blue moon, I think putting TimSort in Java was a mistake.
> I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact
I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration. And - consider: if every line of code has a chance of being buggy, then a bigger change is likelier to be more buggy. I say more buggy, rather than have more bugs, because bugginess is rarely disconnected; the whole conception is likely to be more flawed the newer it is. If we were to only accept small contributions then, somewhere along the way from here to the limit, we would arrive at timsort. And all the flaws of timsort wouldn't be gone, they would just be amortized into smaller chunks along the way from here to there. Which may be preferable, but that's something you have to consider; it's not so cut-and-dry as 'if we only make change conservatively, then we avoid catastrophic failure'.
All this not to mention that there are people who are 'masters of the universe' at fixing bugs.
> I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration.
+1. Bloch himself made the iteration over work by Arthur van Hoff and others at Sun before him.
The "Comparison method violates its general contract!" exception is due to a bug in the caller's code, not in the sort implementation. If you have a bug in your code, just because you're able to get away with it today doesn't mean you should expect to get away with it forever.
I agree in theory. But in practice that specific caller's code bug was so widespread and Timsort broke so much existing code that it warranted offering a "-Djava.util.Arrays.useLegacyMergeSort=true" option.
The specific line from the Javadoc you're alluding to [0] is:
> The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y.
The problem is the second implied half of that statement: Pre-Timsort it was ", and if you don't, results might not be sorted in the correct order" while with Timsort it was ", and if you don't, you will get a RuntimeException."
It's way too easy to accidentally violate that condition: System.currentTimeMillis(), incorrect null handling, random numbers. Sometimes not even in the Comparator itself. The condition I posted, plus the other two:
> The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
> Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
Are just way too easy for someone to inadvertently violate to justify a RuntimeException when it occurs.
It sounds like it didn't break code, it just revealed that a lot of code was already broken. An invalid comparison function is a serious problem, and accepting unsorted output rather than fixing the damn bug is not a good workaround.
It's good to have a contract that defines behaviour even in the case of invalid input.
That could be throwing an error, or it could be given unsorted input. Either way is ok.
(The C and C++ answer is to leave the behaviour in the face of invalid input undefined. Undefined means that anything goes, including formatting your hard disk.)
If you supply an invalid comparison function to your sort function, which would you prefer to happen - that it crash, or that it give you unsorted output? (If you actually wanted sorted output, you should have provided a valid comparison function.)
Ideally I'd like a compile-time error — but that's beyond the current level of technology, so the next best thing is a crash. Having code that does unintended things is the worst-case scenario IMO. (Obviously it's the programmer's fault for providing buggy code, but that's a fault that every programmer shares. Personally, I appreciate any help in guarding against it.)
> but that's beyond the current level of technology
Is it? In Rust and other languages you would use trait bounds to restrict the input types to be comparable with each other, which would make this a compile time error.
Most of the time I want it to crash in development and give me unsorted output in production; debugging an unhandled exception is super easy compared to debugging sometimes incorrectly sorted output. In rare cases I would also want it to crash in production. I definitely don't want it to change between the two scenarios when I update my JVM.
I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration. And - consider: if every line of code has a chance of being buggy, then a bigger change is likelier to be more buggy. I say more buggy, rather than have more bugs, because bugginess is rarely disconnected; the whole conception is likely to be more flawed the newer it is. If we were to only accept small contributions then, somewhere along the way from here to the limit, we would arrive at timsort. And all the flaws of timsort wouldn't be gone, they would just be amortized into smaller chunks along the way from here to there. Which may be preferable, but that's something you have to consider; it's not so cut-and-dry as 'if we only make change conservatively, then we avoid catastrophic failure'.
All this not to mention that there are people who are 'masters of the universe' at fixing bugs.