Hacker News new | ask | show | jobs
by ziggity 2444 days ago
I agree in theory. But in practice that specific caller's code bug was so widespread and Timsort broke so much existing code that it warranted offering a "-Djava.util.Arrays.useLegacyMergeSort=true" option.

The specific line from the Javadoc you're alluding to [0] is:

> The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y.

The problem is the second implied half of that statement: Pre-Timsort it was ", and if you don't, results might not be sorted in the correct order" while with Timsort it was ", and if you don't, you will get a RuntimeException."

It's way too easy to accidentally violate that condition: System.currentTimeMillis(), incorrect null handling, random numbers. Sometimes not even in the Comparator itself. The condition I posted, plus the other two:

> The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

> Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.

Are just way too easy for someone to inadvertently violate to justify a RuntimeException when it occurs.

[0] https://docs.oracle.com/javase/7/docs/api/java/lang/Comparab...

3 comments

It sounds like it didn't break code, it just revealed that a lot of code was already broken. An invalid comparison function is a serious problem, and accepting unsorted output rather than fixing the damn bug is not a good workaround.
Obligatory xkcd: https://xkcd.com/1172 (Emacs Workflow)
It seems way better to fail with a RuntimeException than to get possibly unsorted results that cause some hideous error deeper in your application.
It depends on the contract.

It's good to have a contract that defines behaviour even in the case of invalid input.

That could be throwing an error, or it could be given unsorted input. Either way is ok.

(The C and C++ answer is to leave the behaviour in the face of invalid input undefined. Undefined means that anything goes, including formatting your hard disk.)

If you're willing to accept unsorted output from your sort function, why are you sorting? What are you sorting?
If you supply an invalid comparison function to your sort function, which would you prefer to happen - that it crash, or that it give you unsorted output? (If you actually wanted sorted output, you should have provided a valid comparison function.)
No question, that it crash. Silent buggy behavior is a bitch to notice.
Ideally I'd like a compile-time error — but that's beyond the current level of technology, so the next best thing is a crash. Having code that does unintended things is the worst-case scenario IMO. (Obviously it's the programmer's fault for providing buggy code, but that's a fault that every programmer shares. Personally, I appreciate any help in guarding against it.)
> beyond the current level of technology

I’m sure there quite a few languages that will catch that at compile time.

It is possible for a compiler to catch that particular mistake at compile-time. It's also possible for the user to provide a proof to a given compiler that a comparison function is well-formed. But I think that the parent was referring to a function that would check arbitrary comparison functions in java for correctness, without a proof, which is afaik provable impossible.
> but that's beyond the current level of technology

Is it? In Rust and other languages you would use trait bounds to restrict the input types to be comparable with each other, which would make this a compile time error.

How does that prevent you from accidentally making a.compare(b) and b.compare(a) inconsistent on some values?

Making sure you can compare the types is already happening in the compiler, and doesn't find the bug.

Crash, so that I actually get a reasonably high chance to notice that there is a bug during testing?

The comparison being invalid is definitely a bug; if I didn't want sorted output, I wouldn't be sorting.

Crash so when wil421 sort is implemented in the next version I’m not scratching my head months or years later.
Personally, I would very much prefer that it crash.
maybe you shouldn't supply invalid comparisons in the first place. but if you do, crashing seems like an appropriate response
Most of the time I want it to crash in development and give me unsorted output in production; debugging an unhandled exception is super easy compared to debugging sometimes incorrectly sorted output. In rare cases I would also want it to crash in production. I definitely don't want it to change between the two scenarios when I update my JVM.
crash so i know what a fucking stupid mistake I made and thus I can fix it :)