| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ziggity 2444 days ago

And then 3 years later we have [0] and 6 years after that we have [1]. I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact. Whether it's a bug in the actual algorithm or a "Comparison method violates its general contract!" exception that happens once in a blue moon, I think putting TimSort in Java was a mistake.

[0] https://dertompson.com/2012/11/23/sort-algorithm-changes-in-...

[1] https://bugs.openjdk.java.net/browse/JDK-8203864

2 comments

earenndil 2444 days ago

> I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact

I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration. And - consider: if every line of code has a chance of being buggy, then a bigger change is likelier to be more buggy. I say more buggy, rather than have more bugs, because bugginess is rarely disconnected; the whole conception is likely to be more flawed the newer it is. If we were to only accept small contributions then, somewhere along the way from here to the limit, we would arrive at timsort. And all the flaws of timsort wouldn't be gone, they would just be amortized into smaller chunks along the way from here to there. Which may be preferable, but that's something you have to consider; it's not so cut-and-dry as 'if we only make change conservatively, then we avoid catastrophic failure'.

All this not to mention that there are people who are 'masters of the universe' at fixing bugs.

mavelikara 2444 days ago

> I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration.

+1. Bloch himself made the iteration over work by Arthur van Hoff and others at Sun before him.

jonas21 2444 days ago

The "Comparison method violates its general contract!" exception is due to a bug in the caller's code, not in the sort implementation. If you have a bug in your code, just because you're able to get away with it today doesn't mean you should expect to get away with it forever.

ziggity 2444 days ago

I agree in theory. But in practice that specific caller's code bug was so widespread and Timsort broke so much existing code that it warranted offering a "-Djava.util.Arrays.useLegacyMergeSort=true" option.

The specific line from the Javadoc you're alluding to [0] is:

> The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y.

The problem is the second implied half of that statement: Pre-Timsort it was ", and if you don't, results might not be sorted in the correct order" while with Timsort it was ", and if you don't, you will get a RuntimeException."

It's way too easy to accidentally violate that condition: System.currentTimeMillis(), incorrect null handling, random numbers. Sometimes not even in the Comparator itself. The condition I posted, plus the other two:

> The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

> Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.

Are just way too easy for someone to inadvertently violate to justify a RuntimeException when it occurs.

[0] https://docs.oracle.com/javase/7/docs/api/java/lang/Comparab...

gweinberg 2444 days ago

It sounds like it didn't break code, it just revealed that a lot of code was already broken. An invalid comparison function is a serious problem, and accepting unsorted output rather than fixing the damn bug is not a good workaround.

ignoramous 2444 days ago

Obligatory xkcd: https://xkcd.com/1172 (Emacs Workflow)

buildbot 2444 days ago

It seems way better to fail with a RuntimeException than to get possibly unsorted results that cause some hideous error deeper in your application.

eru 2444 days ago

It depends on the contract.

It's good to have a contract that defines behaviour even in the case of invalid input.

That could be throwing an error, or it could be given unsorted input. Either way is ok.

(The C and C++ answer is to leave the behaviour in the face of invalid input undefined. Undefined means that anything goes, including formatting your hard disk.)

naniwaduni 2444 days ago

If you're willing to accept unsorted output from your sort function, why are you sorting? What are you sorting?

AnimalMuppet 2444 days ago

If you supply an invalid comparison function to your sort function, which would you prefer to happen - that it crash, or that it give you unsorted output? (If you actually wanted sorted output, you should have provided a valid comparison function.)

njharman 2444 days ago

No question, that it crash. Silent buggy behavior is a bitch to notice.

chc 2444 days ago

Ideally I'd like a compile-time error — but that's beyond the current level of technology, so the next best thing is a crash. Having code that does unintended things is the worst-case scenario IMO. (Obviously it's the programmer's fault for providing buggy code, but that's a fault that every programmer shares. Personally, I appreciate any help in guarding against it.)

jiofih 2444 days ago

> beyond the current level of technology

I’m sure there quite a few languages that will catch that at compile time.

estebank 2443 days ago

> but that's beyond the current level of technology

Is it? In Rust and other languages you would use trait bounds to restrict the input types to be comparable with each other, which would make this a compile time error.

naniwaduni 2444 days ago

Crash, so that I actually get a reasonably high chance to notice that there is a bug during testing?

The comparison being invalid is definitely a bug; if I didn't want sorted output, I wouldn't be sorting.

wil421 2444 days ago

Crash so when wil421 sort is implemented in the next version I’m not scratching my head months or years later.

chias 2444 days ago

Personally, I would very much prefer that it crash.

longemen3000 2444 days ago

maybe you shouldn't supply invalid comparisons in the first place. but if you do, crashing seems like an appropriate response

kragen 2444 days ago

Most of the time I want it to crash in development and give me unsorted output in production; debugging an unhandled exception is super easy compared to debugging sometimes incorrectly sorted output. In rare cases I would also want it to crash in production. I definitely don't want it to change between the two scenarios when I update my JVM.

aaaa123 2444 days ago

crash so i know what a fucking stupid mistake I made and thus I can fix it :)