Hacker News new | ask | show | jobs
by dataflow 2286 days ago
I love this. But I feel like there are people (like you and me) who find an explanation like this awesome and compelling, and then there are others who have a hard time getting on the same wavelength, let alone to try to agree. Like for example, a similar sort of reasoning leads to min(x, y) breaking ties in favor of x, but max(x, y) breaking ties in favor of y, e.g. because [min(x, y), max(x, y)] should stably sort the objects. (There were discussions of how C++'s definition is broken a while back, if you Google hard enough.)

But whenever I've tried to explain things like this to people, I've found so many just don't seem to see the issue, or to remotely care if they do. Their intuitive responses tend to be:

- It's an obscure issue, it's never gonna come up.

- Oh, well I mean it's obviously your fault for calling it that way with that implicit assumption. Why would you think that?

- Well it's the caller's responsibility to read the documentation/look at the source/test the code before they use this function, not my problem to try to fit it into every mathematically possible use case.

I feel like the notion of putting some more thought into what seems like an elementary API so that these issues don't come up and trip up users in the first place seems like a completely unjustifiable academic effort to them, if not an outright foreign one. Have you found any effective way to try to communicate stuff like this and hopefully actually convince people?

1 comments

The very concept of "breaking ties" with min/max suggests that something woolly and ill-defined is going on; surely x and y are either equal or they are not?
That's everyone's instinctive reaction, but it's completely wrong, and everyone will know this too, if you just give the right examples. Like say you compare people by their age. That's a perfectly fine operation sufficient for sorting, min, max, etc. Now you find two people have the same age. Are they the same person?

Or let's say you're sorting strings case-insensitively. "a" might be neither less than nor greater than "A", meaning they'd be unordered with respect to each other. But that doesn't mean you suddenly wouldn't care if one got duplicated and the other got deleted.

Fundamentally "unordered" just means "neither comes before nor comes after". It doesn't imply "equal to". You can wrap this in a lot of fancy math terminology about partial and total orders, but it's not that complicated conceptually; the idea is just that the lack of an ordering doesn't imply equality, just like how you and your twin aren't the same person. This is one aspect of why C++ is introducing more sophisticated comparisons in C++20 (I would suggest looking it up).

> Like say you compare people by their age. That's a perfectly fine operation for sorting, min, max, etc. Now you find two people have the same age. Are they the same person?

They have the same age, which might be the minimum age of the group. I don't think it makes sense to talk about the minimum person of the group; minimum/maximum imply that it's the thing itself which is ordered.

The reaction to this point seems to have been dismissive (ironic in a thread that started with a complaint about dismissing small distinctions!), but I think the distinction you're making is an important one. In Haskell, it's the difference between `minimum` and `minimumBy`.

https://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-L...

https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-...

The point about strings is different. It doesn't make sense in general to speak of the minimum string in a list, or, more generally, the minimum of a partially ordered set; but it does make perfectly good sense to sort a partially ordered set, with the idea that there are multiple correct sorts when elements are incomparable. That, of course, is where the notion of a stable sort becomes important.

How about a priority heap? It's much more useful to sort things by their min priority, but just telling me the priority at the end isn't helpful. You'll want the thing as well.

(Even more concretely consider a min heap used in pathfinding, where you store the (distance_so_far, (x,y)) pairs, and they sort by distance_so_far.)

> How about a priority heap? It's much more useful to sort things by their min priority, but just telling me the priority at the end isn't helpful. You'll want the thing as well.

Sure, but again, at that point you're not taking the minimum thing, you're taking the thing with the minimum priority. And it wouldn't be strange to say that two things have the same priority (even if they're different things).

> I don't think it makes sense to talk about the minimum person of the group; minimum/maximum imply that it's the thing itself which is ordered.

Uh, it totally makes sense, I just gave you a string example too. The comparator doesn't have to distinguish unequal elements, and a min (and argmin and sort et al.) is perfectly fine and sensible with such a comparator.

Do you code in Python at all? Do you find the fact that 1.0 and 1 are both simultaneously equal in some sense and unequal in another sense to be just complete nonsense? Do you expect min(1.0, 1) to error...? Or for sorted([1.0, 1]) to return [1, 1] just because the elements are "equal" so obviously what right do you have to care which one is returned?

> Uh, it totally makes sense, I just gave you a string example too. The comparator doesn't have to distinguish unequal elements, and a min (and argmin and sort et al.) is perfectly fine and sensible with such a comparator.

Sorting is not the same thing as taking the minimum.

> Do you code in Python at all? Do you find the fact that 1.0 and 1 are both simultaneously equal in some sense and unequal in another sense to be just complete nonsense? Do you expect min(1.0, 1) to error...?

Yes, and this is a large reason why I prefer to work in languages that let me manage such distinctions more carefully.

Not if you're trying to do a stable sort.

If you've got a list of people that is already sorted by name, and you want to sort them by age but still preserve the name sorting for each age (ie, if Alice and Bob are both 30 years old, I still want Alice to appear before Bob in the list), then having a defined way to break ties greatly simplifies the code.

You wouldn't use min/max as part of sorting though.
min and max can also be comparing one part of a more complex object:

    In [1]: l = [(19, "John"), (19, "Jack"), (20, "Jim")]

    In [2]: min(l, key=lambda x: x[0])
    Out[2]: (19, 'John')