Hacker News new | ask | show | jobs
by froh 1927 days ago
If I read the paper correctly, then it compares three-way comparison with two two-way comparisons, for a recursively defined (tree) data type.

The paper points out that the convenience of just defining "less than", and heaving "equals" derived from that can be costly. Specifically, for the recursively defined data structure (tree), a three-way comparison which is derived canonically from the two way compare seems to entail not a linear but an quadratic number of comparisons.

What I don't understand is what is happening in 'lt2'.

this is what I'd expect for __lt__

  lt(a, b) also known as (a __lt__ b) is returning
    True, iff a < b, elementwise for lists
    False otherwise
  for same length lists.
I also do understand cmp2.

  (a __eq__ b) iff not (a __lt__ b)  and not (b __lt__ a)
so looking at

  cmp(a, b) = lt(a, b) - lt(b, a)
I get

  a < b: 1 - 0 ==> 1
  b < a: 0 - 1 ==> -1
  a == b: 0 - 0 ==> 0
 
which makes sense.

Now two questions arise with respect to the presented hypothesis and the paper:

1. why does the paper call lt2 twice, recursively?

2. why does the paper compare the performance of their lt2 and lt3 instead of the performance of cmp2 and cmp3?

I intuit, when taking the double recursion out of lt2, which imnsho is erroneous, and when comparing cmp2 and cmp3, we'll see a performance penalty of a factor 2, between cmp2 and cmp3, and identical run times for lt2 and lt3, as it should be.

What am I missing?

edit: updated for clarity

2 comments

Initially I thought this preprint is a click bait (sorry, author), but when I read into details, I realize it is an interesting one. The key observation is the code under section 2.4.2. There, the author triggers the worse case (everything being equal in two trees) and shows that C++'s lt2 behavior leads to its horrible performance: 4.1s in C++ vs 0.018s in Python. Note that the difference is much more than two folds as lt2 and lt3 have different time complexity. PS: after a quick thought, I am not sure if we can avoid the quadratic behavior of lt2 in this particular example.
Haha, thank you! It was pretty demotivating to see so many people immediately dismiss it as clickbait without any attempt to discuss the topic at all, so it actually means a lot to hear that you think differently now. I hope it was fun & worth the read. I know I had a lot of fun writing it. :)
I think the sibling comment may have already answered your question, but if not, I think an earlier response I had might help clarify what exactly I'm comparing & why: https://news.ycombinator.com/item?id=26340952

Note that you do need to read the entire paper to see the overall picture and its consequences; if you stop halfway then it might appear like I'm comparing things unfairly. Feel free to let me know if anything is still confusing after reading the other comments and taking another look at the paper and I'll try to clarify!

let me rephrase, as I indeed have read the paper, before posting.

1. the lt2 definition in the paper is wrong. A lexicographical compare is linear in the size. the derived cmp2 is correct and has a run time twice that of cmp3. which matches the stl definitions of lexicographical_compare, see below.

2. the c++ behaviour in 2.4.2 is puzzling and most likely bug, worth reporting to and discussing with STL implementors.

https://www.cplusplus.com/reference/vector/vector/operators/

http://www.cplusplus.com/reference/algorithm/lexicographical...

> 1. the lt2 definition in the paper is wrong.

Would you mind providing a counterexample to illustrate what incorrect output it's producing?

> A lexicographical compare is linear in the size.

Indeed, lt2() also has a loop that iterates a linear number of times as you mention. It is consistent with this.

> the derived cmp2 is correct and has a run time twice that of cmp3. which matches the stl definitions of lexicographical_compare, see below.

Perhaps you might be confused about what lexicographical_compare does? It does not "compare" in the 3-way sense. It only performs a "less-than" comparison. The name is rather misleading.

2. the c++ behaviour in 2.4.2 is puzzling and most likely bug, worth reporting to and discussing with STL implementors.

I'm not sure what to report to anyone, as I don't find it puzzling; it is entirely consistent with everything I'm aware of and have tried to explain in the paper. It is also not specific to any particular implementation; I believe you will see this behavior on any correct implementation you try it on. It might be helpful if you try to produce a counterexample using what you believe would be a correct & efficient implementation to validate or invalidate this.