|
I'm failing to understand some things, maybe because I glanced through the paper with morning coffee. Apology if my tone ends up a little bit harsh, but these are just constructive criticism :) The first one is lt2 and lt3 implementations. You are implementing cmp2 through lt2, but lt3 through cmp3 (is this omission?). Both of them are stack-sensitive. Without being too harsh, I'm getting the impression that the intention was to write the most horrible comparison possible, which is different than worst-case time complexity. In the paper, lt2 (actually cmp2 in the paper) will always be at least two passes, and lt3 is at least one pass. I would not say they are two/single pass algorithms because complexity increases depending on list depth when lists are involved. Maybe I'm wrong, but both Python and C++ comparison operators are designed to be general-purpose comparison functions (and I'm more sure about C++, because this was touted through hundreds of books). As such, they should be good enough for most average cases. If you want speed, you go with balanced trees or something funkier. Also, for C++ Tree implementation, you are again using probably the worst approach - appending to vector recursively. Use list for this. Python list implements all sorts of tricks, comparing to C++ vector. And the last thing, but not the least: C++ containers depend on implementation (gcc libstdc++, stlport, msvc whatever), and I've seen substantial speed differences in standard operations. Hell, my old (almost conforming) list implementation was much faster than libstdc++ implementation because it wasn't trying to be too clever with slices and other magic. I'm sad you haven't used a more scientific approach with much more rigor here: what C++ compiler was used, what version, what assembly output was produced, on what processor, after how many runs, etc... Claiming "Python is faster than C++" sounds like a clickbait title. |
The context to keep in mind when reading the paper is: When designing a programming language & its standard library (or any API), we need to define an interface we can use as a building block, and we're analyzing the consequences of our choice of building blocks. In particular, we first examine the case of comparison-based data structures, which requires defining ordering primitives. In C++, the primitive is the < operator. In Python 2, it's cmp(). (In Python 3, it's a mix of < and ==, whose implications I discuss as well.) We assume user-provided types implement that basic interface, and we implement everything else we need on top of that.
So the question I'm analyzing in that example is: What happens if my primitive comparison operation is a 2-way comparison (lt(), like in C++) and then I implement 3-way comparison in terms of that (such as when I need it for for a binary search tree)? Now, what if we do the opposite: what happens if instead my primitive comparison operation is a 3-way comparison (cmp(), like in Python 2) and I only need to perform a 2-way comparison later? What are the trade-offs?
To do this, I take both approaches, implementing each in terms of the other, and compare how they behave complexity-wise. The conclusion I come to is that the choice of the primitive (which is often made by the language designers) isn't merely a question of aesthetics, but rather, it actually affects the time complexity of common operations (like traversing search trees). Similarly, the decision to cache a hash code doesn't just result in a constant-factor improvement, but it can actually change the time complexity of a program. And so on.
I think if you re-read the paper with these in mind, it should hopefully make more sense. The rest of what you said doesn't enter the picture at all... these are already balanced binary trees, the decision to use less<> is fundamentally independent of what C++ stdlib implementation you use, and the time of the vector concatenation isn't even being measured. Those things are unrelated to the point of the paper entirely. I was just trying to minimize the extraneous lines of code so we can focus on the heart of the problem instead of getting distracted by boilerplate.