| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mehrdadn 1973 days ago

If you're wondering whether this is a theoretical or practical problem: I actually observed some of this effect in practice first, and only after thinking about it for a while did the larger issue (and the complexity implications) dawn on me.

I had something like a set<string> or a set<set<string>> (or map... I can't remember which) somewhere in my program a few years ago, and I was trying to improve the program's performance. I tried breaking into it several times and found it quite perplexing that the bottleneck appeared to be the set<> container. I mean, I get that cache locality and all has an effect, but it seemed to be having a little too much of an effect. Why did I find this odd? Because other languages (like C#) have tree-based sets and maps too, but I'd never felt they were quite as slow as I was seeing them in C++. So I felt something weird must be going on.

I tried to step through the program for a while and observe what's going on, and at some point, I realized (perhaps I hit the same breakpoint twice? I don't recall) that these functions were being called more often than I intuitively thought would be necessary. Which was obvious in hindsight, as less() needs to be called multiple times on the same objects (4 times at level 2). Now, this hadn't resulted in quadratic behavior, but that was only because my data structures weren't arbitrarily deep—the slowdown was nevertheless a significant constant factor at the core of my program, only linear because the data structure's depth was bounded.

So once I had realized the implications of this, including that constant-factor differences can actually turn into polynomial ones, I eventually decided to post an article about it on arXiv... hence this article. Aside from the complexity issue illustrated in the article, one of my (unexpected) higher-level takeaways was that you can't just take any utility function in a program and blindly put it into a library: even if it's correct, you probably have hidden assumptions about its use cases that won't hold true in general. You really need to think about how it could be used in ways you didn't initially expect, and this may well require a different kind of code review with an entirely different mindset than before. It really drove home the point for me that writing a library is quite a bit different (and in some ways more difficult) than writing an application.

It's possible there is more theory lying underneath this that I haven't touched on—it would be interesting if someone can take this and find more interesting consequences of it. For example, maybe when analyzing algorithms we need to consider something akin to a "recursive time complexity", too? Is there another useful notion of complexity that we can generalize here?

Anyway, hope people enjoy reading it and take something useful away from it. :)

3 comments

nivenkos 1973 days ago

By the way for the hashing example in count() - Rust has the Entry API for HashMaps to avoid exactly this problem: https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

The PartialOrd trait also uses 3-way comparisons so I think the other issue is mitigated too, but it'd be interesting to check: https://doc.rust-lang.org/std/cmp/trait.PartialOrd.html#tyme...