This strikes me as something that many people probably figured out a non-rigorous version of and didn't think it was special.
It's kind of one of those resource management hacks you do when you're constrained and screwed by limitations. Splitting things up by priority is a common go-to for resource allocation. This is a spin on that.
I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.
Also relevant: in this particular case the authors themselves note that the results better theoretical behavior in the worst case, but no practical uses yet. So I think any software engineer exploring this direction would have abandoned it pretty quickly, for the same reason that galactic algorithms aren't typically invented by them either (unless they also do compsci as a hobby of course). In fact the Wiki page for galactic algorithm mentions another optimal-but-impractical hash table as one of its examples[0][1].
> I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
A lot of them. Having said that: yes, I can imagine that others would have thought up Dijkstra's shortest path algorithm, since he himself said it came to him while shopping, and that it only took him twenty minutes to reason through the original O(n²) algorithm. (edit: oh wait, that's what you're alluding to isn't it? Heh, that went straight over my head).
On the other hand, I don't think the faster versions of Dijkstra's algorithm would have been invented by anyone without at least some understanding of priority queues and big-O behavior. And at that point I hope people realize that they possess some specialized knowledge that might not be entirely common.
In fact, I'd argue that the true strength of Dijkstra's write-up is that it gives us a vocabulary to reason about it and come up with specialized data structures for particular situations.
Anyway, what you're touching on is the difference between engineering and science: engineering works with confidence built from tests, rules of thumb that reflect lessons learned from historical results, and (in modern times) verified predictions from science. Those rules of thumb might be used when lacking a deeper scientific understanding of why it works. The tests might exist to work around the limitations of scientific knowledge (e.g. modelling turbulence). Science creates insights and predictions through modelling of empirical results. At least that's the difference according to Bill Hammack[0].
In an ideal world the two professions work together and build on each other's results to propel each other forward of course.
Eh, the trading salesmal problem is more like the Collatz conjecture: it looks simple but there's a lot of complexity hiding under the surface, and it requires some expertise to truly understand why it's really hard. So then we're talking about the opposite problem.
Note that your informal description did not match the TSP since there's no reason to disallow backtracking or visiting the same place twice.
Thanks so much for this link. I remain convinced that papers are so much more understandable with an accompanying talk by the creators. I wish papers would just come with a video talk included.
Exactly, the authors get to eschew the formalism required in papers. Often the core ideas of research are simple in themselves and the real complexity lies in formally proving the results.
Also, I'd not be surprised if someone already invented and used this funnel hashing technique in say the 80's in some game or whatnot but just never realized what they had stumbled onto. Not to diminish the research, it's very ingenius.
I think papers make good references. I think of it more like the equivalent of a "datasheet" for an electronic part, say. Once you understand the intricacies, it's a valuable reference but more often than not, it's not very good and conveying motivation or intuition.
Thanks for the video, def a lot better than the article.
I do find it a bit weird that this is somehow better than just over-allocating (and thus reducing the chances of key collisions, which also makes worst case 'less worse') given his approach also allocates more memory through the aux arrays.
I don't think anybody is really saying it is. Academics treat big-Oh performance on very very full hash tables like a sport. Real world code on real world CPUs often has a more complex cost function than what the academics considered; cache sizes, fitting in a cacheline, memory bandwidth, icache pressure, ...
He's not allocating through aux arrays, he's splitting the already allocated memory into log(n) layers. You can just track those aux arrays with math in the implementation.
It's probably not better than over-allocating except in memory constrained scenarios. But the overhead of funnel hashing is not high - it requires 0 extra memory
Overallcoation has a limit. You only have so much RAM/storage. Beyond that you start swapping.
I could really use a hash table (or similar structure) that degrades less with higher occupancy.
It's kind of one of those resource management hacks you do when you're constrained and screwed by limitations. Splitting things up by priority is a common go-to for resource allocation. This is a spin on that.
I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.