Hacker News new | ask | show | jobs
by jandrewrogers 36 days ago
To give a concrete example, spatial indexing algorithms badly break cache replacement algorithms for intrinsic theory reasons. This has been known since at least the 1980s.

We have a literature full of spatial indexing algorithms that can't work in any real system because they assume cache replacement algorithms. This problem isn't even mentioned in modern academic papers. That is extremely low-value research. That's like doing physics research under the assumption that the fundamental laws of physics don't apply. It might be an intellectually interesting exercise but it isn't useful.

It isn't all like this. The spatial indexing literature is actively bad to an unusual extent. If you look at e.g. academic graph analytic algorithm research, where I also worked, it is mostly just decades behind the non-academic state-of-the-art. The literature won't mislead you but it also won't tell you where the frontier is.

1 comments

I'm not really trying to white knight for academia. I know there is a lot of low-quality stuff (part of that is just reality, most of the work done in academia or industry won't stand the test of time). But I do kind of have to assume that if this field of work is continuing there is a reason and they haven't just been spinning wheels for 40 years because they missed something simple and obvious.

This article[0] says "Data changes are usually much less frequent than queries, so incurring an initial cost of processing data into an index is a fair price to pay for instant searches afterwards."

Is that what you're referring to with cache replacement? There's no way to quickly update the index?

Wikipedia says "The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts."

I'm gathering that there are applications which are ok with that limitation? Generally there is communication and awareness between industry and academia, if everyone really needs cache replacement for it to be useful, why have they not attempted to account for it? What is the cause of the total disconnect you describe?

[0] https://blog.mapbox.com/a-dive-into-spatial-search-algorithm...