Hacker News new | ask | show | jobs
by jandrewrogers 4475 days ago
What makes spatial indexing different than other areas of computer science is that its fundamental operands are interval types (like hypercubes). Almost all other areas of computer science focus on solutions that only work on point-like data and so many idioms and intuitions computer scientists use are not actually valid when dealing with spatial representations. Data types that require no less than two integers to describe break the assumptions of computer science developed for data types that require one integer to describe.

Oddly, there are almost no people working on the computer science of spatial representations and indexing. That was how I became involved in the first place; I needed solutions to specific problems for which no active research was being done in academia (and I was talking to people like Samet at the time). And to this day, academics still aren't doing any interesting work on spatial structures.

The algorithms are not classified, just not published. There are a couple patents out there on spatial sieves -- I am the inventor on the first practical one -- but more sophisticated and advanced variants exist that will probably never be patented. Widmayer (respected CS academic) was the first to propose this approach to the problem of generalized interval indexing but a useful algorithm was not discovered until my work in 2007.

The point indexing algorithm family I mentioned has never been patented or published anywhere but was based on the development of a novel theoretical CS construct that allows the expression of polymorphic space-filling curves. These are essentially adaptive to the information theoretic properties of the high-dimensionality data set but still embarrassingly parallel and distributable. I don't think these are being used in production anywhere (yet) but they've been around for several years. Very cool but the number of people that grok them can be counted on one hand.

All modern research on computing with spatial representations is being done at one of a few companies, which is why nothing is published. People like me, and I haven't done the pure research in years, don't have time to generate hundreds of pages of fairly deep content. Consequently, learning advanced spatial indexing is fairly prohibitive; you have to figure it out yourself, which is far from easy, or be at one of the handful of places where they are using it.

3 comments

I found some related papers for others who are interested:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56....

http://bmi.osu.edu/resources/techreports/ahmedBokhariV21.pdf

Also a book:

Space-Filling Curves: An Introduction with Applications in Scientific Computing

by

Michael Bader

Thanks for the insight. I've seen similar patterns in areas such as recommendation systems. There are active research communities but really all of the detail work and pushing the envelope happens in private companies.

The financial incentives drive the pace. In the future I'm sure it will become more commoditized but right now the practical nature of integrating with large shopping, coupon, and marketing systems and scaling for millions of users place the progress mostly in the domain of industry not academy.

Fascinating, thank you for taking the time to reply!