by jasode 2547 days ago
>, where all the smarts [...] reside on your device, not in the cloud, is the most promising. [...] An on-device search agent could potentially be the best solution [...]

Maybe I misunderstand your proposal, but to me this is not technically possible. We can think of a modern search engine as a process that reduces a raw dataset of exabytes[0] into a comprehensible result of ~5000 bytes (i.e. ~5k being the first page of search results rendered as HTML).

Yes, one can take a version of the movies & tv data on IMDB.com and put it on the phone (e.g. like copying the old Microsoft Cinemania CDs to the smartphone storage and having a locally installed app search it) but that's not possible for a generalized dataset representing the gigantic internet.

If you don't intend for the exabytes of the search index to be stored on your smartphone, what exactly is the "on-device search agent" doing? How is it iterating through the vast dataset over a slow cellular connection?

[0] https://www.google.com/search?q="trillion"+web+pages+exabyte...


The smarts living on-device is not necessarily the same as the smarts executing on-device.

We already have the means to execute arbitrary code (JS) or specific database queries (SQL) on remote hosts. It's not inconceivable, to me, that my device "knowing me" could consist of building up a local database of the types of things that I want to see, and when I ask it to do a new search, it can assemble a small program which it sends to a distributed system (which hosts the actual index), runs a sophisticated and customized query program there, securely and anonymously (I hope), and then sends back the results.

Google's index isn't architected to be used that way, but I would love it if someone did build such a system.
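
A toy sketch of the idea, with everything in it assumed for illustration: the device keeps a private profile, folds only the relevant weights into a small query spec, and ships that spec to whoever hosts the index. `QuerySpec`, `build_query`, and `execute_remotely` are hypothetical names, and the "index" is just a list of dicts. Note that the spec still reveals the boosts it carries, so this alone does not deliver the anonymity hoped for above.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class QuerySpec:
    """Hypothetical wire format for a query assembled on the device."""
    terms: list
    boosts: dict = field(default_factory=dict)  # topic -> extra score
    limit: int = 10


def build_query(local_profile, text):
    """Runs on the device: combine the typed text with local preferences."""
    spec = QuerySpec(terms=text.lower().split(), boosts=dict(local_profile))
    return json.dumps(asdict(spec))


def execute_remotely(spec_json, index):
    """Runs next to the index: score documents as the spec instructs."""
    spec = QuerySpec(**json.loads(spec_json))
    scored = []
    for doc in index:
        words = doc["text"].lower().split()
        base = sum(words.count(term) for term in spec.terms)
        if base == 0:
            continue  # no query term matched this document
        bonus = sum(w for topic, w in spec.boosts.items() if topic in doc["topics"])
        scored.append((base + bonus, doc["url"]))
    # Highest score first; ties broken by URL so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[: spec.limit]]


# Made-up documents standing in for the remote index.
INDEX = [
    {"url": "a.example/python-lang", "text": "python tutorial", "topics": ["programming"]},
    {"url": "b.example/python-snake", "text": "python habitat", "topics": ["reptiles"]},
]
```

With a profile of `{"programming": 2.0}`, the query "python" ranks the programming page first; with `{"reptiles": 2.0}`, the order flips, and the index host never stores the profile itself.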

To some extent, doesn't Google already do this? Based on your location, Google account, and other factors such as cookies or search history, it tailors your results. For instance, running the same query on different computers can return different results.

Though, to your point, Google probably ends up storing this information in the cloud.

Also, instant search results: those were common search terms whose results were cached at lower levels of the internet.

I think you're suggesting homomorphic encryption to execute the user's ranking model. Unfortunately, homomorphic encryption is pretty slow, and the types of operations you can do are limited. But it's viable if the data you're operating on is relatively small - e.g. just searching through (encrypted) personal messages or something.

I think you've got the right general idea, but I don't know that it has to be homomorphic encryption. After all, an index of the public web is not really secret, and the user doesn't have a private key for it.

In the simplest case, you could make a search engine in the form of a big, public, regularly-updated database, and let users send in arbitrary queries (run in a sandbox/quota environment).
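
As a rough illustration of the quota part, here is a minimal sketch using Python's built-in `sqlite3` module. The `pages` table, `build_demo_index`, `run_with_quota`, and `QuotaExceeded` are all made up for the example. The progress handler aborts any query that burns through its budget of SQLite VM instructions. This only limits work; it does not stop writes, so a real sandbox would also need read-only access at the very least.

```python
import sqlite3


class QuotaExceeded(Exception):
    """Raised when a query uses more than its allotted work budget."""


def build_demo_index():
    # Hypothetical stand-in for the big public index: a three-row table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("a.example", "on-device search agents"),
            ("b.example", "distributed search index design"),
            ("c.example", "cooking with cast iron"),
        ],
    )
    return conn


def run_with_quota(conn, sql, params=(), max_steps=100_000, max_rows=10):
    """Run a user-supplied query, aborting it once it exceeds max_steps."""
    chunk = 1_000  # SQLite calls the handler every `chunk` VM instructions
    used = 0

    def on_progress():
        nonlocal used
        used += chunk
        return 1 if used > max_steps else 0  # non-zero aborts the query

    conn.set_progress_handler(on_progress, chunk)
    try:
        return conn.execute(sql, params).fetchmany(max_rows)
    except sqlite3.OperationalError:
        if used > max_steps:
            raise QuotaExceeded(f"query used more than {max_steps} steps") from None
        raise  # some other error, e.g. a syntax error in the query
    finally:
        conn.set_progress_handler(None, chunk)  # remove the handler again
```

A cheap lookup such as `SELECT url FROM pages WHERE body LIKE '%search%'` finishes well under the budget, while something like a recursive CTE counting to ten million gets cut off with `QuotaExceeded`.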

That's essentially what we've got now, except the query parser is a proprietary black box that changes all the time. I don't see any inherent reason they couldn't expose a lower-level interface, and let browsers build queries. Why can't web browsers be responsible for converting a user's text (or voice) into a search engine query structure?
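
For what it's worth, the client-side half is not much code. Below is a small sketch of a browser-side parser that turns typed text into a query structure; the field names (`phrases`, `terms`, `exclude`, `site`) and the `-word` / `site:` syntax are assumptions borrowed from common search-box conventions, not any engine's actual interface.

```python
import re

# Either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(text):
    """Turn raw search-box text into a structured query (hypothetical format)."""
    query = {"phrases": [], "terms": [], "exclude": [], "site": None}
    for phrase, word in TOKEN.findall(text):
        if phrase:
            query["phrases"].append(phrase.lower())
        elif word.startswith("site:") and len(word) > 5:
            query["site"] = word[5:].lower()
        elif word.startswith("-") and len(word) > 1:
            query["exclude"].append(word[1:].lower())
        else:
            query["terms"].append(word.lower())
    return query
```

Calling `parse_query('"on device" search -cloud site:example.com')` yields one phrase, one plain term, one excluded term, and a site filter, which is the kind of structure a lower-level search interface could accept directly.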