Hacker News
by neoteo 2544 days ago
I think Apple's current approach, where all the smarts (Machine Learning, Differential Privacy, Secure Enclave, etc.) reside on your device, not in the cloud, is the most promising. As imagined in so much sci-fi (e.g. the Hosaka in Neuromancer), you build a relationship with your device, which gets to know you, your habits and, most importantly in regard to search, what you mean when you search for something and what results are most likely to be relevant to you. An on-device search agent could potentially be the best solution because this very personal and, crucially, private device will know much more about you than you are (or should be) willing to forfeit to the cloud providers whose business is, ultimately, to make money off your data.
4 comments

>, where all the smarts [...] reside on your device, not in the cloud, is the most promising. [...] An on-device search agent could potentially be the best solution [...]

Maybe I misunderstand your proposal, but to me this is not technically possible. We can think of a modern search engine as a process that reduces a raw dataset of exabytes[0] into a comprehensible result of ~5000 bytes (i.e. roughly the first page of search results rendered as HTML).

Yes, one can take a version of the movies & tv data on IMDB.com and put it on the phone (e.g. like copying the old Microsoft Cinemania CDs to the smartphone storage and having a locally installed app search it) but that's not possible for a generalized dataset representing the gigantic internet.

If you don't intend for the exabytes of the search index to be stored on your smartphone, what exactly is the "on-device search agent" doing? How is it iterating through the vast dataset over a slow cellular connection?

[0] https://www.google.com/search?q="trillion"+web+pages+exabyte...

The smarts living on-device is not necessarily the same as the smarts executing on-device.

We already have the means to execute arbitrary code (JS) or specific database queries (SQL) on remote hosts. It's not inconceivable, to me, that my device "knowing me" could consist of building up a local database of the types of things that I want to see. When I ask it to do a new search, it can assemble a small program and send it to a distributed system that hosts the actual index. That system runs the sophisticated, customized query program, securely and anonymously (I hope), and then sends back the results.
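To make that concrete, here is a minimal sketch of the split, assuming a made-up JSON query format and a toy in-memory index. The names `build_query`, `run_on_index` and `INDEX` are hypothetical, not any real search API. The profile lives on the device; only the assembled query would cross the network.

```python
import json

def build_query(text, profile, limit=10):
    """Runs on the device: combine the typed text with local preferences."""
    terms = text.lower().split()
    # Hypothetical query format: required terms plus per-term score boosts.
    return json.dumps({"terms": terms, "boosts": profile, "limit": limit})

def run_on_index(query_json, index):
    """Stand-in for the remote side: evaluate the query against the index."""
    query = json.loads(query_json)
    scored = []
    for url, words in index.items():
        if not all(term in words for term in query["terms"]):
            continue  # every query term must appear in the document
        boost = sum(w for term, w in query["boosts"].items() if term in words)
        scored.append((len(query["terms"]) + boost, url))
    # Best score first; ties are broken by URL so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[: query["limit"]]]

# Toy index (assumption for illustration): URL -> set of words on the page.
INDEX = {
    "https://example.org/zoo": {"python", "snake", "zoo"},
    "https://example.org/tutorial": {"python", "code", "tutorial"},
}
```

Note that in this sketch the boosts themselves are sent along, so the query still reveals something about the user. A real design would have to decide how much of the profile is allowed to leave the device.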

Google's index isn't architected to be used that way, but I would love it if someone did build such a system.

To some extent, doesn't Google already do this? Based on your location, Google account, and other factors such as cookies or search history, it tailors your results. For instance, running the same query on different computers can return different results.

Though to your point, Google probably ends up storing this information in the cloud.

There were also instant search results: results for common search terms, cached at lower levels of the internet.

I think you're suggesting homomorphic encryption to execute the user's ranking model. Unfortunately, homomorphic encryption is pretty slow, and the types of operations you can do are limited. But it's viable if the data you're operating on is relatively small - e.g. just searching through (encrypted) personal messages or something.

I think you've got the right general idea, but I don't know that it has to be homomorphic encryption. After all, an index of the public web is not really secret, and the user doesn't have a private key for it.

In the simplest case, you could make a search engine in the form of a big, public, regularly-updated database, and let users send in arbitrary queries (run in a sandbox/quota environment).
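As a rough sketch of the "sandbox/quota" part, assuming the public database were something SQLite-like: the `pages` table, the demo rows and the step budget below are invented for illustration. The sketch relies on SQLite's `PRAGMA query_only` to reject writes and on Python's `Connection.set_progress_handler` to abort queries that run too long.

```python
import sqlite3

def build_demo_index():
    """Create a tiny in-memory stand-in for the public index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("https://example.org/dp", "Differential privacy overview"),
            ("https://example.org/pan", "Cooking with cast iron"),
        ],
    )
    conn.commit()
    conn.execute("PRAGMA query_only = ON")  # later writes will be rejected
    return conn

def run_with_quota(conn, sql, params=(), max_steps=100_000):
    """Run a user-supplied query; return None if it is rejected or over budget."""
    interval = 1_000  # handler is called every `interval` SQLite VM instructions
    ticks = 0

    def on_progress():
        nonlocal ticks
        ticks += 1
        # A non-zero return value tells SQLite to interrupt the query.
        return 1 if ticks * interval > max_steps else 0

    conn.set_progress_handler(on_progress, interval)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        return None  # interrupted, attempted a write, or invalid SQL
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again
```

A real service would need much more than this (memory limits, per-user accounting, isolation between tenants), but it shows that "arbitrary queries with a quota" is not exotic.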

That's essentially what we've got now, except the query parser is a proprietary black box that changes all the time. I don't see any inherent reason they couldn't expose a lower-level interface, and let browsers build queries. Why can't web browsers be responsible for converting a user's text (or voice) into a search engine query structure?

Or even a configurable online search engine, where you could assign custom weights to different aspects of a result.

I'd love to be able to configure rules like:

+2 weight for clean HTML sites with minimal Javascript

+5 weight for .edu sites

-10 weight for documents longer than 2 pages

-5 weight for wordy documents

I'd also like to increase the weight for hits on a list of known high quality sites. Either a list I maintain myself, or one from an independent 3rd party.
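Rules like these could be applied as a re-ranking pass over whatever the engine returns. A minimal sketch, with several assumptions of mine: the `Hit` fields, the thresholds for "minimal Javascript" and "wordy", and the `TRUSTED_BONUS` value are all made up for illustration; only the four weights come from the list above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class Hit:
    url: str
    base_score: float        # score assigned by the underlying engine
    script_count: int        # number of scripts on the page
    pages: int               # document length in printed pages
    avg_sentence_words: int  # crude proxy for "wordy"

def host(hit):
    return urlparse(hit.url).hostname or ""

# (weight, predicate) pairs; thresholds are illustrative assumptions.
RULES = [
    (+2, lambda h: h.script_count <= 2),         # minimal Javascript
    (+5, lambda h: host(h).endswith(".edu")),    # .edu sites
    (-10, lambda h: h.pages > 2),                # longer than 2 pages
    (-5, lambda h: h.avg_sentence_words > 25),   # wordy documents
]
TRUSTED_BONUS = 3  # hypothetical weight for sites on a curated list

def adjusted_score(hit, trusted_hosts=frozenset()):
    score = hit.base_score
    for weight, applies in RULES:
        if applies(hit):
            score += weight
    if host(hit) in trusted_hosts:
        score += TRUSTED_BONUS
    return score

def rerank(hits, trusted_hosts=frozenset()):
    """Return hits ordered by adjusted score, highest first."""
    return sorted(hits, key=lambda h: adjusted_score(h, trusted_hosts), reverse=True)
```

The trusted list could be one you maintain yourself or one published by a third party; the code only needs a set of hostnames.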

Once upon a time I tried to use Google's custom search engine builder, with only hand-curated high quality sites, as my main search engine. It was too much trouble to be practical, but I think that could change with an actual tool.

I don't think this is what the original question was about. A device that knows you still needs an indexing service to find data for you, IMHO.

I remember hearing something about Differential Privacy in a WWDC keynote a few years back, but I haven't heard much lately. Can you say how and where Apple is currently using Differential Privacy?

https://www.apple.com/privacy/docs/Differential_Privacy_Over...

Apple uses local differential privacy to help protect the privacy of user activity in a given time period, while still gaining insight that improves the intelligence and usability of such features as:
• QuickType suggestions
• Emoji suggestions
• Lookup Hints
• Safari Energy Draining Domains
• Safari Autoplay Intent Detection (macOS High Sierra)
• Safari Crashing Domains (iOS 11)
• Health Type Usage (iOS 10.2)
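For anyone wondering what "local" differential privacy means in practice, here is a minimal sketch using the textbook randomized-response technique. This is not Apple's mechanism, just the simplest scheme I know with the same property: each device adds noise before reporting, and the server can still estimate an aggregate. The function names and the choice of epsilon are my own.

```python
import math
import random

def randomize(truth, epsilon, rng):
    """Runs on the device: report the true bit with probability e^eps / (e^eps + 1)."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if rng.random() < p_truth else not truth

def estimate_rate(reports, epsilon):
    """Runs on the server: undo the known noise to estimate the true fraction."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    # observed = p * rate + (1 - p) * (1 - rate), solved for rate:
    return (observed - (1 - p)) / (2 * p - 1)

def simulate(true_bits, epsilon, seed=0):
    """Randomize every user's bit locally, then estimate the overall rate."""
    rng = random.Random(seed)
    reports = [randomize(bit, epsilon, rng) for bit in true_bits]
    return estimate_rate(reports, epsilon)
```

No single report can be trusted, since any one of them may have been flipped, but across many users the estimate converges on the real rate. Smaller epsilon means more flipping and a noisier estimate.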

Found via Google...