by grepLeigh 1472 days ago
Would you use a Chrome extension that stored the auto-complete model locally and periodically sent anonymous statistical updates? Your keystrokes would never leave your machine (and the model would work offline).

Federated machine learning is an area of my research.

2 comments

That's really interesting. Let me know if you'd want to talk to me or Michael sometime about it. With the models we run currently, you'd really have to have a GPU to run locally and get a lot of utility. I'm curious if you have some thoughts on how to run these large language models on edge devices.

I'm wilson@ our website (trying to avoid too much spam from bots).

I sent y'all an email, but figured I'd re-post here for any curious hackers. I spent two years obsessed with autocomplete for mobile/edge use cases.

The first step is to get any functional offline model (1), then prune/project a large language model's representation until you can perform on-device inference (2). You can calculate variance and hit/miss statistics for a body of text and the model's proposals (3). Those statistics can feed a ranking model for an extra layer of personalization, or be used to re-balance the Euclidean projection of your model's layers to optimize for sparseness (4).

1) Locally store a Trie data structure, where keys are n-grams of user input

Surprisingly effective, considering most business communication uses a limited vocabulary. If your users are submitting fewer than 10,000 unique English words (stop words removed) per day, try this out.

One thing I really liked about the Trie approach is that corporate jargon appears in real-time, since the "model" is just a data retrieval algorithm. You don't need to modify a vocabulary and re-train/fine-tune a neural network to achieve personalization.

The downside is that you're limited to bi/tri-grams before performance degrades, although YMMV. Auto-completing bi/tri-grams does feel tedious after a while.
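
To make step 1 concrete, here's a minimal sketch of a trie keyed on word n-grams. The class names, the `max_n` parameter, and the frequency-based ordering are my own assumptions for illustration, not necessarily how the original was built.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # times an n-gram ending at this node was typed


class NgramTrie:
    """Character-level trie whose keys are word n-grams from user input."""

    def __init__(self, max_n=3):
        self.root = TrieNode()
        self.max_n = max_n  # longest n-gram stored (3 = up to trigrams)

    def add_text(self, text):
        # Index every 1..max_n word n-gram of the typed text.
        words = text.lower().split()
        for n in range(1, self.max_n + 1):
            for i in range(len(words) - n + 1):
                self._insert(" ".join(words[i:i + n]))

    def _insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def complete(self, prefix, limit=3):
        # Walk down to the node for the prefix, then collect stored n-grams below it.
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                results.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most frequent first; ties broken alphabetically.
        results.sort(key=lambda item: (-item[1], item[0]))
        return [text for text, _ in results[:limit]]
```

New jargon shows up in completions as soon as `add_text` has seen it, which is the "real-time" property described above. No training step is involved.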

2) Fine-tune and prune a large language model, then make it sparse

I noticed y'all offer some degree of personalization. Have you tried pruning or compressing your model after fine-tuning? The exact technique will depend on your base model's architecture but in general, try using a sparser representation.

Use accelerators designed to operate on sparse representations, for example the sparse operations in XNNPACK, which TensorFlow Lite and TensorFlow.js can use as a backend. XNNPACK is a library of optimized CPU kernels, including WebAssembly SIMD builds, so you can accelerate inference in the browser even when the client has no usable GPU.
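
As a toy illustration of "make it sparse", here's a sketch of unstructured magnitude pruning followed by a sparse row representation. This is plain Python with hypothetical function names, not the XNNPACK API, and magnitude pruning is just one common technique I'm assuming here. The right one depends on your architecture.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Ties at the threshold are all dropped, so slightly more than the
    requested fraction can end up pruned.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    drop = int(len(flat) * sparsity)
    if drop == 0:
        return [list(row) for row in weights]
    threshold = flat[drop - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def to_sparse_rows(weights):
    """Store each row as (column, value) pairs, skipping zeros."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]


def sparse_matvec(sparse_rows, vector):
    """Multiply the sparse matrix by a dense vector, touching only non-zeros."""
    return [sum(w * vector[j] for j, w in row) for row in sparse_rows]
```

The work in `sparse_matvec` scales with the number of surviving weights rather than the full matrix size, which is the property sparse-aware kernels exploit.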

3) Collect permutation variance and hit/miss statistics

The exact technique/algorithm will depend on your model architecture, but for example matched averaging is a way to express the average number of neighborhood permutations with respect to the input dataset. In other words, the client sends statistics about predictions in Euclidean space, not your literal keystrokes.
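
Here's a rough sketch of the hit/miss half of this step only (it doesn't attempt the permutation-variance part). The `ProposalStats` class and its report schema are hypothetical. The point is that the client tallies outcomes by proposal rank and never stores the typed or proposed text.

```python
from collections import Counter


class ProposalStats:
    """Client-side tally of how autocomplete proposals fared."""

    def __init__(self):
        self.counts = Counter()

    def record(self, rank, accepted):
        # Keep only the proposal's rank and whether it was accepted, never the text.
        self.counts[(rank, "hit" if accepted else "miss")] += 1

    def summary(self):
        # Aggregate report that could be sent upstream instead of keystrokes.
        ranks = sorted({rank for rank, _ in self.counts})
        report = {}
        for rank in ranks:
            hits = self.counts[(rank, "hit")]
            misses = self.counts[(rank, "miss")]
            report[rank] = {
                "hits": hits,
                "misses": misses,
                "hit_rate": hits / (hits + misses),
            }
        return report
```

Whether counts like these are anonymous enough in practice depends on what else gets sent alongside them.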

4) Use matched averaging to adjust model cardinality or train an additional ranking model

The statistics collected by step 3 can be used to train a personalized ranking model, with the goal of re-ranking the proposals from step 2.
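
A minimal sketch of the re-ranking idea, assuming a hand-rolled smoothed acceptance rate as a stand-in for a trained ranking model. The `rerank` function, its inputs, and the prior values are all hypothetical, and in this sketch the per-proposal counts stay on the device.

```python
def rerank(proposals, stats, prior_hits=1, prior_misses=1):
    """Re-order model proposals using the user's own accept/reject history.

    proposals: list of (text, model_score) pairs from the language model.
    stats: dict mapping proposal text -> (hits, misses), kept locally.
    """
    def personalized(item):
        text, score = item
        hits, misses = stats.get(text, (0, 0))
        # Smoothed acceptance rate; unseen proposals get the prior (0.5 by default).
        rate = (hits + prior_hits) / (hits + misses + prior_hits + prior_misses)
        return score * rate

    return sorted(proposals, key=personalized, reverse=True)
```

For example, a proposal the model scores slightly higher but the user has rejected eight times drops below one the user keeps accepting.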

You can also use these statistics to introspect the "embedding space" of a language model, with the goal of identifying compression/pruning opportunities to improve the model's real-time performance. Reducing cardinality in the embedding projection has an outsize impact on inference speed, and you can usually drop most of the language model after observing the range of language used by the client.

You can also use matched averaging to compare hidden <-> hidden weights between many models with distance measurements in Euclidean space (like cosine distance).
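
To sketch what comparing hidden weights across models can look like: hidden units in two models may be learned in a different order, so you pair them up by similarity before averaging. The code below is a simplification with hypothetical names. It uses a greedy pairing by cosine distance where a fuller implementation would likely solve an optimal assignment, and it assumes `model_b` has at least as many units as `model_a` and no all-zero rows.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two weight vectors (must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def match_units(model_a, model_b):
    """Greedily pair each unit (row) of model_a with its closest unused unit of model_b."""
    unused = set(range(len(model_b)))
    pairs = []
    for i, row in enumerate(model_a):
        # Ties are broken by the lower unit index.
        j = min(unused, key=lambda k: (cosine_distance(row, model_b[k]), k))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


def matched_average(model_a, model_b):
    """Average the weights of matched units rather than units at the same index."""
    return [
        [(x + y) / 2 for x, y in zip(model_a[i], model_b[j])]
        for i, j in match_units(model_a, model_b)
    ]
```

Averaging by index instead would blend unrelated units whenever the two models learned the same features in a different order.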

This is WAY more than I originally intended to write - but I hope this helps!

Thanks for the thoughts here. Will follow up over email!

It sounds much more privacy-respecting. I guess it depends on what’s included in those statistical updates and how much trust I can give the extension, for example that it won’t get hacked or bought by hackers (see The Great Suspender as an example).

Edit: I don’t even use Chrome :) That’s not the main point, but perhaps it shows where I am personally on the privacy scale?

For the sake of discussion, what about an open source decentralized blockchain that shows that your data is always encrypted and only readable by hosted apps that are open source that you approve to have access to your data?

I’m not sure if you’re joking or not. I tend to believe that simple designs improve security, so this sounds like the polar opposite.

But I guess I’m not the main target audience. Given the success of Grammarly and other privacy-invasive apps, people are happy to give away their data for convenience. The HN crowd might be a bit different, however. Winning this crowd might or might not be an indication of success.