| Is anyone working on software that lets you run local LLMs in the browser? In theory, it should be possible, shouldn't it? The page could hold only the software in JavaScript that uses WebGL to run the neural net. And offer an "upload" button that the user can click to select a model from their file system. The button would not upload the model to a server - it would just let the JS code access it to convert it into WebGL and move it into the GPU. This way, one could download models from HuggingFace, store them locally and use them as needed. Nicely sandboxed and independent of the operating system. |
https://huggingface.co/spaces/webml-community/llama-3.2-webg... loads a 1.24 GB Llama 3.2 q4f16 ONNX build.
https://huggingface.co/spaces/webml-community/janus-pro-webg... loads a 2.24 GB DeepSeek Janus Pro model, which is multimodal on the output side: it can respond with generated images in addition to text.
https://huggingface.co/blog/embeddinggemma#transformersjs loads 400 MB for an EmbeddingGemma demo (an embedding model, not an LLM).
I've collected a few more of these demos here: https://simonwillison.net/tags/transformers-js/
You can also get this working with web-llm (https://github.com/mlc-ai/web-llm). Here's my write-up of a demo that uses it: https://simonwillison.net/2024/Nov/29/structured-generation-...
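
On the "upload button that doesn't actually upload" part of the question: here is a rough TypeScript sketch of how a page could read a user-selected model file locally, in chunks, without sending it anywhere. It isn't taken from any of the demos above. `SliceableFile`, `planChunks`, `readModelLocally` and the `onChunk` hook are hypothetical names made up for illustration, and the step that would hand each chunk to the GPU is left as a callback.

```typescript
// Minimal structural type; a browser `File` or `Blob` satisfies it.
// (Hypothetical name, defined here so the sketch needs no DOM typings.)
interface SliceableFile {
  size: number;
  slice(start: number, end: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Split [0, totalBytes) into consecutive [start, end) ranges of at most
// chunkBytes each, so a multi-gigabyte model is never held in memory at once.
function planChunks(totalBytes: number, chunkBytes: number): Array<[number, number]> {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    ranges.push([start, Math.min(start + chunkBytes, totalBytes)]);
  }
  return ranges;
}

// Read the file chunk by chunk and pass each chunk to `onChunk`.
// `onChunk` is a hypothetical hook: a real page would parse the weights or
// copy them into GPU buffers there. Nothing in this function touches the network.
async function readModelLocally(
  file: SliceableFile,
  chunkBytes: number,
  onChunk: (bytes: Uint8Array, offset: number) => void,
): Promise<number> {
  let total = 0;
  for (const [start, end] of planChunks(file.size, chunkBytes)) {
    const buffer = await file.slice(start, end).arrayBuffer();
    onChunk(new Uint8Array(buffer), start);
    total += buffer.byteLength;
  }
  return total; // number of bytes read
}
```

In a page you would call `readModelLocally(input.files[0], 64 * 1024 * 1024, ...)` from the `change` event of an `<input type="file">` element. The 64 MiB chunk size is an arbitrary choice for the example.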