by atgctg 1068 days ago
As an example, INT8 support in WebGPU would enable running quantized models, allowing larger LLMs to run locally in the browser.

See the Limitations section here: https://fleetwood.dev/posts/running-llms-in-the-browser
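
To make the "larger models fit" point concrete, here is a rough sketch in plain Python (not WebGPU code). It estimates weight memory at different precisions and shows a simple symmetric per-tensor INT8 quantization. The 7B parameter count and the helper names are illustrative assumptions, not taken from the linked post, and real quantization schemes are usually per-channel or per-group.

```python
# Rough sketch: why 8-bit weights let a larger model fit in the same memory.
# All names and the 7B parameter count below are illustrative assumptions.

BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed for the weights alone (no activations, no KV cache)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 2**30


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with q an integer clamped to [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [scale * v for v in q]


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "int8"):
        gib = weight_memory_gib(7_000_000_000, dtype)
        print(f"{dtype}: {gib:.1f} GiB of weights")

    q, scale = quantize_int8([0.5, -1.0, 0.25])
    print(q, dequantize(q, scale))
```

The weights shrink 4x relative to fp32 and 2x relative to fp16, at the cost of a rounding error of at most about half a quantization step per weight.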