Hacker News
by ttul 1033 days ago
I’m wildly speculating, but it seems that projects like llama.cpp are bringing SOTA models closer to the desktop. It’s only a matter of time before browsers embed a small LLM for various purposes, providing access to this local model via a JavaScript API that allows clients like Gmail to perform tasks on data locally. Apple would be strongly incentivized to do this, given their value proposition of user privacy.
1 comment

I think the fact that Google curiously ignored all the security problems raised about the WebGPU API suggests they are closer than people think to offloading the GPU inference part of this to end users.

Building as much of the model as you can in the cloud, running inference locally, and pushing the results back is probably the cost-optimal way to run this stuff at scale.