by itissid 462 days ago
Since the browser where this would execute is already connected to the Internet, could there be an option to transparently run the WebGPU + LLM workload in a cloud container that communicates with the browser process?
1 comment

I think WebGPU is mostly meant for running inside the browser. If you have the option of a cloud container with a GPU, you can run LLM inference there directly with CUDA/ROCm/TPU, and it will run more efficiently.
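
To make the trade-off concrete, here is a rough sketch of how a page might pick between the two paths. Everything named here (`Backend`, `InferenceConfig`, `chooseBackend`, and the example URL) is hypothetical and only for illustration, not any real library's API. It assumes the app knows whether a server-side endpoint has been configured and whether the browser exposes WebGPU.

```typescript
// Hypothetical types and names, for illustration only.
type Backend =
  | { kind: "remote"; url: string } // native GPU stack behind a server endpoint
  | { kind: "webgpu" }              // in-browser inference via WebGPU
  | { kind: "unavailable" };

interface InferenceConfig {
  // Whether the browser exposes WebGPU (in a page, roughly `"gpu" in navigator`).
  hasWebGPU: boolean;
  // Optional URL of a server-side inference endpoint (an assumed, made-up service).
  remoteUrl?: string;
}

// Prefer the remote endpoint when one is configured, since native CUDA/ROCm
// is expected to be more efficient; otherwise fall back to in-browser WebGPU.
function chooseBackend(cfg: InferenceConfig): Backend {
  if (cfg.remoteUrl) return { kind: "remote", url: cfg.remoteUrl };
  if (cfg.hasWebGPU) return { kind: "webgpu" };
  return { kind: "unavailable" };
}

console.log(chooseBackend({ hasWebGPU: true, remoteUrl: "https://example.invalid/generate" }).kind); // remote
console.log(chooseBackend({ hasWebGPU: true }).kind); // webgpu
```

In this sketch the browser only ever does one of the two: it either sends prompts to the server, or runs the model locally. Nothing about WebGPU itself is proxied to the cloud.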