Hacker News
by mewim, 462 days ago
I think WebGPU is mostly meant for running inside the browser. If you have the option of a cloud container with a GPU, you can run LLM inference directly on CUDA, ROCm, or a TPU instead, and it will run more efficiently.