Hacker News
by mewim 462 days ago
I think WebGPU is mostly meant for running inside the browser. If one has the option of using a cloud container with a GPU, running LLM inference directly on CUDA/ROCm/TPU is possible and will be more efficient.