HN Mirror

	by mewim 462 days ago
	I think WebGPU is mostly meant for running inside the browser. If you have the option of using a cloud container with a GPU, running LLM inference directly with CUDA/ROCm/TPU is possible and more efficient.