| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by iliane5 1024 days ago
	I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster allows you to process requests much faster and for a lower marginal cost.