	by gajjanag 502 days ago
	This is much more nuanced now. See Apple's "Private Cloud Compute": https://security.apple.com/blog/private-cloud-compute/ . They run a lot of the larger models on their own servers. Fundamentally, it is more efficient to process a batch of tokens from multiple users/requests together than to process tokens from a single user's request on device.
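
	To make the batching point concrete, here is a rough back-of-the-envelope sketch. It is a toy roofline-style model, not a description of how Apple's servers actually work. It assumes each decode step has to stream the model weights from memory once regardless of batch size, and that compute costs about 2 FLOPs per parameter per token. The function names and every hardware number (parameter count, bandwidth, FLOP rate) are made up for illustration, and it ignores KV-cache traffic, which does grow with batch size.

```python
# Toy roofline model of one decode step. All defaults are illustrative
# assumptions, not measurements of any real device or server.

def step_time_s(batch, params=8e9, bytes_per_param=2, mem_bw=1e12, flops=2e14):
    """Estimated time for one decode step producing one token per request."""
    # Weights are streamed from memory once per step, independent of batch size.
    weight_read = params * bytes_per_param / mem_bw
    # Roughly 2 FLOPs per parameter per generated token.
    compute = 2 * params * batch / flops
    # The step takes as long as the slower of the two resources.
    return max(weight_read, compute)


def tokens_per_second(batch, **hardware):
    """Total tokens generated per second across all requests in the batch."""
    return batch / step_time_s(batch, **hardware)


if __name__ == "__main__":
    for batch in (1, 8, 32, 200, 400):
        print(batch, round(tokens_per_second(batch)))
```

	Under these assumptions a single request leaves the hardware waiting on memory, so total throughput grows roughly linearly with batch size until compute becomes the bottleneck, after which it flattens out. A phone serving one user is stuck at batch size 1; a server can fill the batch with other people's requests.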