| HN Mirror



	by peaslock 1277 days ago
	Though isn't it highly likely that core devs working at the big tech giants have access to 10x-100x faster compute, e.g. some secret TPU successor at Google?

1 comments

varunkmohan 1274 days ago

The magic number for performance is actually memory bandwidth, which is lower for TPUs than for A100s. TPUs have more aggregate compute, but it's not trivial to use that to get very low latency on a per-request basis.
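To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes that generating one token requires reading every weight from memory once, so weight bytes divided by memory bandwidth gives a lower bound on per-token latency for a single request. The function name, the model size, and the bandwidth figures are hypothetical round numbers for illustration, not vendor specs for any TPU or A100.

```python
def min_decode_latency_ms(n_params, bytes_per_param, mem_bandwidth_gb_s):
    """Lower bound on per-token latency (ms), assuming each weight is
    read from memory once per generated token (bandwidth-bound decode)."""
    weight_bytes = n_params * bytes_per_param
    seconds = weight_bytes / (mem_bandwidth_gb_s * 1e9)
    return seconds * 1e3


# Hypothetical example: a 6B-parameter model stored in 16-bit weights.
# The two bandwidth values are illustrative round numbers only.
higher_bw_ms = min_decode_latency_ms(6e9, 2, 2000)  # 2000 GB/s
lower_bw_ms = min_decode_latency_ms(6e9, 2, 1000)   # 1000 GB/s

print(f"higher bandwidth: {higher_bw_ms:.1f} ms/token")
print(f"lower bandwidth:  {lower_bw_ms:.1f} ms/token")
```

Under this assumption, halving the bandwidth doubles the latency floor no matter how much extra compute the chip has, which is why more aggregate FLOPs alone doesn't help a single request.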


peaslock 1273 days ago

But they very likely have internal prototypes with higher bandwidth and lower latency. Also, with distilled latent diffusion one can probably generate text(-images) much faster anyhow, since it could produce long chunks of text at once rather than needing to recurrently feed each new token back into the inputs.
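A rough way to see the argument is to count sequential forward passes. The sketch below assumes an autoregressive model needs one pass per token, while a chunk-at-once generator needs a small fixed number of passes per chunk (for example, a few denoising steps after distillation). The function names, chunk size, and step count are hypothetical, and the count ignores how expensive each pass is.

```python
import math


def autoregressive_passes(n_tokens):
    """One sequential forward pass per generated token."""
    return n_tokens


def chunked_passes(n_tokens, chunk_size, steps_per_chunk):
    """Sequential passes when each chunk of tokens is produced jointly
    in a fixed number of steps (hypothetical chunk-at-once generator)."""
    return math.ceil(n_tokens / chunk_size) * steps_per_chunk


# Hypothetical example: 256 tokens, 64-token chunks, 4 steps per chunk.
print(autoregressive_passes(256))   # sequential passes, token by token
print(chunked_passes(256, 64, 4))   # sequential passes, chunk by chunk
```

Fewer sequential passes only translates into lower latency if each pass is not proportionally more expensive, so this is an upper bound on the possible speedup rather than a prediction.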
