by peaslock 1277 days ago
Though isn't it highly likely that core devs working at the big tech giants have access to 10x-100x faster compute, e.g. some secret TPU successor at Google?
1 comment

The number that really matters for performance is memory bandwidth, which is lower for TPUs than for A100s. TPUs have more aggregate compute, but it's not trivial to turn that into very low latency on a per-request basis.
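To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes single-request decoding where every weight is read from memory once per generated token, so per-token latency cannot drop below model size divided by bandwidth. The 6B-parameter fp16 model is hypothetical, and the bandwidth figures (about 1555 GB/s for an A100 40GB, about 1200 GB/s for a TPU v4 chip) are approximate spec-sheet values used only for illustration.

```python
def min_decode_latency_ms(param_count, bytes_per_param, bandwidth_gb_s):
    """Lower bound on per-token latency (ms), assuming every weight is
    read from memory exactly once per generated token."""
    model_bytes = param_count * bytes_per_param
    seconds = model_bytes / (bandwidth_gb_s * 1e9)
    return seconds * 1e3


# Hypothetical 6B-parameter model stored in fp16 (2 bytes per parameter).
# Bandwidths are approximate, illustrative figures, not measurements.
a100_ms = min_decode_latency_ms(6e9, 2, 1555)  # A100 40GB class
tpu_ms = min_decode_latency_ms(6e9, 2, 1200)   # TPU v4 class

print(f"A100 lower bound: {a100_ms:.2f} ms/token")
print(f"TPU  lower bound: {tpu_ms:.2f} ms/token")
```

Extra compute doesn't move this bound at all; only more bandwidth, or spreading the weights across more chips, does.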
But they very likely have internal prototypes with higher bandwidth and lower latency. Also, with distilled latent diffusion one could probably generate text (and images) much faster anyway, since it could produce long chunks of text at once rather than having to feed each new token back into the inputs recurrently.
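A rough way to see the chunking argument is to count sequential forward passes, since those can't be parallelized within a single request. This is only a sketch: the function names, chunk size, and number of denoising steps are made up for illustration, and it assumes a distilled diffusion model needs just a few denoising steps per chunk.

```python
import math


def sequential_passes_autoregressive(num_tokens):
    """One forward pass per token, each waiting on the previous token."""
    return num_tokens


def sequential_passes_chunked(num_tokens, chunk_size, denoise_steps):
    """Each chunk is produced in parallel, but needs a few denoising
    passes; chunks are assumed to be generated one after another."""
    return math.ceil(num_tokens / chunk_size) * denoise_steps


# Hypothetical settings: 256 tokens, 64-token chunks, 4 denoising steps.
ar_passes = sequential_passes_autoregressive(256)
chunked_passes = sequential_passes_chunked(256, 64, 4)
print(ar_passes, chunked_passes)
```

With these made-up settings the chunked approach needs 16 sequential passes instead of 256, though each pass does more work, so the real speedup would depend on how well the hardware absorbs that.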