Hacker News
by 5kg 852 days ago
From https://arxiv.org/pdf/2402.08268.pdf

> We trained our models using TPUv4-1024, which is approximately equivalent to 450 A100s

> Inference for such long sequences requires a minimum of v4-128

So you'll need ~60 A100s for inference: v4-128 is 1/8 of v4-1024, and 450 / 8 ≈ 56.
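
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the A100 equivalence scales linearly with TPU v4 slice size, which the paper does not state; the function name `a100_equivalent` is just made up for illustration.

```python
# Figures quoted from the paper.
TRAIN_SLICE = 1024       # TPUv4-1024 used for training
TRAIN_A100_EQUIV = 450   # "approximately equivalent to 450 A100s"
INFER_SLICE = 128        # minimum v4-128 for inference


def a100_equivalent(slice_size: int) -> float:
    """Estimate A100 count for a TPU v4 slice.

    Assumption (not from the paper): the equivalence scales
    linearly with slice size.
    """
    return TRAIN_A100_EQUIV * slice_size / TRAIN_SLICE


print(a100_equivalent(INFER_SLICE))  # 56.25
```

That gives about 56, so "~60" is a fair round number. Memory capacity and interconnect differ between the two platforms, so treat this as a ballpark only.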