by 5kg
852 days ago
From https://arxiv.org/pdf/2402.08268.pdf

> We trained our models using TPUv4-1024, which is approximately equivalent to 450 A100s

> Inference for such long sequences requires a minimum of v4-128

Scaling proportionally (128/1024 × 450 ≈ 56), you'll need ~60 A100s for inference.
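For anyone who wants to check the arithmetic, here is a rough sketch. It assumes A100-equivalence scales linearly with TPU v4 slice size, which the paper does not state; the constant and function names are made up for illustration.

```python
# Figures taken from the quoted passages of the paper.
TRAIN_SLICE = 1024       # TPUv4-1024 used for training
TRAIN_A100_EQUIV = 450   # "approximately equivalent to 450 A100s"
INFER_SLICE = 128        # minimum v4-128 for long-sequence inference


def a100_equivalent(slice_size: int) -> float:
    """Estimate A100s for a TPU v4 slice, assuming linear scaling (an assumption)."""
    return slice_size * TRAIN_A100_EQUIV / TRAIN_SLICE


estimate = a100_equivalent(INFER_SLICE)
print(f"~{estimate:.2f} A100s")  # ~56.25 A100s, i.e. roughly 60
```

This only compares device counts. Memory per device and interconnect differ between the two platforms, so treat the result as a ballpark.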