by whynotmaybe
558 days ago
One Nvidia A100. From the paper:

> We train using the AdamW [26] optimizer with a batch size of 5 and gradient accumulation over 20 steps on a single NVIDIA A100 GPU.

So it's "consumer-grade" because it's available to anyone, not just businesses.
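For context on the quoted setup: a micro-batch of 5 with gradient accumulation over 20 steps means each optimizer update sees 5 × 20 = 100 samples, while only 5 need to fit in GPU memory at once. Below is a minimal pure-Python sketch of that idea, assuming a toy one-weight model with MSE loss. The helper names (`grad_mse`, `accumulated_grad`, `adamw_step`) are hypothetical, and the learning rate and weight decay are PyTorch's `AdamW` defaults, not values from the paper.

```python
import math
import random

MICRO_BATCH = 5    # per-step batch size quoted from the paper
ACCUM_STEPS = 20   # gradient accumulation steps quoted from the paper
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # samples per optimizer update


def grad_mse(w, batch):
    """Mean-squared-error gradient for a toy model y_hat = w * x (hypothetical)."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, samples):
    """Sum micro-batch gradients, each scaled by 1/ACCUM_STEPS, before one update."""
    total = 0.0
    for step in range(ACCUM_STEPS):
        micro = samples[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
        total += grad_mse(w, micro) / ACCUM_STEPS
    return total


def adamw_step(w, g, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam moments plus weight decay decoupled from the gradient."""
    b1, b2 = betas
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])  # bias-corrected second moment
    w = w - lr * weight_decay * w  # decay applied to the weight, not folded into g
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: 100 synthetic samples from y = 3x, consumed as 20 micro-batches of 5.
random.seed(0)
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(EFFECTIVE_BATCH))]
state = {"m": 0.0, "v": 0.0, "t": 0}
w = 0.0
g = accumulated_grad(w, data)
print(EFFECTIVE_BATCH)                     # 100
print(math.isclose(g, grad_mse(w, data)))  # same gradient as one batch of 100
w = adamw_step(w, g, state)
```

Because each micro-batch gradient is divided by the number of accumulation steps, the accumulated gradient matches the full-batch mean gradient, so the trade-off is wall-clock time rather than memory.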