HN Mirror


by whynotmaybe 558 days ago

One Nvidia A100.

From the paper:

> We train using the AdamW [26] optimizer with a batch size of 5 and gradient accumulation over 20 steps on a single NVIDIA A100 GPU

So it's "consumer-grade" because it's available to anyone, not just businesses.
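
For anyone wondering what the quoted setup means in practice: a batch size of 5 with gradient accumulation over 20 steps gives an effective batch of 5 × 20 = 100 samples per optimizer step, which is how a large batch fits on one GPU. Below is a rough sketch of the idea in plain Python. The toy linear model, the data, and the function names are my own assumptions for illustration, not anything from the paper, and it only computes the accumulated gradient rather than running AdamW.

```python
# Minimal sketch of gradient accumulation (pure Python, no framework).
# The 1-D linear model and the data are illustrative assumptions, not from the paper.

MICRO_BATCH = 5      # batch size quoted from the paper
ACCUM_STEPS = 20     # accumulation steps quoted from the paper
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # samples per optimizer step


def grad_mse(w, batch):
    """Gradient of mean squared error for y ~ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data):
    """Average the gradients of ACCUM_STEPS micro-batches of MICRO_BATCH samples."""
    total = 0.0
    for step in range(ACCUM_STEPS):
        micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
        # Scale each micro-batch gradient so the running sum ends up a mean.
        total += grad_mse(w, micro) / ACCUM_STEPS
    return total


# Hypothetical toy data: y = 3x, one effective batch worth of samples.
data = [(i / 100, 3 * i / 100) for i in range(EFFECTIVE_BATCH)]
w = 0.5

full = grad_mse(w, data)           # gradient over all 100 samples at once
accum = accumulated_grad(w, data)  # same gradient, built 5 samples at a time
print(EFFECTIVE_BATCH, abs(full - accum) < 1e-9)
```

Because every micro-batch has the same size, the averaged micro-batch gradients match the full-batch gradient, so only 5 samples need to be in memory at a time.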

1 comment

That is the training GPU; the inference GPU can be much smaller.

I stand corrected.

Found on Yi-Zhe Song's LinkedIn:

> Runs on a single NVIDIA 4090

Thanks!