Hacker News
by Taek 1214 days ago
For reference, GPT-NeoX is a 20B parameter model, and it runs on 45 GB of VRAM. On an 80 GB A100 you could probably run a 35B parameter model. Maybe 8 A100 cards to do inference on ChatGPT?

Or 32 3090 cards, which would run you under $40k total.
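To make the back-of-envelope math above explicit, here is a rough sketch in Python. It assumes VRAM scales linearly with parameter count from the single GPT-NeoX data point (20B parameters, 45 GB), and that a 3090 has 24 GB of VRAM. The function name and constants are made up for illustration, and real memory use also depends on precision, batch size, and context length.

```python
# Single data point from the comment: GPT-NeoX, 20B parameters, ~45 GB of VRAM.
GB_PER_BILLION_PARAMS = 45 / 20  # 2.25 GB per billion parameters

# Assumed per-card VRAM in GB (80 GB A100 from the comment; 24 GB for a 3090).
A100_GB = 80
RTX_3090_GB = 24


def max_params_billion(total_vram_gb):
    """Largest model (in billions of parameters) that fits, assuming
    memory scales linearly from the GPT-NeoX data point."""
    return total_vram_gb / GB_PER_BILLION_PARAMS


if __name__ == "__main__":
    print(f"1 x A100:  ~{max_params_billion(A100_GB):.1f}B params")
    print(f"8 x A100:  ~{max_params_billion(8 * A100_GB):.0f}B params")
    print(f"32 x 3090: ~{max_params_billion(32 * RTX_3090_GB):.0f}B params")
    # "Under $40k total" for 32 cards implies at most this much per card.
    print(f"Implied price ceiling per 3090: ${40_000 / 32:.0f}")
```

Under these assumptions, one A100 lands at roughly 35.6B parameters, which matches the 35B estimate, while 8 A100s (640 GB) and 32 3090s (768 GB) would cover models of roughly 284B and 341B parameters respectively.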

1 comment

20B GPT-NeoX runs on a 3090 in 8-bit mode.
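A quick sanity check on why 8-bit fits, as a sketch that counts weights only. It assumes 1 GB = 10^9 bytes and a 24 GB 3090, and it ignores activations, the KV cache, and framework overhead, so treat the numbers as a lower bound. The helper name is hypothetical.

```python
RTX_3090_GB = 24  # assumed VRAM of a single 3090


def weight_memory_gb(params_billion, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).
    Billions of parameters times bytes per parameter gives GB directly."""
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    for bits in (16, 8):
        gb = weight_memory_gb(20, bits)
        verdict = "fits" if gb <= RTX_3090_GB else "does not fit"
        print(f"20B params at {bits}-bit: {gb:.0f} GB of weights, {verdict} in {RTX_3090_GB} GB")
```

At 16 bits the weights alone are 40 GB, which is close to the 45 GB figure quoted above; at 8 bits they drop to 20 GB, leaving about 4 GB of headroom on a 24 GB card.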