Hacker News
by michaelt 813 days ago
There has in fact been a great deal of careful engineering to allow 70-billion-parameter models to run on just 48GB of VRAM.

The people training 70B parameter models from scratch need ~600GB of VRAM to do it!
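
For a rough sense of scale, here is a back-of-the-envelope sketch of the arithmetic. The precisions listed are my own assumptions for illustration (the numbers above don't say which tricks get you to 48GB), it counts weights only and ignores KV cache and activations, and it takes GB to mean 10^9 bytes.

```python
GB = 1e9  # assumption: decimal gigabytes, not GiB


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone; ignores KV cache and activations."""
    return n_params * bits_per_param / 8 / GB


N_PARAMS = 70e9  # the 70B model discussed above

# Illustrative precisions only; not a claim about any specific setup.
for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.0f} GB of weights")

# Bytes per parameter implied by the ~600GB training figure quoted above.
implied_bytes_per_param = 600 * GB / N_PARAMS
print(f"training: ~{implied_bytes_per_param:.1f} bytes per parameter")
```

Under these assumptions only the 4-bit case fits in 48GB, with some room left over for everything that isn't weights. The training figure works out to several bytes per parameter, since training holds a lot more than the weights themselves.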