Hacker News
by saulpw 58 days ago
Presumably $100m to train the 70B model? I think you're assuming that the author meant you can take an existing 70B model and run it in 16GB. But it stands to reason that "no loss in capability" means it had to be trained under those constraints.
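For scale, here is a rough back-of-the-envelope sketch of the memory arithmetic. It is not from the thread. It assumes decimal gigabytes (1 GB = 10^9 bytes) and counts only the weights, ignoring activations and the KV cache. The helper names `weight_bytes` and `max_bits_per_param` are illustrative only.

```python
# Rough memory arithmetic for fitting a 70B-parameter model into 16 GB.
# Assumptions (not from the comment): 1 GB = 10**9 bytes, and only the
# weights are counted; activations and the KV cache are ignored.

PARAMS = 70e9          # 70B parameters
BUDGET_BYTES = 16e9    # 16 GB memory budget


def weight_bytes(params: float, bits_per_param: float) -> float:
    """Bytes needed to store `params` weights at `bits_per_param` bits each."""
    return params * bits_per_param / 8


def max_bits_per_param(params: float, budget_bytes: float) -> float:
    """Largest average precision (in bits) at which the weights fit the budget."""
    return budget_bytes * 8 / params


if __name__ == "__main__":
    # Weight storage for an existing 70B model at common precisions.
    for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{name}: {weight_bytes(PARAMS, bits) / 1e9:.0f} GB")
    # Average precision the 16 GB budget would allow.
    print(f"budget allows ~{max_bits_per_param(PARAMS, BUDGET_BYTES):.2f} bits/param")
```

Under these assumptions, even a 4-bit copy of an existing 70B model needs about 35 GB for its weights. Fitting it in 16 GB would mean fewer than 2 bits per weight on average, well below the precisions where post-training quantization is most commonly applied.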