|
|
|
|
|
by saulpw
58 days ago
|
|
Presumably $100m to train the 70B model? I think you're assuming that the author meant you can take an existing 70B model and run it in 16GB. But it stands to reason that "no loss in capability" means it had to be trained under those constraints. |
|