Hacker News
by pierrefdz, 586 days ago
The GPU time figures in the paper cover all experiments, not just the training run for the final open-sourced model (which is what usually gets reported). People don't just one-shot the final model.