Hacker News
by pierrefdz 586 days ago
The GPU-time figures in the paper cover all experiments, not just the training run for the final open-source (OSS) model, which is what is usually reported. People don't just one-shot the final model.