Hacker News
by SJC_Hacker 445 days ago
> They used a giant bunch of [data], a year and a half of GPU time to [train] the final model,

> [train]: "The training runs on 64 A100 GPUs over nine days", that would be around $18k on Lambda Labs in case you're wondering

How is that a "year and a half of GPU time"? Maybe on some exoplanet?

1 comment

> > [train]: "The training runs on 64 A100 GPUs over nine days",

> How is that a "year and a half of GPU time"?

64 GPUs × 9 days = 576 GPU-days ≈ 1.577 GPU-years
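
If anyone wants to check the arithmetic, here is a quick sketch. The 365.25-day year is my assumption (it is what gives 1.577), and the hourly rate is just what the quoted "around $18k" works out to, not a price I looked up.

```python
# Figures quoted in the thread.
GPUS = 64
DAYS = 9
QUOTED_COST_USD = 18_000  # "around $18k" from the parent comment

# Assumption: a 365.25-day year, which reproduces the 1.577 figure.
DAYS_PER_YEAR = 365.25

gpu_days = GPUS * DAYS
gpu_years = gpu_days / DAYS_PER_YEAR
gpu_hours = gpu_days * 24

# Hourly rate implied by the quoted total, not a looked-up price.
implied_rate = QUOTED_COST_USD / gpu_hours

print(f"{gpu_days} GPU-days = {gpu_years:.3f} GPU-years")
print(f"{gpu_hours} GPU-hours, implied rate ${implied_rate:.2f} per GPU-hour")
```

So "a year and a half of GPU time" is total GPU time summed across the 64 cards, not wall-clock time.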

Doh, that's entirely fair. I haven't been in this thread yet, but I'd echo what I read as the implicit puzzlement about this amount of GPU time being described as bitter-lesson-y.