by refulgentis 507 days ago
> We’ve just learned that it’s possible to do AI on less compute (deepseek).

There's a huge motte-and-bailey thing in the DeepSeek conversation, where the bailey is "It only took $5.5 million!*" (* for exactly one training run for one of several models, at dirt-cheap per-hour spot prices for H100s) and the motte is all sorts of stuff.

Truth is, one run for one model took 2048 GPUs full-time for 2 months, and in my experience with FAANG ML, that means it took 6 months part-time and another 1.5-2.5 runs went absolutely nowhere.
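
For a rough sense of scale, here's a back-of-envelope sketch using only the numbers above. Two things are assumptions for illustration, not figures from anywhere: "2 months" is taken as 60 days, and each dead-end run is assumed to cost about as much as the one that worked.

```python
# Back-of-envelope check of the figures quoted in this comment.
GPUS = 2048
DAYS = 60               # assumption: "2 months" taken as 60 days
HEADLINE_COST = 5.5e6   # USD, the quoted single-run figure

# Total GPU-hours for one full-time run.
gpu_hours = GPUS * DAYS * 24

# Hourly rate implied by the headline cost.
implied_rate = HEADLINE_COST / gpu_hours


def total_cost(failed_runs: float) -> float:
    """Cost of the good run plus failed runs.

    Assumption: a failed run costs the same as the successful one.
    """
    return HEADLINE_COST * (1 + failed_runs)


print(f"{gpu_hours:,} GPU-hours for one run")
print(f"${implied_rate:.2f} per GPU-hour implied by the headline number")
for failed in (1.5, 2.5):
    print(f"{failed} failed runs -> ${total_cost(failed):,.0f} total")
```

Under those assumptions, the compute bill alone lands somewhere around 2.5x to 3.5x the headline number, before counting anything else.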