by nl 30 days ago
> if you want to train your 5T params model like modern small models are being trained (with a thousand times more training tokens than params), that's an enormous training run.

Yes it is. Spending $100M on training runs is common, and $1B might be in scope for some of the large models.

Sonnet 3.5 cost "a few 10s of millions of dollars" back in 2024: https://simonwillison.net/2025/Jan/29/on-deepseek-and-export...
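
For a rough sense of scale of the quoted run, here is a back-of-the-envelope sketch. It assumes the common "6 x params x tokens" rule of thumb for dense transformer training compute, which is not something stated in the thread; for a mixture-of-experts model the active parameter count would be the relevant figure, so treat the result as an order-of-magnitude estimate only. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope training scale for the quoted scenario.
# Assumption (not from the thread): training compute ~= 6 * params * tokens,
# a widely used approximation for dense transformers.

def training_tokens(params: int, tokens_per_param: int) -> int:
    """Total training tokens for a given tokens-per-parameter ratio."""
    return params * tokens_per_param


def training_flops(params: int, tokens: int) -> int:
    """Approximate training FLOPs under the 6 * N * D rule of thumb."""
    return 6 * params * tokens


PARAMS = 5 * 10**12        # 5T parameters, from the quoted comment
TOKENS_PER_PARAM = 1_000   # "a thousand times more training tokens than params"

tokens = training_tokens(PARAMS, TOKENS_PER_PARAM)
flops = training_flops(PARAMS, tokens)

print(f"{tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under those assumptions that works out to about 5e15 tokens and about 1.5e29 FLOPs. Turning that into dollars needs hardware throughput, utilization, and price figures that aren't given here.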