|
Did they? DeepSeek spent about 17 months achieving SOTA results with a significantly smaller budget. While xAI's model isn't a substantial leap beyond DeepSeek R1, it uses 100 times more compute. Suppose each had $3 billion:
xAI would choose to invest $2.5 billion in GPUs and $0.5 billion in talent.
DeepSeek would invest $1 billion in GPUs and $2 billion in talent. I would argue that the latter approach (DeepSeek's) is more scalable. It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible. |
In any AI R&D operation, the bulk of the compute goes to running experiments, not to the final training run for whichever models they choose to release.