by cavisne 436 days ago
Was that gamble wrong? I thought all LLM training workloads do collectives that involve all nodes (all-gather, reduce-scatter).

I think the choice they made, combined with some great software and hardware engineering, lets them keep innovating at the highest level of ML research within a reasonable dollar and complexity budget, whichever specific option they picked.