Hacker News
by
cavisne
436 days ago
Was that gamble wrong? I thought all LLM training workloads do collectives that involve all nodes (all-gather, reduce-scatter).
dekhn
436 days ago
I think the choice they made, combined with some great software and hardware engineering, lets them keep innovating at the highest level of ML research within a reasonable dollar and complexity budget, whichever specific option they picked.