by jasonni 717 days ago
On their announcement page, the section "How can we fit so much more FLOPS on our chip than GPUs?" gives some details. It says that "only 3.3% of the transistors on an H100 GPU are used for matrix multiplication". They trade programmability for compute density. And from the "Isn’t inference bottlenecked on memory bandwidth, not compute?" section, I guess they use tricks similar to Groq's. Looking forward to more architecture details and a comparison with Groq.