Hacker News
by
germanjoey
576 days ago
Pretty amazing speed, especially considering this is bf16. But how many racks is this using? They used 4 racks for 70B, so this is, what, at least 24? A whole data center for one model?!
1 comment
aurareturn
575 days ago
Each Cerebras wafer-scale chip has 44 GB of SRAM. You need 972 GB of memory to run Llama 405B at fp16. So you need about 22 of these.
I assume they're using SRAM only to achieve this speed and not HBM.
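For anyone who wants to check the arithmetic, here is a rough sketch of the chip count. The 44 GB and 972 GB figures are taken from the comment as stated. The function name `chips_needed`, the use of decimal GB, and the assumption that every byte of SRAM is usable for the model are mine. The raw fp16 weights alone come to 405B x 2 bytes = 810 GB; the 972 GB figure presumably includes overhead on top of that, but that is a guess and not something stated in the thread.

```python
import math

# Figures quoted in the comment (not independently verified here).
SRAM_PER_CHIP_GB = 44      # SRAM per wafer-scale chip
TOTAL_MEMORY_GB = 972      # quoted memory needed for Llama 405B at fp16

# Assumptions for the weights-only estimate.
PARAMS_BILLION = 405       # parameter count in billions
BYTES_PER_PARAM = 2        # fp16 and bf16 both use 2 bytes per parameter


def chips_needed(memory_gb, sram_per_chip_gb=SRAM_PER_CHIP_GB):
    """Whole chips required if the model lives entirely in SRAM (hypothetical helper)."""
    return math.ceil(memory_gb / sram_per_chip_gb)


# 405e9 params * 2 bytes = 810e9 bytes, i.e. 810 decimal GB for weights alone.
weights_gb = PARAMS_BILLION * BYTES_PER_PARAM

print("weights only (GB):", weights_gb)
print("chips for weights only:", chips_needed(weights_gb))
print("chips for quoted 972 GB:", chips_needed(TOTAL_MEMORY_GB))
```

Since 972 / 44 is about 22.1, rounding up to whole chips gives 23 rather than 22. Either way it is in the same ballpark as the "at least 24" guess in the parent comment.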