Hacker News new | ask | show | jobs
by aurareturn 578 days ago
Each Cerebras wafer scale chip has 44GB of SRAM. You need 972 GB of memory to run Llama 405b at fp16. So you need 22 of these.

I assume they're using SRAM only to achieve this speed and not HBM.