Hacker News
by jwan584 917 days ago
Everyone knows Cerebras for its wafer-scale chips. The less understood part is the 12TB of external memory. That's the real reason large models fit by default and you don't have to chop them up in software à la Megatron/DeepSpeed.
1 comment

imo the benefits of chopping it up will always remain