Interesting that they’ve scaled on-chip memory sublinearly relative to transistor count between generations. I would’ve thought they would try to bump that number up. Maybe it’s not a major bottleneck for their training runs?
Cerebras runs at 1.1 GHz[1], and this was a much earlier design on 16 nm, so it might be a good fit by now. Their TSMC 5 nm version is scheduled for early 2025.[2]
I'd bet that making a chip the size of the wafer has the benefit of not losing any silicon to dicing the wafer up, the way desktop or GPU chips cut from a wafer do. The major downside is that you need either a massive X and Y exposure size or to break the wafer into smaller exposures, which means you're still needing to focus on alignment between the steps. And if a defect can't be corrected, is that wafer just scrap?
Making monolithic silicon 2x as large doesn't make it just 2x as expensive. Bigger silicon is massively more expensive. I'm not sure that making each piece require a large chunk of perfect wafer is a fantastic idea, especially when you're looking to unseat juggernauts who already have a great deal of experience making high-quality product.
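To put rough numbers on why cost grows faster than area, here is a back-of-the-envelope sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density and the per-area cost are made-up illustrative values, not figures for any real process, and the function names are my own. It also ignores edge loss and any redundancy or repair that lets a design tolerate defects.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def cost_per_good_die(area_mm2: float, defects_per_mm2: float,
                      cost_per_mm2: float = 1.0) -> float:
    """Silicon cost of one die divided by the chance that the die is defect-free."""
    raw_cost = cost_per_mm2 * area_mm2
    return raw_cost / poisson_yield(area_mm2, defects_per_mm2)


# Assumed defect density: 0.001 defects per mm^2 (0.1 per cm^2), illustrative only.
D0 = 0.001

if __name__ == "__main__":
    for area in (100, 200, 400, 800):
        y = poisson_yield(area, D0)
        c = cost_per_good_die(area, D0)
        print(f"{area:4d} mm^2  yield={y:.3f}  relative cost per good die={c:.1f}")
```

Under this model, doubling the area from A to 2A multiplies the cost per good die by 2 * exp(A * D0), which is always more than 2x and gets worse as the die grows.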
It’s fascinating.