by recursivecaveat
2361 days ago
As someone who works for another startup in this area, I can say that building the chip is only half the battle. The other half is tooling for compiling benchmark networks onto the chip in a performant manner. With 400k cores and their 'duplicate and re-route' defect strategy, this might literally be the most challenging compilation target ever made. It probably stacks up absolutely terribly in every metric right now. That's not to say it will necessarily get better; most of the people I've talked to don't think the megachip will ultimately amount to much more than a clever marketing ploy.
|
* No DRAM or caches, everything is in SRAM, and all local SRAM loads are 1 cycle.
* Model parallelism alone gives full performance; there is no need for data parallelism if you size the model to fit.
* Defects are handled in hardware; any latency differences are hidden and not in the load path anyway.
* Fully asynchronous/dataflow by default; you only need minimal synchronization between forward/backward passes.
I genuinely don't know how you'd build a simpler system than this.