by winterismute 1611 days ago
It's a tough question. It's not even just back-propagation; sometimes it comes down to the "parameters" of the models. For example, [1] shows that models such as ResNeXt already perform better on a very different architecture such as Graphcore, for some sizes of convolutions. Older models, or models that were tuned for existing GPUs, do not perform as well.

It's tough to come up with a new architecture that has an advantage on both current and future models, at least from a peak-performance point of view. From a perf/watt point of view, on the other hand, the scaled-up Apple GPUs seem to show interesting new properties. Still, the Graphcore architecture is quite interesting, being able to act in some sense both as a SIMD machine and as a task-parallel machine.

[1] https://arxiv.org/pdf/1912.03413v1.pdf