Hacker News
by IshKebab 2361 days ago
When you consider the things that diagram doesn't show, it doesn't look simple at all. Does that graph even include training? It'll have to be pipelined too, and it will probably need recomputation because of the shortage of memory. And what about inside the boxes? You can't neatly split a matmul into pieces like that.

I work on something similar but less ambitious; trust me, it is crazy complicated.
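A toy sketch of the recomputation (activation checkpointing) idea mentioned above, under loud assumptions: the "network" is a hypothetical 1-D chain of `tanh(w * x)` layers, the loss is simply the final output, and everything is plain Python scalars rather than tensors. The naive version keeps every forward activation for the backward pass; the checkpointed version keeps only every `every`-th one and recomputes each segment just before backpropagating through it. This is an illustration of the general technique, not how CS-1 or any real compiler implements it.

```python
import math


def layer(w, x):
    """One toy layer (hypothetical): y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_backward(w, x, y, grad_y):
    """Given dL/dy for y = tanh(w * x), return (dL/dx, dL/dw)."""
    local = 1.0 - y * y  # derivative of tanh at z = w * x, expressed via y
    return grad_y * local * w, grad_y * local * x


def grads_store_all(weights, x0):
    """Naive backprop: keep every forward activation until the backward pass."""
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    grad = 1.0  # loss is the final output, so dL/dy_n = 1
    w_grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, w_grads[i] = layer_backward(weights[i], acts[i], acts[i + 1], grad)
    return grad, w_grads, len(acts)  # all activations are held at once


def grads_with_recompute(weights, x0, every):
    """Checkpointing: keep every `every`-th activation, recompute the rest."""
    n = len(weights)
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        if (i + 1) % every == 0 and i + 1 < n:
            checkpoints[i + 1] = x
    starts = sorted(checkpoints)
    grad = 1.0
    w_grads = [0.0] * n
    peak = 0
    # Walk the segments from last to first, as backprop must.
    for seg in reversed(range(len(starts))):
        start = starts[seg]
        end = starts[seg + 1] if seg + 1 < len(starts) else n
        # Recompute this segment's activations from its checkpoint.
        acts = [checkpoints[start]]
        for i in range(start, end):
            acts.append(layer(weights[i], acts[-1]))
        # Live values: the checkpoints plus this segment's recomputed activations.
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for i in reversed(range(start, end)):
            grad, w_grads[i] = layer_backward(
                weights[i], acts[i - start], acts[i - start + 1], grad)
    return grad, w_grads, peak
```

For 16 layers with `every=4`, both functions return the same gradients, but the checkpointed version holds at most 8 activations instead of 17, at the cost of running the forward pass roughly twice.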


Could you be more explicit? What about the naïve approach to training (the same graph run backwards, computing gradients) is going to fail?

Wrt. matmul: if you couldn't split them up, today's AI accelerators wouldn't work, full stop. But regardless, even if it were much more complex on CS-1 than on all the other sea-of-multipliers accelerators, it's obviously a problem they've solved, and so it's irrelevant to the compilation issue.
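To illustrate the point that matmuls can be split up, here is a minimal plain-Python sketch of a blocked (tiled) matrix multiply. It is an assumption for illustration, not how CS-1 or any specific accelerator actually partitions work: each output tile is accumulated from partial products over tiles of the shared dimension, and since those partial products are independent until they are summed, they can be spread across separate multipliers.

```python
def matmul(a, b):
    """Reference product of an n x k matrix a and a k x m matrix b (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def blocked_matmul(a, b, tile):
    """Same product, built from independent (tile x tile) pieces.

    Output block C[I, J] = sum over K of A[I, K] @ B[K, J]; each term in that sum
    is a small matmul that could run on a different compute unit.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of C
        for j0 in range(0, m, tile):      # block column of C
            for p0 in range(0, k, tile):  # block of the shared dimension
                # Accumulate the partial product A[i0 block, p0 block] @ B[p0 block, j0 block].
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        c[i][j] += sum(a[i][p] * b[p][j]
                                       for p in range(p0, min(p0 + tile, k)))
    return c
```

The tile size here is arbitrary and the dimensions need not divide evenly; `blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1)` gives `[[19, 22], [43, 50]]`, the same as the unblocked product.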