by tippytippytango
232 days ago
It's difficult to do because of how well matched transformers are to the hardware we have. They were partially designed to solve the mismatch between RNNs and GPUs, and they are way too good at it. If you come up with something truly new, it's quite likely you'll have to influence hardware makers to help scale your idea. That makes any new idea fundamentally coupled to hardware, and that's the lesson we should be taking from this: work on the idea as a simultaneous synthesis of hardware and software. But it also means that fundamental change is measured in decades. I get the impulse to do something new, to be radically different and stand out, especially when everyone is obsessing over it, but we are going to be stuck with transformers for a while.
|
There’s a reason so much engineering effort has gone into speculative execution, pipelining, multicore design, etc.: parallelism is universally good. Even when “computers” were human calculators, work was divided into independent chunks that could be done simultaneously. The efficiency comes from the math itself, not from the hardware it happens to run on.
RNNs are not parallelizable across time steps by nature: each step depends on the hidden state produced by the previous one. Transformers removed that sequential bottleneck, since every position in the sequence can be computed at once.
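To make the dependency argument concrete, here is a rough toy sketch in plain Python. Everything in it is an assumption for illustration: the scalar "RNN" with made-up weights `w_h` and `w_x`, the scalar "attention" where queries, keys and values are all just the inputs, and the function names themselves. It is not how any real model is implemented; it only shows which computations depend on which.

```python
import math

def rnn_forward(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t). Weights are made up."""
    h = 0.0
    hs = []
    for x in xs:
        # Step t cannot start until step t-1 has produced h.
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs

def attention_at(i, xs):
    """Toy scalar self-attention output for position i (query = key = value = x)."""
    scores = [xs[i] * xj for xj in xs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * xj for w, xj in zip(weights, xs)) / total

def attention_forward(xs, order=None):
    """Compute every position, in any order. Each one reads only the inputs."""
    order = list(range(len(xs))) if order is None else list(order)
    out = [None] * len(xs)
    for i in order:
        out[i] = attention_at(i, xs)
    return out

xs = [0.1, -0.4, 0.7, 0.2]
forward = attention_forward(xs)
backward = attention_forward(xs, order=reversed(range(len(xs))))
same = forward == backward  # order of evaluation is irrelevant
```

Since `attention_at(i, xs)` never looks at another position's output, the loop in `attention_forward` could be handed to as many workers as there are positions. The loop in `rnn_forward` can't be split that way, because `h` is threaded through every iteration.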