|
|
|
|
|
by rob_c
221 days ago
|
|
tbf, transformers from more of a developmental perspective are hugely wasteful. they're long-range stable sure, but the whole training process requires so much power/data compared to even slightly simpler model designs I can see why people are drawn to alternative complex model designs down-playing the reliance on pure attention. |
|