by x-complexity
357 days ago
The article assumes that there will be no architectural improvements or migrations in the future, and that Sparse MoE is here to stay. That's not a great foundation to build on. Personally, I'm rooting for RWKV / Mamba2 to pull through, somehow. There's been some work on increasing their reasoning depth, but transformers still beat them without much effort. https://x.com/ZeyuanAllenZhu/status/1918684269251371164

In biological terms, the Transformer architecture is more in line with the highly interconnected, global receptive field of neurons.
https://github.com/dmf-archive/PILF