by alyxya
190 days ago
The hardest part about making a new architecture is that even if it is simply better than transformers in every way, it’s very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there’s plenty of low-hanging fruit in improving existing architectures, so the new one will keep taking a back seat.
You don't necessarily have to prove it out on large foundation models first. Can it beat a 32B-parameter model, for example?