by optimalsolver
988 days ago
Rather than spending all this effort working around the flaws of the transformer model, maybe researchers should be looking for a better architecture altogether. The absolutely insane amount of compute that transformers consume could probably be better spent on neuroevolutionary search.