by Chirono
617 days ago
The two other changes they mention have been widely adopted and are included in at least some of the models they benchmark against. It seems they list them for completeness as changes to the original transformer architecture.