by Chirono 617 days ago
The two other changes they mention have been widely adopted, and are included in at least some of the models they benchmark against. It seems they list them for completeness as changes to the original transformer architecture.
1 comment

Nicely spotted! Then, I really look forward to seeing this method tested by others! Epic stuff.