Hacker News
by lennxa 613 days ago
they mention similar performance to a vanilla transformer with a significantly reduced param count, though