by dnautics 111 days ago
It proves that the algorithm can be embedded in a bigger transformer of roughly similar architecture.