by golol 757 days ago
As I understand it, conceptually they just changed 346 + 23 = ? to (1: 3, 2: 4, 3: 6) + (1: 2, 2: 3) = ? So it is not that specific a hack. There could be a broader principle here, where something is holding transformers back more generally, and we might be able to improve on the architecture!
1 comment

Hopefully 3:3, 2:4, 1:6 and 2:2, 1:3?
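
To make the two numbering schemes concrete, here is a minimal sketch that tags each digit with a position index, once counting from the left (as in the parent comment) and once counting from the right (as suggested in this reply). The helper names tag_digits and fmt are made up for illustration, and this only shows the notation. It is not a claim about how the actual method encodes positions.

```python
def tag_digits(n: int, from_right: bool = False) -> list[tuple[int, int]]:
    """Pair each decimal digit of n with a 1-based position index.

    from_right=False numbers digits starting at the most significant one;
    from_right=True numbers them starting at the least significant one.
    """
    digits = [int(c) for c in str(n)]
    k = len(digits)
    if from_right:
        # Most significant digit gets index k, least significant gets 1.
        return [(k - i, d) for i, d in enumerate(digits)]
    return [(i + 1, d) for i, d in enumerate(digits)]


def fmt(pairs: list[tuple[int, int]]) -> str:
    """Render tagged digits in the '(1: 3, 2: 4)' style used in the thread."""
    return "(" + ", ".join(f"{p}: {d}" for p, d in pairs) + ")"


# Left-to-right indexing, as written in the parent comment.
print(f"{fmt(tag_digits(346))} + {fmt(tag_digits(23))} = ?")

# Right-to-left indexing, as suggested in the reply.
print(f"{fmt(tag_digits(346, True))} + {fmt(tag_digits(23, True))} = ?")
```

With right-to-left indexing, equal indices always mean equal place value (index 1 is the ones digit in both 346 and 23). With left-to-right indexing, index 1 pairs the hundreds digit of 346 with the tens digit of 23.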