by golol
757 days ago
As I understand it, conceptually they just changed 346 + 23 = ? to (1: 3, 2: 4, 3: 6) + (1: 2, 2: 3) = ?, i.e. each digit is tagged with its position within its own number.
So it is not that specific a hack. There could be a broader principle here: something may be holding transformers back in a general way, and we might be able to improve on the architecture!
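To make the notation concrete, here is a toy sketch that rewrites "346 + 23" into the position-tagged form described above. It is only an illustration of the comment's notation, not the paper's actual method (which works with learned embeddings inside the model). The function names `tag_digits` and `tag_expression` are hypothetical, and positions count from the leftmost digit exactly as written in the comment.

```python
def tag_digits(number: str) -> list[tuple[int, int]]:
    """Pair each digit with its 1-based position inside its own number,
    counting from the leftmost digit (the comment's notation)."""
    return [(i, int(d)) for i, d in enumerate(number, start=1)]


def tag_expression(expr: str) -> list[list[tuple[int, int]]]:
    """Split a simple 'a + b' expression and tag each operand's digits."""
    operands = [part.strip() for part in expr.split("+")]
    return [tag_digits(op) for op in operands]


print(tag_expression("346 + 23"))
# [[(1, 3), (2, 4), (3, 6)], [(1, 2), (2, 3)]]
```

The point is that each digit now carries an explicit "where am I in my number" label, instead of the model having to infer it from its absolute position in the token sequence.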