by fheinsen
252 days ago
To the best of our knowledge, this is the first time anyone has successfully trained a non-diagonal RNN computed in parallel, via prefix scan, without requiring any form of stabilization. We abstained from claiming as much out of an abundance of caution.