Hacker News
by fheinsen 252 days ago
To the best of our knowledge, this is the first time anyone has successfully trained a non-diagonal RNN computed in parallel, via prefix scan, without requiring any form of stabilization. We abstained from claiming as much out of an abundance of caution.
1 comment

Hmm... you may be right. I don't think I've seen that before either.