by tysam_and
1000 days ago
It's an RNN, so there is no N^2 component over time; it only requires the previous state. (There's a Discord, you should join it with further questions! Unfortunately I'm not as informed as I should be on this one, other than the fact that it is _very_ mobile-friendly.) The performance diff is slight, and not too bad really, all things considered. And I think it comes out on top for raw efficiency per parameter/FLOP, IIRC. An interesting concept, for sure! :'DDDD :'))))
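To illustrate the "only requires the previous state" point, here is a minimal sketch using a generic Elman-style RNN. This is an assumption for illustration only, not the architecture being discussed, and the names `rnn_step`, `run_rnn` and `causal_attention_pairs` are hypothetical. Each token costs one fixed-size update, so a length-N sequence takes N updates and the only carried memory is one hidden vector. Causal self-attention instead scores each token against every earlier one, which works out to N(N+1)/2 pairs.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product, kept dependency-free.
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def rnn_step(h: Vector, x: Vector, w_h: Matrix, w_x: Matrix) -> Vector:
    # Hypothetical Elman-style update: h_t = tanh(W_h h_{t-1} + W_x x_t).
    # The only history it needs is the previous hidden state h.
    a = matvec(w_h, h)
    b = matvec(w_x, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def run_rnn(xs: List[Vector], w_h: Matrix, w_x: Matrix) -> Tuple[Vector, int]:
    # Process the sequence token by token; memory stays one hidden vector.
    h = [0.0] * len(w_h)
    updates = 0
    for x in xs:
        h = rnn_step(h, x, w_h, w_x)
        updates += 1  # one fixed-cost update per token, O(N) total
    return h, updates


def causal_attention_pairs(n: int) -> int:
    # In causal self-attention, token t scores against tokens 0..t,
    # so the total number of query-key pairs is n(n+1)/2, i.e. O(N^2).
    return n * (n + 1) // 2


if __name__ == "__main__":
    hidden, inp, n = 4, 3, 1000
    w_h = [[0.1 if i == j else 0.0 for j in range(hidden)] for i in range(hidden)]
    w_x = [[0.05] * inp for _ in range(hidden)]
    xs = [[1.0] * inp for _ in range(n)]
    h, updates = run_rnn(xs, w_h, w_x)
    print(len(h), updates)            # 4 1000
    print(causal_attention_pairs(n))  # 500500
```

For 1000 tokens the recurrent sketch does 1000 constant-size updates, while the attention count is already 500,500 pairs, and that gap grows quadratically with sequence length.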
If this technique is good, I'll wait until I can learn about it without joining the Discord.