by nwlieb 1162 days ago
The runtime is quadratic in the context size, although there seems to be some progress on this front: https://gwern.net/note/attention
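To make the quadratic part concrete, here is a rough sketch, assuming the comment is about standard softmax self-attention (the comment itself doesn't name the model). The names `naive_attention` and `toy_sequence` are made up for illustration and don't come from any library. It just counts how many query-key scores get computed for a sequence of length n:

```python
import math

def naive_attention(q, k, v):
    """Plain softmax attention over lists of vectors.

    Returns (outputs, score_count), where score_count is the number of
    query-key dot products computed. Illustrative only, not optimized.
    """
    d = len(q[0])
    outputs = []
    score_count = 0
    for qi in q:                      # one pass per query position
        scores = []
        for kj in k:                  # every query looks at every key
            scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
            score_count += 1
        m = max(scores)               # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out = [
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ]
        outputs.append(out)
    return outputs, score_count

def toy_sequence(n, d=4):
    """Hypothetical toy input: n vectors of dimension d."""
    return [[math.sin(i + c) for c in range(d)] for i in range(n)]

if __name__ == "__main__":
    for n in (8, 16, 32):
        x = toy_sequence(n)
        _, count = naive_attention(x, x, x)
        print(n, count)
```

Doubling n quadruples the score count, which is where the quadratic runtime comes from. The approaches collected at the link above try, in various ways, to avoid computing every one of those pairs.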