Hacker News
by p1esk 606 days ago
4x seq length expansion doesn’t sound that bad.

I mean, it's not completely fatal, but if I'm not mistaken it means roughly a 16x increase in runtime cost (4^2, since attention cost grows quadratically with sequence length). That's probably not worth paying just to solve letter counting in most applications.
It's not necessarily 16x if you also, e.g., decrease the model width by a factor of 4 or so, but yeah, naively the RAM and FLOPs scale as n^2 in the sequence length n.
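
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes a common per-layer estimate for a standard transformer block: about 2·n²·d multiply-adds for the attention scores and weighted sum, and about 12·n·d² for the QKV/output projections plus a 4x-wide MLP. The function name `layer_cost` and the sizes 2048 and 4096 are illustrative choices, not anything from the thread, and real kernels (fused attention, KV caching, and so on) will shift the constants.

```python
def layer_cost(n: int, d: int) -> dict:
    """Approximate multiply-adds for one transformer layer (forward pass).

    Assumed estimate, not a measurement:
      attention (QK^T and attn @ V): 2 * n^2 * d
      linear (QKV/out projections + 4x MLP): 12 * n * d^2
    """
    attn = 2 * n * n * d
    linear = 12 * n * d * d
    return {"attn": attn, "linear": linear, "total": attn + linear}


# Illustrative baseline: sequence length 2048, model width 4096.
base = layer_cost(n=2048, d=4096)

# 4x longer sequence, same width: the attention term grows 16x, the linear term 4x.
longer = layer_cost(n=4 * 2048, d=4096)

# 4x longer sequence, width cut by 4: the attention term grows only 4x.
narrow = layer_cost(n=4 * 2048, d=4096 // 4)

for name, cost in [("4x seq", longer), ("4x seq, width/4", narrow)]:
    print(
        f"{name}: attn x{cost['attn'] / base['attn']:.2f}, "
        f"linear x{cost['linear'] / base['linear']:.2f}, "
        f"total x{cost['total'] / base['total']:.2f}"
    )
```

Under these assumptions the 16x figure applies to the attention term alone. How much the total moves depends on how n compares to d, since the linear layers only scale linearly with sequence length.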