Hacker News new | ask | show | jobs
by p1esk 847 days ago
Models can be sensitive to the location of a needle in the haystack of its input block.

But only if we use some sort of attention optimization. For the quadratic attention algo it shouldn’t matter where the needle is, right?