by logicchains 809 days ago
>Lately I've been wondering... is this a problem, or a strength?

It's a strength. Fundamentally, a sub-quadratic attention mechanism can't achieve the same degree of accuracy: https://arxiv.org/abs/2209.04881 (unless the Strong Exponential Time Hypothesis is false, which, like P=NP, is widely considered very unlikely).
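
To make "quadratic" concrete, here's a rough sketch of plain softmax attention in pure Python that counts how many query-key scores get computed. The function names and the toy inputs are my own, made up for illustration; nothing here is taken from the linked paper.

```python
import math

def softmax_attention(q, k, v):
    """Naive softmax attention over lists of vectors.

    Returns the outputs and the number of query-key scores computed.
    """
    d = len(q[0])
    outputs, pair_count = [], 0
    for qi in q:
        # One dot product per (query, key) pair: n scores per query, n*n overall.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        pair_count += len(scores)
        m = max(scores)  # subtract the max so exp() stays numerically stable
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return outputs, pair_count

def toy_inputs(n, d=4):
    # Deterministic toy vectors (hypothetical data, for illustration only).
    return [[math.sin(i * d + c) for c in range(d)] for i in range(n)]

for n in (8, 16, 32):
    x = toy_inputs(n)
    _, pairs = softmax_attention(x, x, x)
    print(n, pairs)  # prints "8 64", "16 256", "32 1024"
```

Doubling the sequence length quadruples the number of scores. As I understand it, that pairwise work is what the hardness result says you can't avoid without giving something up.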