by logicchains
809 days ago
>Lately I've been wondering... is this a problem, or a strength?

It's a strength. Fundamentally, a sub-quadratic attention mechanism cannot achieve the same degree of accuracy: https://arxiv.org/abs/2209.04881 (unless the Strong Exponential Time Hypothesis is false, which is widely believed to be very unlikely, much like P=NP).