by FartyMcFarter
94 days ago
Isn't transformer attention quadratic in the context length? To reach a 1M-token context, I think these models have to be taking a lot of shortcuts. I'm not an expert, but maybe that explains context rot.
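To put rough numbers on the quadratic part, here is a minimal sketch, assuming plain dense self-attention where every token attends to every other token. The function name and the 2-bytes-per-score figure are my own assumptions for illustration; this says nothing about what any particular model actually does.

```python
def attention_score_entries(context_len: int) -> int:
    """Number of (query, key) pairs in one dense attention score matrix.

    Assumption: full self-attention, counted per head and per layer,
    with no masking, sparsity, or other shortcut applied.
    """
    return context_len * context_len


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000, 1_000_000):
        entries = attention_score_entries(n)
        # Hypothetical storage cost if the matrix were fully materialized
        # at 2 bytes per score (e.g. 16-bit floats).
        gigabytes = entries * 2 / 1e9
        print(f"{n:>9,} tokens: {entries:.1e} scores, {gigabytes:,.3f} GB")
```

Going from 1K to 1M tokens multiplies the context by 1,000 but the number of score entries by 1,000,000, which is why I would guess some kind of approximation or restructuring is needed at that scale.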