by FartyMcFarter
91 days ago
A quick Google search turns up techniques such as "sparse attention" that are used to avoid attention's quadratic runtime in sequence length. I don't know whether Anthropic has revealed such details, since AI research is getting more and more secretive, but the architectural tricks definitely exist.
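To make "sparse attention" concrete, here is a minimal sketch of one common sparse pattern, causal sliding-window (local) attention, in plain Python. This is only an illustration of the general idea, not a claim about what Anthropic or any particular model uses; the function name `sliding_window_attention` and the `window` parameter are my own hypothetical choices.

```python
import math

def sliding_window_attention(q, k, v, window):
    """Causal local attention (illustrative sketch).

    Position i attends only to positions max(0, i - window + 1) .. i,
    so the number of query/key pairs scored grows as O(n * window)
    rather than O(n^2) for full attention.
    q, k, v are lists of equal-length float vectors (one per position).
    Returns (outputs, number_of_pairs_scored).
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    pairs_scored = 0
    for i in range(n):
        lo = max(0, i - window + 1)
        # Scaled dot-product scores against the keys inside the window only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j]))
                  for j in range(lo, i + 1)]
        pairs_scored += len(scores)
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors inside the window.
        row = [sum(w * v[lo + t][c] for t, w in enumerate(weights)) / total
               for c in range(len(v[0]))]
        out.append(row)
    return out, pairs_scored
```

For 8 positions and `window=3` this scores 21 query/key pairs, versus 36 for full causal attention; the gap widens as the sequence grows because the per-position cost is capped at `window`.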