by jerf
1296 days ago
This should at least get you going on the topic. Even if it's too much or too little, it's a rich source of terms, and it addresses the question directly: https://www.gwern.net/notes/Attention

I also want to make clear that, while this is fundamental to this particular technology, I'm not saying it's fundamental to all possible AI architectures. But it is deeply ingrained in how transformers work. I don't think the technology can just "evolve" past it; anything that "evolved" past it would be a fundamentally different architecture.