by mlmonkey
40 days ago
"Attention" is just a matmul: softmax(QK^T/sqrt(d))V etc. I don't see how any planning is done in latent space. Can you point me to any papers? Thanks. Edit: Oh, I see you're probably talking about CoCoNuT? Do all frontier models use it nowadays?
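
For reference, here is roughly what I mean by the attention op, written out as a minimal plain-Python sketch of standard scaled dot-product attention, softmax(QK^T/sqrt(d))V. The names `softmax` and `attention` are just illustrative, and this assumes a single head with no masking and no learned projections.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Q: list of query vectors, K: list of key vectors, V: list of value vectors.
    # K and V must have the same number of rows; Q and K the same width d.
    d = len(K[0])
    out = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    print(attention(Q, K, V))
```

So it is two matmuls with a row-wise softmax in between, and each output row is a weighted average of the rows of V.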