Hacker News
by casercaramel144 842 days ago
It's camel.

How do you do matrix-vector attention without keeping the full matrix in cache? Surely you don't just load and unload it a million times.