by zaptrem 921 days ago
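IIRC Ethereum ASICs were also memory-bandwidth bound. With KV caching, transformer inference is just lots and lots of matrix-vector multiplications: each generated token only needs the new token's activations multiplied against the weights, so throughput is bound by loading the huge weight matrices from memory onto the cores rather than by the arithmetic itself.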
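To put rough numbers on that, here is a back-of-envelope sketch of arithmetic intensity (FLOPs per byte moved) for a single weight matrix. Everything in it is an assumption for illustration: the 4096x4096 layer size, fp16 weights, and the accelerator's peak compute and bandwidth figures are made up, and the function names are hypothetical. It also ignores caches and kernel fusion.

```python
def arithmetic_intensity(rows, cols, batch=1, bytes_per_elem=2):
    """FLOPs per byte moved for W (rows x cols) times X (cols x batch).

    Assumes every operand is read or written exactly once (no cache reuse)
    and fp16 storage by default (2 bytes per element).
    """
    flops = 2 * rows * cols * batch  # one multiply and one add per weight per column
    bytes_moved = bytes_per_elem * (rows * cols + cols * batch + rows * batch)
    return flops / bytes_moved


# Hypothetical accelerator, numbers chosen only for illustration.
PEAK_FLOPS = 100e12      # 100 TFLOP/s
PEAK_BANDWIDTH = 1e12    # 1 TB/s
RIDGE = PEAK_FLOPS / PEAK_BANDWIDTH  # FLOPs per byte needed to saturate compute


def is_memory_bound(rows, cols, batch=1):
    """True when the kernel cannot reach peak compute on the assumed hardware."""
    return arithmetic_intensity(rows, cols, batch) < RIDGE


# Decoding one token with a KV cache: matrix-vector product (batch = 1).
decode = arithmetic_intensity(4096, 4096, batch=1)

# Processing 512 tokens at once (e.g. a prompt): matrix-matrix product.
prefill = arithmetic_intensity(4096, 4096, batch=512)

print(f"decode:  {decode:.2f} FLOPs/byte, memory bound: {is_memory_bound(4096, 4096, 1)}")
print(f"prefill: {prefill:.2f} FLOPs/byte, memory bound: {is_memory_bound(4096, 4096, 512)}")
```

Under these assumptions the batch-1 case does about one FLOP per byte loaded, far below the roughly 100 FLOPs per byte the hypothetical chip would need to stay busy, while the 512-token case reuses each weight enough to clear that bar.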