Hacker News
by kreelman 39 days ago
I thought that might be the case. I naively wondered. I'll see if I can understand the paper :-)

Hope the paper gets lots of references and the technique gets a lot of use to save power and time.

There have been several potentially big changes to LLM inference efficiency over the last few months. There's been Attention Sequencing (I think it's called..?), Turbo Quant, and this one.

Interesting times.