Hacker News
by rao-v 24 days ago
I'd have to try the KV cache trick, but folks get pretty competitive speeds with the current 31B/27B dense models, e.g. https://www.reddit.com/r/LocalLLaMA/comments/1tc9j6u/mi50s_q...