Hacker News new | ask | show | jobs
by mrinterweb 78 days ago
If turboquant can reliably reduce LLM inference RAM requirements by 6x, suddenly reducing total RAM needs by 6x should have a dramatic shift on the hardware market, or at least we can all hope. I know 6x is the key-value cache saving, so I'm not sure if that really translates to 6x total RAM requirements decrease for inference.