|
|
|
|
|
by mrinterweb
78 days ago
|
|
If turboquant can reliably reduce LLM inference RAM requirements by 6x, suddenly reducing total RAM needs by 6x should have a dramatic shift on the hardware market, or at least we can all hope. I know 6x is the key-value cache saving, so I'm not sure if that really translates to 6x total RAM requirements decrease for inference. |
|