by alex7o 58 days ago
Turboquant at 4-bit also helps a lot with keeping context in VRAM, but int4 is definitely not lossless. It all depends, though: for some people this is sufficient.
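
To make the "not lossless" point concrete, here is a rough sketch of a plain round-to-nearest int4 round trip. This is not Turboquant's actual algorithm, just generic symmetric quantization with one scale for the whole tensor. The helper names and the random test data are made up for illustration.

```python
import random

def quantize_int4(values):
    """Symmetric quantization to signed 4-bit codes in [-8, 7] (illustrative helper)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # map the largest magnitude to code 7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int4(codes, scale):
    """Map 4-bit codes back to floats; the rounding error is not recoverable."""
    return [c * scale for c in codes]

# Stand-in data: random values, not a real KV cache.
random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(4096)]

codes, scale = quantize_int4(values)
restored = dequantize_int4(codes, scale)

# Worst-case round-trip error; bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(values, restored))

# Storage comparison, ignoring the small overhead of storing the scale.
fp16_bytes = len(values) * 2   # 16 bits per value
int4_bytes = len(values) // 2  # two 4-bit codes packed per byte

print(f"max round-trip error: {max_error:.4f} (step size {scale:.4f})")
print(f"fp16: {fp16_bytes} bytes, int4: {int4_bytes} bytes")
```

The memory drops to a quarter of fp16, which is where the VRAM savings come from, but the restored values differ from the originals by up to half a step. Whether that error matters is the "it depends" part.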