by swifthesitation 548 days ago
I don't think so. It seems to only lower the RAM needed for the context window, not the memory needed to load the model into VRAM.
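
To make the distinction concrete, here is a rough back-of-the-envelope sketch. It assumes the "context window" memory is a standard transformer KV cache, and the model dimensions (7B parameters, 32 layers, 32 KV heads, head size 128, fp16) are hypothetical numbers picked for illustration, not taken from whatever is being discussed upthread.

```python
def weights_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int, batch: int = 1) -> int:
    """Memory for the key/value cache, which grows linearly with context length."""
    # Factor of 2: one tensor for keys and one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch


# Hypothetical model shape (assumption, for illustration only).
N_PARAMS = 7_000_000_000
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 32, 128
FP16 = 2  # bytes per value

GIB = 1024 ** 3
weights = weights_bytes(N_PARAMS, FP16)
print(f"weights: {weights / GIB:.1f} GiB (same at any context length)")

for context_len in (4096, 8192, 32768):
    cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context_len, FP16)
    print(f"context {context_len:>6}: KV cache {cache / GIB:.1f} GiB")
```

Under these assumptions, a technique that shrinks the cache changes only the second number. The weights term has no dependence on context length, so the memory needed to load the model stays the same.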