by omneity 317 days ago
It takes ~17-20GB at Q4, depending on context length and settings (running it as we speak).

~30GB at Q8, sure, but it's a minimal quality gain for roughly double the VRAM usage.
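
For a rough sanity check on those numbers, here's a back-of-envelope sketch. It only counts the weights, at nominal bits per weight. The 30B parameter count is a made-up placeholder (I haven't named the model here), and `weight_gb` is just an illustrative helper, not part of any library. Real quantized files come out somewhat larger because of per-block scales, and the KV cache adds more on top depending on context length, which is why actual usage lands above the Q4 figure below.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10^9 bytes).

    Ignores quantization metadata (scales), KV cache, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# ASSUMPTION: hypothetical parameter count, chosen only for illustration.
PARAMS_B = 30.0

q4 = weight_gb(PARAMS_B, 4)  # nominal 4-bit weights
q8 = weight_gb(PARAMS_B, 8)  # nominal 8-bit weights

print(f"Q4 weights: ~{q4:.0f} GB")
print(f"Q8 weights: ~{q8:.0f} GB")
print(f"Q8 / Q4 ratio: {q8 / q4:.1f}x")
```

The weights-only ratio is exactly 2x by construction. In practice the gap is a bit smaller, since the context-dependent overhead is about the same at either quantization level.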