Hacker News
by halflings 424 days ago
That's what the chart says, yes: 14.1GB of VRAM usage for the 27B model.

That's the VRAM required just to load the model weights.

To actually use the model, you also need memory for the context window. Realistically, you'll want a GPU with 20GB or more, depending on how many tokens you need.

I didn't realize that the context would require so much memory. Is this the KV cache? It would seem like a big advantage if this memory requirement could be reduced.
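
For a rough sense of scale: in a standard transformer, the memory that grows with context length is mostly the KV cache, which stores one key and one value vector per token, per layer, per KV head. Here is a back-of-the-envelope sketch in Python. The layer count, KV head count, and head dimension below are made-up illustrative values, not the real configuration of the 27B model discussed here, and the function name is just for this example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV cache size for a standard transformer decoder.

    The leading 2 accounts for storing both a key and a value tensor
    in every layer. bytes_per_value=2 assumes a 16-bit cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * n_tokens * bytes_per_value * batch_size)


# Hypothetical configuration (an assumption for illustration only).
cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for n_tokens in (8192, 32768, 131072):
    gib = kv_cache_bytes(n_tokens=n_tokens, **cfg) / 2**30
    print(f"{n_tokens:>7} tokens: {gib:.2f} GiB")
# Prints 1.50 GiB, 6.00 GiB and 24.00 GiB for the three context lengths.
```

Under these assumptions the cache grows linearly with the number of tokens, so the levers for shrinking it are visible in the formula: fewer KV heads (grouped-query attention), fewer bytes per value (a quantized cache), or a shorter context.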