Hacker News
by dragonwriter 310 days ago
You also need VRAM for whatever supports the context window (mostly the KV cache), not just the weights; you might be able to run a model with 14GB of parameters and a small (~8k maybe?) context window on a 16GB card.
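
For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes the context-window cost is dominated by the KV cache of a standard transformer (one key and one value tensor per layer), stored in fp16. The layer count, KV-head count, and head dimension are hypothetical, picked for illustration rather than taken from any specific model, and the 0.5 GiB overhead is a guess at activations and runtime buffers. GB and GiB are treated as interchangeable here.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a single sequence.

    The leading 2 accounts for storing both keys and values in every layer.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits_in_vram(weight_bytes, cache_bytes, vram_bytes, overhead_bytes=0):
    """Return True if weights + KV cache + overhead fit within the card's VRAM."""
    return weight_bytes + cache_bytes + overhead_bytes <= vram_bytes


# Hypothetical model shape (assumption, not a real model's config).
cache_8k = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
cache_32k = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)

print(f"8k context:  {cache_8k / GIB:.1f} GiB of KV cache")
print(f"32k context: {cache_32k / GIB:.1f} GiB of KV cache")

# 14 GiB of weights on a 16 GiB card, with a guessed 0.5 GiB of other overhead.
print(fits_in_vram(14 * GIB, cache_8k, 16 * GIB, overhead_bytes=GIB // 2))
print(fits_in_vram(14 * GIB, cache_32k, 16 * GIB, overhead_bytes=GIB // 2))
```

With these made-up dimensions the cache grows linearly with context length, so the 8k case squeezes in while the 32k case does not. Models without grouped-query attention (more KV heads) or with more layers would need proportionally more.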