Hacker News
by zozbot234 5 hours ago
You don't need that much VRAM unless you're targeting a high-performance deployment that's meant to scale far beyond local use. For a lower-throughput case, you can keep the model weights on SSD at very low cost and stream them in for inference. This could actually scale reasonably well even on something as simple as a previous-gen HEDT with enough PCIe lanes to host fast storage.
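
To put rough numbers on the "lower-throughput" part, here is a back-of-envelope sketch. It assumes the worst case where every generated token has to re-read all active weights from storage (nothing cached in RAM or VRAM) and that reads stripe perfectly across drives. The function name, the 40 GB model size, and the 5 GB/s per-drive read rate are made-up placeholders for illustration, not measurements.

```python
def streamed_tokens_per_second(
    active_weight_bytes: float,
    drive_read_bytes_per_s: float,
    num_drives: int = 1,
) -> float:
    """Upper bound on decode rate when weights are streamed from storage.

    Assumes each token reads every active weight exactly once and that
    reads scale linearly with the number of drives (idealized striping).
    """
    if active_weight_bytes <= 0 or drive_read_bytes_per_s <= 0:
        raise ValueError("sizes and bandwidths must be positive")
    if num_drives < 1:
        raise ValueError("need at least one drive")
    aggregate_bandwidth = drive_read_bytes_per_s * num_drives
    return aggregate_bandwidth / active_weight_bytes


GB = 1e9

# Hypothetical figures: 40 GB of weights touched per token, 5 GB/s per drive.
one_drive = streamed_tokens_per_second(40 * GB, 5 * GB, num_drives=1)
four_drives = streamed_tokens_per_second(40 * GB, 5 * GB, num_drives=4)

print(f"1 drive:  {one_drive:.3f} tokens/s")
print(f"4 drives: {four_drives:.3f} tokens/s")
```

Under these assumptions the bound grows linearly with the number of drives, which is why the PCIe lane count matters. Anything that cuts the bytes read per token (caching hot layers in RAM, smaller quantization) raises the bound the same way.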