Hacker News
by bhelkey 512 days ago
Have you tried Ollama [1]? You should be able to run an 8B model in RAM and a 1B model in VRAM.
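A rough way to check which model fits where is to estimate weight memory as parameters × bits per weight / 8. This is only a sketch under stated assumptions. It counts the weights alone, with no KV cache, activations, or runtime overhead. The 16/8/4-bit widths are illustrative and do not describe the quantization of any specific Ollama model. `weight_memory_gib` is a hypothetical helper, not part of Ollama.

```python
# Rough weight-memory estimate, ASSUMING memory is dominated by the weights:
# bytes = parameters * bits_per_weight / 8. KV cache, activations and runtime
# overhead are ignored, so real usage will be higher than these numbers.

def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (2**30 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Hypothetical configurations: the bit widths are assumptions, not a
    # statement about any particular Ollama model tag.
    for name, params in [("8B", 8e9), ("1B", 1e9)]:
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: {weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the 8B weights come to about 3.7 GiB at 4-bit, compared with about 0.5 GiB for a 1B model. Actual usage will be higher once the context is allocated.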

[1] https://news.ycombinator.com/item?id=42069453