by yencabulator 502 days ago
And now you need a server per model? Ollama loads models on demand and terminates them once they go idle, all accessible over the same HTTP API.
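
To make the "same HTTP API" point concrete, here is a rough sketch of two different models being requested from one Ollama endpoint. It assumes a local Ollama on its default port (11434) and that the models are already pulled; the model names `llama3` and `mistral` are placeholders, and the helper names `build_request` and `generate` are made up for this example. The `keep_alive` field is the per-request hint for how long the model stays loaded after the call.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, keep_alive="5m"):
    """Build a POST request for the generate endpoint (hypothetical helper)."""
    payload = {
        "model": model,            # which model to load on demand
        "prompt": prompt,
        "stream": False,           # ask for one JSON object instead of a stream
        "keep_alive": keep_alive,  # how long the model stays loaded once idle
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, keep_alive="5m"):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt, keep_alive)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Same endpoint, different models; placeholder names, swap in what you have pulled.
    for model in ("llama3", "mistral"):
        print(model, "->", generate(model, "Say hi in five words."))
```

Only the `model` field changes between calls; there is no separate server to start or stop per model in this sketch.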