Hacker News
by yencabulator, 502 days ago
And now you need a server per model? Ollama loads models on demand and unloads them after they sit idle, and every model is reachable through the same HTTP API.
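
For illustration, a rough sketch of what "same HTTP API" looks like in practice: one endpoint, with the model picked per request. This assumes Ollama is listening on its default local address and uses the `/api/generate` endpoint with its `keep_alive` field; the model names and the helper functions `build_request` and `generate` are placeholders of mine, not anything from the comment.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, keep_alive="5m"):
    """Build a POST request for one completion from the named model.

    keep_alive tells the server how long to keep the model loaded
    after this request before unloading it.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object, not a stream
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, keep_alive="5m"):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt, keep_alive)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Hypothetical model names; both go to the same server and endpoint.
    for name in ("llama3", "mistral"):
        print(name, generate(name, "Say hi in five words."))
```

Only the `model` field changes between the two calls; the server handles loading whichever one was asked for.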