Hacker News new | ask | show | jobs
by rybosome 846 days ago
> Your python script is ephemeral and run on demand. You don't want to load the model each execution because it's slow, so you need a daemon. Ollama can serve that role without you having to write anything other than the script.

This is exactly my use case for it; I invoke a Python binary which uses the Ollama API and get a model response within seconds because it’s already resident in memory.