| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rybosome 846 days ago
	> Your python script is ephemeral and run on demand. You don't want to load the model each execution because it's slow, so you need a daemon. Ollama can serve that role without you having to write anything other than the script. This is exactly my use case for it; I invoke a Python binary which uses the Ollama API and get a model response within seconds because it’s already resident in memory.