Hacker News
by refulgentis 881 days ago
> Ollama is definitely the easiest way to run LLMs locally

Nitro outstripped them: a 3 MB executable with an OpenAI-compatible HTTP server and persistent model loading

2 comments

Persistent model loading will be possible with: https://github.com/ollama/ollama/pull/2146 – sorry it isn't yet! More to come on filesize and API improvements
I just wanted to say thank you for being communicative and approachable and nice.
Who cares about executable size when the models are measured in gigabytes lol. I would prefer a Go/Node/Python/etc server for an HTTP service even at 10x the size over some guy's bespoke C++ any day of the week. Also, measuring the size of an executable after zipping is a nonsense benchmark in and of itself
Not some guy, agree on zip, disagree entirely with tone of the comment (what exactly separates ollama from those same exact hyperbolic descriptions?)