
The llama.cpp project is absolutely amazing. Our goal was to build with and extend the project, rather than try to be an alternative to it. Ollama was originally inspired by the "server" example: https://github.com/ggerganov/llama.cpp/tree/master/examples/...

This project builds on llama.cpp in a few ways:

1. Easy install! Precompiled for Mac (Windows and Linux coming soon)

2. Run 2+ models: load and unload models as users need them, including via a REST API. There is lots to do here, but even small models are memory hogs and take quite a while to load, so the hope is to provide basic "scheduling".

3. Packaging: content-addressable packaging that bundles GGML-based weights with prompts, parameters, licenses and other metadata. Later, the goal is to also bundle embeddings and other larger files that custom models (for specific use cases, a la PrivateGPT) would need in order to run.
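
To make "content-addressable" in point 3 a bit more concrete, here is a minimal Python sketch of the general idea. It is not Ollama's actual on-disk format or API: `BlobStore`, `package`, and the manifest field names are all hypothetical, and the store is just an in-memory dict. Each piece (weights, prompt, parameters, license) is stored under the SHA-256 digest of its bytes, and a small manifest ties the digests together.

```python
import hashlib
import json


class BlobStore:
    """Hypothetical in-memory store: each blob is keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so corrupted content is detected.
        if "sha256:" + hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its digest")
        return data

    def __len__(self):
        return len(self._blobs)


def package(store: BlobStore, weights: bytes, prompt: str,
            parameters: dict, license_text: str) -> str:
    """Store each part as a blob and return the digest of the manifest.

    The manifest layout here is an assumption made for illustration only.
    """
    manifest = {
        "weights": store.put(weights),
        "prompt": store.put(prompt.encode("utf-8")),
        "parameters": store.put(
            json.dumps(parameters, sort_keys=True).encode("utf-8")
        ),
        "license": store.put(license_text.encode("utf-8")),
    }
    # The manifest is itself a blob, so the whole package has one digest.
    return store.put(json.dumps(manifest, sort_keys=True).encode("utf-8"))
```

With this kind of layout, two packages that share the same weights but differ in their prompt store the weights only once, since both manifests point at the same digest. That deduplication is the main reason to address large files by content rather than by name.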

edit: formatting