by joebiden2 1060 days ago
What does this add over llama.cpp? Is it just an "easier" way to set up llama.cpp locally?

If so, I don't really get it, because setting up llama.cpp locally is quite easy and well documented. And this appears to be a fork. Seems a bit fishy to me, when looking at the other "top" comments (with this one having no upvotes, but still #2 right now).

(llama.cpp's original intention is identical to yours: The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook¹)

¹ https://github.com/ggerganov/llama.cpp#description


The llama.cpp project is absolutely amazing. Our goal was to build with and extend the project (rather than trying to be an alternative). Ollama was originally inspired by the "server" example: https://github.com/ggerganov/llama.cpp/tree/master/examples/...

This project builds on llama.cpp in a few ways:

1. Easy install! Precompiled for Mac (Windows and Linux coming soon)

2. Run 2+ models: load and unload models as users need them, including via a REST API. Lots to do here, but even small models are memory hogs and take quite a while to load, so the hope is to provide basic "scheduling".

3. Packaging: content-addressable packaging that bundles GGML-based weights with prompts, parameters, licenses and other metadata. Later the goal is to also bundle embeddings and other larger files that custom models (for specific use cases, a la PrivateGPT) would need to run.
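
To illustrate what "content-addressable" means in point 3, here is a minimal Python sketch. It is not Ollama's actual on-disk format: the manifest layout, the `kind` labels, and the function names (`digest`, `build_package`, `verify`) are all hypothetical, and SHA-256 is an assumed choice of hash. The idea is only that each piece (weights, prompt, license) is stored under the hash of its own bytes, so identical pieces are shared between packages and corruption is detectable.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content address: SHA-256 of the bytes, prefixed with the algorithm name."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_package(layers: dict[str, bytes]) -> tuple[dict, dict[str, bytes]]:
    """Store each layer under its digest and return (manifest, blob_store).

    `layers` maps a hypothetical kind label ("weights", "prompt", ...) to raw bytes.
    """
    blobs: dict[str, bytes] = {}
    manifest: dict = {"layers": []}
    for kind, data in layers.items():
        d = digest(data)
        blobs[d] = data  # identical content maps to the same key, so it is stored once
        manifest["layers"].append({"kind": kind, "digest": d, "size": len(data)})
    return manifest, blobs


def verify(manifest: dict, blobs: dict[str, bytes]) -> bool:
    """Check that every layer in the manifest is present and matches its digest."""
    return all(
        layer["digest"] in blobs and digest(blobs[layer["digest"]]) == layer["digest"]
        for layer in manifest["layers"]
    )
```

Two packages built from the same weights but different prompts would reference the same weights digest, which is why this kind of layout avoids storing multi-gigabyte files twice.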

edit: formatting