by joebiden2
1060 days ago
What does this add over llama.cpp? Is it just an "easier" way to set up llama.cpp locally? If so, I don't really get it, because setting up llama.cpp locally is quite easy and well documented. And this appears to be a fork. It seems a bit fishy to me when looking at the other "top" comments (with this one having no upvotes, but still #2 right now).
llama.cpp's original intention is identical to yours: "The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook"¹
¹ https://github.com/ggerganov/llama.cpp#description
|
This project builds on llama.cpp in a few ways:
1. Easy install! Precompiled for Mac (Windows and Linux coming soon)
2. Run 2+ models: loading and unloading models as users need them, including via a REST API. Lots to do here, but even small models are memory hogs and take quite a while to load, so the hope is to provide basic "scheduling".
3. Packaging: content-addressable packaging that bundles GGML-based weights with prompts, parameters, licenses and other metadata. Later, the goal is to also bundle embeddings and other larger files that custom models (for specific use cases, a la PrivateGPT) would need to run.
edit: formatting
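
To make point 3 a bit more concrete, here is a rough sketch of what content-addressable packaging could look like. This is not the project's actual format: `BlobStore`, `build_package`, and the manifest field names are made up for illustration, SHA-256 is an assumed hash choice, and the in-memory dict stands in for files on disk.

```python
import hashlib
import json


def digest(data):
    """Content address: the SHA-256 of the bytes, hex-encoded (assumed scheme)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Hypothetical in-memory store keyed by content digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        key = digest(data)
        self.blobs[key] = data  # identical content always lands on the same key
        return key

    def get(self, key):
        data = self.blobs[key]
        if digest(data) != key:  # verify integrity on read
            raise ValueError("blob does not match its digest")
        return data


def build_package(store, weights, prompt, params, license_text):
    """Store each part as a blob and return the digest of the manifest."""
    manifest = {
        "weights": store.put(weights),
        "prompt": store.put(prompt.encode()),
        "params": store.put(json.dumps(params, sort_keys=True).encode()),
        "license": store.put(license_text.encode()),
    }
    # The manifest is itself a blob, so the package ID is derived from its contents.
    return store.put(json.dumps(manifest, sort_keys=True).encode())


store = BlobStore()
helper = build_package(store, b"fake-ggml-weights", "You are helpful.", {"temperature": 0.7}, "MIT")
pirate = build_package(store, b"fake-ggml-weights", "You are a pirate.", {"temperature": 0.7}, "MIT")
```

The two packages get different IDs because their prompts differ, but the weights, parameters and license are each stored only once, since they hash to the same digests. That sharing is the main appeal when the weights are multiple gigabytes.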