by version_five
996 days ago
I'd be curious if you have any hard data about use. Mine is anecdotal too, but I see that llama.cpp is a very close second among the most-starred repos with "llama" in the name, after Meta's llama. Additionally, all the HF models seem to have GGML / GGUF quantized versions. I'm not aware of a competing format for quantized models. There are also Python bindings, which are used in a lot of projects. What is a competing framework, other than PyTorch, that's getting more use? Or is it all just PyTorch (and some HF wrappers) and the rest is a rounding error?
There's a certain type of myopia that leads to overindexing on llama.cpp, which makes it easy to classify. To wit:
> not aware of a competing format for quantized models
ONNX. That's how it's done in prod, and on other models besides (and including) LLaMA. Quantization is a general technique. 100 small variants of Llama 2 GGML weights feel like spam from that perspective (sort of Civitai vs. Hugging Face; Hugging Face smartly stopped that with AI art).
There's also llm.mlc.ai, for a more academic / less ad-hoc approach.
> [stars on github]
It's great for a very narrow & simple case that matches a large demographic on GitHub, and the demographic of people talking LLMs casually on HN: have a MacBook, want to run locally, and dream of a future free of having to ship your data to servers to get personalization. 5% of overall usage can still be #2 in usage, if that makes sense.