by version_five 996 days ago
I'd be curious if you have any hard data about usage. Mine is anecdotal too, but I see that llama.cpp is a very close second among the most-starred repos with "llama" in the name, after Meta's llama. Additionally, all the HF models seem to have GGML / GGUF quantized versions. I'm not aware of a competing format for quantized models. There are also Python bindings that are used in a lot of projects. What competing framework, other than PyTorch, is getting more use? Or is it all just PyTorch (and some HF wrappers), and the rest is a rounding error?

This reminds me of a comment elsewhere I also replied to today: it's sort of hard to even pretend I have global usage stats, so I won't.

There's a certain type of myopia that leads to overindexing on llama.cpp, and it's easy to classify. To wit:

> not aware of a competing format for quantized models

ONNX: that's how it's done in prod, and on other models besides (and including) LLaMA. Quantization is a general technique. From that perspective, 100 small variants of llama2 GGML weights feel like spam (sort of Civitai vs. Hugging Face; Hugging Face smartly stopped that with AI art).

llm.mlc.ai for a more academic / less ad-hoc approach.
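To illustrate the point that quantization is a general technique and not tied to one model or file format, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names are hypothetical, and this is the textbook scheme, not what ONNX, GGML/GGUF, or any particular runtime actually implements.

```python
from typing import List, Tuple


# Hypothetical helpers: symmetric per-tensor int8 quantization, w ≈ scale * q.
def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; the largest magnitude maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round to nearest (Python's round() sends exact ties to the even integer),
    # then clamp to the symmetric int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    # Recover approximate floats; error per element is at most scale / 2.
    return [scale * v for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 0.01]
    q, scale = quantize_int8(w)
    print(q, scale)
    print(dequantize_int8(q, scale))
```

Nothing in it knows or cares whether the numbers came from LLaMA, a vision model, or anything else, which is the point.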

> [stars on github]

It's great for a very narrow & simple case that matches a large demographic on Github, and the demographics of people talking LLMs casually on HN: MacBook, wanna run locally and dream of a future free of having to ship your data to servers to get personalization. 5% of overall usage can be #2 in usage, if that makes sense.

> done in prod ... Hugging Face smartly stopped that with AI art ... more academic

Most people doing LLM work at home aren't interested in cargo-culting for-profit corporate and institutional practices, since corporate and institutional resources and incentives are so different from those of individual people. Because there are more individuals than corporations or institutions, and individuals tend to talk more, what they use tends to be better known than the stuff optimized for making a profit and serving business needs within business culture.

> This reminds me of a comment elsewhere I also replied to today

Right, looks like you made fun of / were condescendingly dismissive of my comment in another thread. I wouldn't have replied here if I'd realized you were the same person.

LOL I was thinking of an entirely different comment on another site. Give me credit here, I never cast aspersions on you, or even addressed you directly here.

I apologize for making you feel condescended to, but also would like to point out the _mean_ comment is +7, much less this one: there's a pretty significant gap in your knowledge and reality is going to keep intruding. Engaging in public is a wonderful way to learn, but you're coming across as glib and assertive and uninformed. You thought llama.cpp invented quantization and there's no other real format? :X

The “original” and by far most common format for quantization is GPTQ.

AWQ support is spreading more, which is nice.
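For context on what GPTQ and AWQ improve upon, here is a rough sketch of the naive baseline they are usually compared against: group-wise 4-bit round-to-nearest (RTN) quantization, where each group of consecutive weights gets its own scale so a single outlier only hurts its own group. This is not GPTQ or AWQ (both use calibration data to choose quantized values or scales more carefully); the function names and the tiny default group size are illustrative assumptions, and larger groups such as 128 are more typical in practice.

```python
from typing import List, Tuple


# Hypothetical helper: symmetric 4-bit RTN quantization with one scale
# per group of `group_size` consecutive weights.
def quantize_groupwise_4bit(
    weights: List[float], group_size: int = 4
) -> List[Tuple[float, List[int]]]:
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[Tuple[float, List[int]]] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(x) for x in group)
        # Map the largest magnitude in this group to +/-7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        q = [max(-7, min(7, round(x / scale))) for x in group]
        groups.append((scale, q))
    return groups


def dequantize_groupwise(groups: List[Tuple[float, List[int]]]) -> List[float]:
    # Flatten back to floats, applying each group's own scale.
    out: List[float] = []
    for scale, q in groups:
        out.extend(scale * v for v in q)
    return out


if __name__ == "__main__":
    # One large outlier (8.0) sits in the second group.
    w = [0.1, -0.2, 0.05, 0.15, 8.0, 0.1, -0.1, 0.2]
    grouped = dequantize_groupwise(quantize_groupwise_4bit(w, group_size=4))
    per_tensor = dequantize_groupwise(quantize_groupwise_4bit(w, group_size=len(w)))
    # Max error on the first (outlier-free) group under each scheme.
    print(max(abs(a - b) for a, b in zip(w[:4], grouped[:4])))
    print(max(abs(a - b) for a, b in zip(w[:4], per_tensor[:4])))
```

Passing `group_size=len(w)` collapses it to a single per-tensor scale, which makes it easy to see how much the outlier costs the small weights when they share its scale.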

Again, for a subset of the local LLM community. Quantization was not invented on GitHub, by llama.cpp, for LLMs in 2023.

If a tree falls in a forest and no one is around, does it make a sound?

Of course quantization was invented well before LLMs. However, LLMs have dramatically accelerated development on quantization and resulted in an explosion in use.