The Llama Ecosystem: Past, Present, and Future (ai.meta.com)
112 points by allanberger 999 days ago
9 comments

I actively use llama.cpp and I don't see the lack of a mention as a slight -- it isn't directly affiliated with Meta. While there is tremendous innovation in the project, backwards compatibility is antithetical to the project's culture. I have been updating my models to GGUF, which isn't terrible, but I find I have to invest too much time to stay on top of the rapid, scorched-earth developments. Going to move to containerized checkpoints, as I do for my GPU models, for greater maintainability and consistency.
They didn't mention llama.cpp or show it in their picture. That's hopefully an oversight, but it feels like a major slight. It's a (the?) major reason for Llama's popularity.

I have mixed feelings. Llama is great, but it has perpetuated its shitty license. They could have done so much more good if they'd used GPL-style licensing; instead they basically subverted open source, using an objectively good model as leverage.

A lot of times there can be a feeling of being wrong without it being intentional. In this case I think the mention of AWS being a partner shows intent to put value behind what they are doing for their stakeholders.

The license for Llama 2 is pretty intense, but mirrors that intent by limiting interactions with individuals at scale, as well as restricting anything learned from the model through inference from being used to train another model. I suspect this is because the dataset on which it was trained is the company's IP, which again is for the shareholders' benefit.

The code is open though, I think out of necessity. AI poses a significant challenge for our survival, and making it open is an indication of transparency. They still need to make money at what they do and charge people for using their IP, within reason.

I guess my question would be that, if I used Llama (not the code, but the model itself) to code up a new model, would that be a derivative work?

Surely it's IP the shareholders have licensed, rather than their own IP.

Aka, my own comments being sublicensed back to me, after I licenced them to Facebook.

> It's a (the?) major reason for Llama's popularity.

Absolutely not. There's a corner of the overall community that hovers around it and overestimates its reach, as if everyone else only uses it too.

It's great if you have an Apple ARM machine and want to see an M2 Pro do 10 tokens/sec (and find out what could give an Apple ARM machine a 30-minute battery life).

I also doubt it's a slight: the only callouts are large commercial collaborations, e.g. NVIDIA, AMD, and Google, which are representative of each of the 3 groups we could assign it to.

I'd be curious if you have any hard data about use. Mine is anecdotal too, but I see that llama.cpp is a very close second-highest-starred repo with llama in the name, after Meta's llama. Additionally, all the HF models seem to have ggml / gguf quantized versions. I'm not aware of a competing format for quantized models. There are also Python bindings, which are used in a lot of projects. What is a competing framework, other than PyTorch, that's getting more use? Or is it all just PyTorch (and some HF wrappers) and the rest is a rounding error?
This reminds me of a comment elsewhere I also replied to today: it's sort of hard to even pretend I have global usage stats, so I won't.

There's a certain type of myopia that leads to overindexing on llama.cpp that makes it easy to classify. To wit:

> not aware of a competing format for quantized models

ONNX. That's how it's done in prod, and on other models besides (and including) LLaMA. Quantization is a general technique. 100 small variants of llama2 GGML weights feel like spam from that perspective. (Sort of Civitai vs. Hugging Face; Hugging Face smartly stopped that with AI art.)

llm.mlc.ai for a more academic / less ad-hoc approach.

> [stars on github]

It's great for a very narrow & simple case that matches a large demographic on GitHub, and the demographics of people talking LLMs casually on HN: MacBook, wanna run locally and dream of a future free of having to ship your data to servers to get personalization. 5% of overall usage can be #2 in usage, if that makes sense.

> done in prod ... Hugging Face smartly stopped that with AI art ... more academic

Most human people doing LLM at home aren't interested in cargo culting the for-profit corporate and institutional stuff, since corporate resources and incentives are so different from human beings' incentives. As there are more humans than corporations or institutions and they tend to talk more, what they use tends to be more known than the stuff optimized for making a profit and serving business needs with business culture.

> This reminds me of a comment elsewhere I also replied to today

Right, looks like you made fun of / were condescendingly dismissive of my comment in another thread. I wouldn't have replied here if I'd realized you were the same person.

LOL I was thinking of an entirely different comment on another site. Give me credit here, I never cast aspersions on you, or even addressed you directly here.

I apologize for making you feel condescended to, but also would like to point out the _mean_ comment is +7, much less this one: there's a pretty significant gap in your knowledge and reality is going to keep intruding. Engaging in public is a wonderful way to learn, but you're coming across as glib and assertive and uninformed. You thought llama.cpp invented quantization and there's no other real format? :X

The “original” and by far most common format for quantization is GPTQ.

AWQ support is spreading more, which is nice.

Again, for a subset of the local LLM community. Quantization was not invented on GitHub, by llama.cpp, for LLMs in 2023.
If a tree falls in a forest and no one is around, does it make a sound?

Of course quantization was invented well before LLMs. However, LLMs have dramatically accelerated development on quantization and resulted in an explosion in use.
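
To make the "general technique" point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a textbook-style illustration only, not how GGML/GGUF, GPTQ, AWQ, or ONNX actually store weights; the function names and sample values are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each value is stored as one small integer plus a shared scale, and the round-trip error stays within half a quantization step. Real formats add refinements such as per-block scales and lower bit widths.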

fwiw I get more like 35-40 tokens/sec on my M1 MacBook with a 7B model. That's way faster than I can read or skim. If we can figure out how to focus the expertise in small models, I don't see why it wouldn't be viable for those of us that don't want to share all of our convos with big tech.
I'm so happy that Meta was slightly late in the LLM race and so decided to go the chaos route by just open sourcing everything.
Their models are not open source. They made them available under terms that they can change at any time. Even source available products like Unity have more predictable terms.
> There are now over 7,000 projects on GitHub built on or mentioning Llama. New tools, deployment libraries, methods for model evaluation, and even “tiny” versions of Llama are being developed to bring Llama to edge devices and mobile platforms.

Let’s say I want to find the latest or most recent projects on this, is it possible to find them on GitHub based on that criteria?

GitHub has pretty varied filters; you can just search llama and sort by stars or recent activity, etc. It doesn't look like it's possible to exclude Python, but doing so might get you the "edge" ones. (Except they usually have Python utilities for converting PyTorch models.)
Oh? But you can’t sort by date or things like that?
See https://github.com/search/advanced for various date options.
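
The same kind of search can also be scripted. Below is a rough sketch that only builds a query URL for GitHub's REST repository search endpoint, sorted by most recently updated and filtered with a `created:` date qualifier. The helper name `build_search_url` and the example date are hypothetical, and the sketch does not send the request or handle authentication and rate limits.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for repository search.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(term, created_after=None, sort="updated"):
    """Build a repository search URL (hypothetical helper for illustration)."""
    qualifiers = [term]
    if created_after:
        # Date qualifier in the search syntax, e.g. created:>2023-09-01
        qualifiers.append(f"created:>{created_after}")
    params = {
        "q": " ".join(qualifiers),
        "sort": sort,        # "stars" or "updated", among others
        "order": "desc",
        "per_page": 20,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Example date is arbitrary; pick whatever cutoff you care about.
url = build_search_url("llama", created_after="2023-09-01")
print(url)
```

Passing `sort="stars"` instead gives the most-starred matches, which is the other ordering mentioned above.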
This is an important drum to continue to beat, but it needs to be paired with the caveat that we are not legally certain that Llama's weights are actually copyrightable. We're also not certain how much IP protections around trade secrets would apply to weights in this situation. A lot of that is uncertain.

Llama is not Open Source, but until we get a court case ruling one way or the other we don't know if it's actually locked down in the way Facebook intends; and I want to strike a balance between (correctly) pointing out that Facebook is misusing the Open Source label and not ceding to Facebook's claims about how much it can legally constrain people who have never signed a single Llama TOS.

I was actually expecting some comments regarding the 34B Llama 2 model. A quantized 34B model, such as Q5_K_M, might be the sweet spot for a moderate PC in terms of both speed and quality.
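
For a rough sense of why a quantized 34B model might fit a moderate PC, here is a back-of-the-envelope estimate of the memory needed for the weights alone. The 5.5 bits per weight used for Q5_K_M is an assumed approximate average, not an exact figure, and the estimate ignores the KV cache and runtime overhead.

```python
def estimate_weight_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB: parameters * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30


PARAMS_34B = 34e9          # nominal parameter count
ASSUMED_Q5_K_M_BITS = 5.5  # assumption: rough average bits per weight

q5_gib = estimate_weight_gib(PARAMS_34B, ASSUMED_Q5_K_M_BITS)
fp16_gib = estimate_weight_gib(PARAMS_34B, 16)

print(f"Q5_K_M (assumed 5.5 bpw): {q5_gib:.1f} GiB")
print(f"16-bit weights:           {fp16_gib:.1f} GiB")
```

Under those assumptions the quantized weights come to roughly 22 GiB, versus roughly 63 GiB at 16 bits per weight.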
May I ask what you are all doing running an LLM locally?
cooking meth, making molotov cocktails, discussing my medical history, sex

ok seriously though I had fun over the weekend chatting with Samantha on a long car ride on my MacBook. We were mostly asking about history.

Which version are you running, and are you running it through llama.cpp or something? I was just thinking about exactly something like Samantha on the ride home today, and of course it already exists!
Latest 34B with llama.cpp [0] via the Mac app I’ve been building, FreeChat [1].

[0]: https://huggingface.co/TheBloke/Samantha-1.11-CodeLlama-34B-...

[1]: https://github.com/psugihara/FreeChat

Neat app and very cool idea. I guess I'll have to figure out how to get Samantha running on Windows. Thanks!
Trying to figure out if and how these can be used at companies that have regulatory requirements too strict to use hosted models. Sadly, Meta restricts use of Llama for anything ITAR (as opposed to other TOSes which only restrict weapons and defense).
Satisfying HIPAA requirements.
Any MoE models in development at Meta?
People on HN like to complain about the license all the time like a crusade, but I’m personally very thankful for their work and the community that is building off of it. I recently set up Ollama + codellama + continue dev and it’s a game changer. It has practically been a drop-in GitHub Copilot replacement, but local.
Yeah the community is great.

It’d just be better if it was around RWKV or something that doesn’t prevent you from improving any models outside of the llama ecosystem.

It’s a great embrace, extend, extinguish play by Meta.

RWKV literally didn't really exist when Llama was released.

> It’s a great embrace, extend, extinguish play by Meta.

Meta released PyTorch, PyText and even built ONNX with Microsoft to avoid an EEE situation. What more could you possibly want?

The license is a wedge that's destroying the meaning of open source, it's worth complaining about, and it's evil to have done it that way. I would have preferred a commercial license that was at least honest instead of a scorched earth ecosystem takeover like they've done. In a sense it's an extension of the big tech "provide something notionally free that's too good not to use and use it to destroy competition" model.
It is a wedge for some, but not at all 'evil', at least not for the reason you are providing. If you feel it is cannibalizing your company's business model, my apologies.
Lol, is that the strawman that people have come up with: that not liking Meta's "only do what we allow" license must be anger about competition?

No. A good parallel would be if Microsoft (say) wrote their own Linux clone that was compatible but had some proprietary enhancements that made it desirable over open source distros. The only catch being, it wasn't GPL-licensed (they wrote it from scratch); it had a proprietary MS license that says you can only use it for things MS approves of, and that you are using it at their pleasure, to be revoked at any time.

People don't care about the license, they call it open source and move away from GNU/Linux to the proprietary MS version, and now we're only doing what they allow us to.

That's exactly what's happening in the ML model world right now, but people are happy with the shiny models Facebook lets them use so they say "what's the big deal".