Hacker News
by juliensalinas 1074 days ago
LLaMA 30B or 65B can be very impressive when correctly prompted. Deploying the 65B version is a challenge though, and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .
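
For anyone wondering what "4-bit quantization" means mechanically, here is a minimal sketch of plain round-to-nearest quantization on one small group of weights. This is not the GPTQ algorithm that the linked repos implement (GPTQ picks the rounding more carefully to limit the error on the layer's outputs); the function names and the example weights are made up for illustration.

```python
from typing import List, Tuple


def quantize_group(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map a group of floats to 4-bit codes (0..15) plus a scale and an offset."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, i.e. 15 steps between the smallest and largest value.
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 when the group is constant
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights, just to exercise the functions.
weights = [0.12, -0.53, 0.98, 0.07, -0.21, 0.44, -0.90, 0.33]
codes, scale, lo = quantize_group(weights)
restored = dequantize_group(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, round(max_error, 4))
```

Each weight ends up at most half a step (scale / 2) away from its original value, which is the price paid for storing 4 bits per weight instead of 16.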

If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored... The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more).
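
To make the base vs. "instruct" distinction concrete, here is a rough sketch of how the two kinds of prompt might be assembled. The "Input:/Output:" layout and the "USER:/ASSISTANT:" template are assumptions for illustration only; the right template depends on the model, so check the model card before relying on either.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Prompt for a base model: a few worked examples, then the open query."""
    blocks = [f"Input: {q}\nOutput: {a}" for q, a in examples]
    # The last block is left unfinished so the model continues the pattern.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def instruct_prompt(instruction: str) -> str:
    """Prompt for an instruct-tuned model (hypothetical chat template)."""
    return f"USER: {instruction}\nASSISTANT:"


examples = [
    ("I loved this movie", "positive"),
    ("The food was cold and bland", "negative"),
]
print(few_shot_prompt(examples, "Great service, will come back"))
print()
print(instruct_prompt("Classify the sentiment of: Great service, will come back"))
```

With a base model the examples do the work of telling the model what you want; with an instruct model you can usually just ask.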

5 comments

Just a reminder that LLaMA is not open: in order to use it legally you have to agree to Meta's terms, which currently means research use only. The versions circulating on torrents are essentially pirated, and while I don't have an ethical problem with that at all, you can't use it safely in a business.

The open replacements for LLaMA have yet to reach 30B, let alone 65B.

If anyone has a copyright claim to an LLM, the creators of the input data have more of a copyright claim than the company that trained it. There's a good chance they are not copyrightable at all. I'd bet there are a lot of people willing to take on that risk.

However, they might still fall under trade secret law.

Why would an LLM be any less copyrightable than any other piece of software?
The "software" part of an LLM is pretty trivial -- the interesting piece is the weights. Since the weights are mechanically generated by a computer, it can be argued that the weights are not copyrightable, just like a photograph taken by a monkey isn't copyrightable.
The software is the matrix multiplication and gradient descent. We are talking about the numbers in the matrices. They are the output of a training algorithm, so we can only talk about the copyright on the training algorithm, and on its input data.
The model weights could be seen as a derived work, for which they didn't get the permission of the original copyright holders. Alternatively, it can be argued that the LLMs are no different than a fanfic writer trying to imitate the style of their favorite author.

It's not obvious which way it will go, but I can see the point of those arguing that LLM data are ill-gotten gains.

For the same reason that phone books cannot have copyright.
People always bring this up like it’s a big deal, but most users aren’t interested in starting a business. We just wanna play with LLMs.

Frankly, I’m glad we don’t have a bunch of llamas in different skins being hawked like the current crop of “AI” startups that are just thin layers over OpenAI’s API.

That hasn’t been true for a while. Falcon 40B seemingly outperforms LLaMA 65B according to the OpenLLM leaderboard.

https://huggingface.co/tiiuae/falcon-40b

Fair enough. I haven't really looked at Falcon as a replacement for LLaMA yet because it isn't supported by llama.cpp, but it looks promising.
Falcon is an open (Apache licensed) replacement for LLaMA, with a 40B version that's competitive with LLaMA 65B on benchmarks.
4-bit quantization removes a lot of the model's sophistication, and 60B parameters is still smaller than what GPT4 is using.
The point is that it's infinitely better in not being there "just to take your jobs and make a few VCs richer". Nobody even claimed it's more performant. It's like the difference between getting nothing but keeping your land, and getting glass pearls but losing your land. You have to completely ignore the meat of the argument to even pretend there is a contest.

And this is without considering what would happen if we stopped feeding hostile actors and supported ourselves, instead of continuing to do the reverse. Not just here and there, but consistently for decades.

That argument seems more political than practical.

If on one hand you have a tool that you can actually use to help with your job, and on the other a tool that sounds like a very advanced chatbot but doesn't actually provide value, well, the second tool being open-source doesn't change that it doesn't provide value.

(Also, assuming that open-source tools aren't going to upend a ton of people's jobs seems really naive. These people aren't going to be any less bitter that their jobs are taken by freelance nerds instead of corporate nerds.)

There is no way I am going to spin up my own worse LLM so a few people will make less money. Even if it was 1-5% better. It's just not worth the time.
It's not "a few people making less money", it's a few gigantic monoliths carving up the future, like blind watch-destroying gods -- or at least wanting to, no matter how nicely they dress it up. And it's not about utility or chance of success for everyone, either, but rather trying to do something in an ethical or more clean way just because that's more fun for them.

But I have to admit to being an idealist, and while I disagree with you because of that, I don't think you should be downvoted for basically just bringing up the majority position. It's easy to complain about people not being starry-eyed idealists who make great personal sacrifices to bring about a utopia for people in 10 generations, or whatever. It's way harder to find and teach the joy of doing something for the sake of doing it, and at the same time come up with medium and long-term ideas that are realistic enough to make working towards them fulfilling, but also genuinely beautiful and true. The whole "rather than teaching to build a ship (we can't even agree on!), teach people how to long for the ocean" thing. It's a really hard problem.

Wow. Beautifully articulated.

What a pleasant reply to read. I don’t have an argument regarding my position other than I agree that what you’re saying is true and that getting people like me to care and make sacrifices not just today but every day in a long term way is what makes hard problems hard.

Thank you :) But I didn't mean to say it's those pesky non-dreamers who make the problem hard. Three idealists will have ten possible utopias that are partly or completely mutually exclusive.

And to be frank, I think a lot of the finger pointing at people who don't care enough about issue X or Y is really because of not having found good ways to work constructively and make progress with however few people who do care. Partly also because people cannot agree (for long, tend to splinter into more pure sub groups and all that).

At any rate, the way can't be "I should feel bad and do better", but rather "I want what they got!". And the burden can't be on the people who aren't yet seeing anything that makes them excited to get excited anyway. And it can't be about being selfless for the benefit of others, or future generations. It has to be its own reward right here and now. It is about and for you just as it is for anyone else, if you know what I mean. Sacrificing others and sacrificing oneself is sacrificing people in both cases. Neither is noble IMO.

I guess the best chance of fighting tech giant strangleholds is still empowering "normal people" to carve out their own little spaces. All people will not finally learn how to make websites if only we crushed Facebook and what have you, but instead if more people had fun making their own little websites, and if we could come up with good ways for them to connect (peer-to-peer on the desktop, right after Linux!), Facebook and others would play nicer. It's not that big companies are a problem, it's the abusive things they do when they're the only game in town.

And likewise, and back on topic: a really good argument would be something I don't have the knowledge for, namely things you can do with an LLM that you can fully control (or at least can wildly poke at and experiment with, or just "download mods for") versus a much more powerful LLM that you don't really control, other than your prompts.

Thanks for reading!

> 60B parameters is still smaller than what GPT4 is using

I mean if the article is right, then it's about 3.3% the size of GPT 4 (although it's a sparse model so not all of it is used on every pass).
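
For reference, the arithmetic behind that percentage can be run backwards. This is only a sketch: the 60B and 3.3% figures are the ones quoted in this thread, and the total it implies is an unconfirmed number attributed to the article, not an official one.

```python
llama_params = 60e9  # the "60B" figure quoted above
ratio = 0.033        # "about 3.3% the size of GPT 4"

# Total parameter count that the stated ratio would imply.
implied_gpt4_params = llama_params / ratio
print(f"Implied total: ~{implied_gpt4_params / 1e12:.2f} trillion parameters")
```

Because the model is described as sparse, only part of that total would be active on any single forward pass.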

Meta also didn't train LLaMAs on nearly as much code it seems, so they're much worse for that in general.

Does it? The GPTQ paper claims that the accuracy loss is small.
Can't lose what it didn't have in the first place.
I remember reading somewhere that GPT4 is not a single model but several models whose parameter counts are reported as a single sum. Perhaps quite doable but at lower speeds?
The article linked here talks about GPT4 being a mixture of experts, which is exactly what you’re describing
How do you correctly prompt it? A lot of people are not familiar with how to do this. I think this would improve how many people are using the non-OpenAI models.
How low can you get the memory and computational power requirements that way?
You can run that model (Wizard-30) on a computer with 64 gigabytes of RAM (or smaller, I don't know how tight you can cut it). You obviously want fast RAM and a good CPU, but you don't need a GPU.
I am running 30b llama models (4 bit quantized using llama.cpp) on 32 gb of ram and no GPU. I get around 2 tokens/second.
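
A back-of-the-envelope check on why this fits: the weights alone take roughly (parameter count × bits per weight / 8) bytes. The sketch below assumes nominal parameter counts and ignores the per-group scales stored with quantized weights, the KV cache, and activations, so real usage will be somewhat higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for size in (30, 65):
    for bits in (16, 4):
        print(f"{size}B at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate a 30B model drops from about 60 GB at 16-bit to about 15 GB at 4-bit, which is why it fits in 32 GB of RAM, while a 65B model at 4-bit (about 32.5 GB) would not.
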
You can also travel on a bike from NY to LA.
>You can also travel on a bike from NY to LA.

You can. In fact, my brother did so a bunch of years ago. He found it to be a wonderful experience that made his life better.

He's also flown on a commercial airplane from NY to LA (as have I, as well as millions of others) and while it got him to Los Angeles, it didn't provide the levels of sensory input, personal interactions and experience that riding his bicycle did.

That's not to say everyone should ride bicycles across the US every time they need/want to make such a trip, but doing so at least once can be a more positive experience than sitting next to some strangers for five hours.

The satisfaction of doing so, or the experiences in interacting with people and the landscape during such a trip aren't quantifiable, but reducing the value of doing so (if I'm missing your point here, my apologies) to the time required to make such a trip is reductive in the extreme IMHO.

Edit: Clarified my prose.

Love this comment so much. Your brother sounds cool :)
Just imagine if Boeing lobbied the government to pass a law banning you biking from NY to LA.
AFAIK you can get away with a swapfile, no need for large amounts of RAM.
Won't that nearly kill your SSD if you do it for extended periods of time?
Most of the RAM is for storing the model. Once it is loaded it is read-only, so it will not harm an SSD.
It only reads from memory, not from swap directly. If it needs to read something from swap, it'll write out something from memory to swap, then read the swap into memory. Reading 1 GB of swap will essentially write 1 GB to the SSD too (rough numbers).

Correct me if I misunderstand swap?
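
One detail that may matter here, offered as a sketch rather than a definitive answer: as far as I know llama.cpp can memory-map the model file read-only instead of copying it into anonymous memory. Pages of a read-only file mapping are never dirty, so under memory pressure the kernel can simply drop them and re-read them from the model file later, with no write to swap. The snippet below uses a small temporary file as a stand-in for a model file.

```python
import mmap
import os
import tempfile

# Stand-in for a model file; a real one would be many gigabytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096 + b"weights")
    path = f.name

with open(path, "rb") as f:
    # ACCESS_READ maps the file read-only: pages are loaded on demand and,
    # being clean, can be dropped under memory pressure and re-read later.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        tail = mapped[4096:4103]
        try:
            mapped[0] = 1  # writes are rejected on a read-only mapping
            writable = True
        except TypeError:
            writable = False

os.remove(path)
print(tail, writable)
```

If the weights are instead loaded into ordinary process memory and pushed out to a swapfile, the writes described above are more plausible, at least the first time each page is evicted.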

> The interesting thing with these Uncensored models is that they don't constantly answer that they cannot help you (which is what ChatGPT and GPT-4 are doing more and more)

That's great to hear. That political correctness in GPT is annoying.