by ojosilva 1202 days ago
Excellent packaging OP! I just wanted to say 2 things relating to LLaMa:

1) 7B is unusable for anything really, in case you are hopeful;

2) 68B otoh is awesome ("at least DaVinci level").

I don't know if this is something FB/Meta planned strategically but this LLaMa-mania (LLaMania?) over the weekend is their November/2022 chatGPT moment. If they (Mark) take it seriously, it could become a strong hand in AI and a hint of how the industry could be shaped in the near future, with cloud models competing with local installs.

Think about it: whoever trains a popular, albeit closed, model can give it whatever bias they wish with nearly no oversight. A dystopian and scary thought.

5 comments

One important point: LLaMA was leaked; it was never directly published by Meta. So it's basically piracy, and even if you got it officially, the license is very restrictive.
I dispute that the model can be copyrightable in the first place.
As long as the courts don't dispute it, then our disputes don't matter.

They'd be no better than some "sovereign citizen" disputing their arrest...

The idea that models can't be copyrighted isn't far-fetched. The basic idea is that models are created by an automated process, not by a person.

The courts have already upheld that AI generated output is not copyrightable for this exact reason.

So if you do not buy that it applies to models, then you would have to explain the difference between the process which outputs bits into a model's layers (aka training) and the process which takes bits into the input layer and then dumps out the subsequent bits of the output layer (inference/generation).

Then explain why that difference matters for the applicability of copyright.

I'm not sure that even the "AI generated output is not copyrightable" stance will be maintained once "AI generated output" becomes big business. In the same way, copyright was invented and then Sonny-Bono-extended to the max once content became big business.

In the model's case, though, it's even easier why it could be copyrightable, as a "baked" model is still created by people fine-tuning it, setting parameters and hardcoded stuff, training it with this or that set and excluding other, and so on.

For example, music composed and rendered as audio by generative algorithms (something which doesn't even need AI, just some rules and stochastic processes) has been created and copyrighted just fine for decades...

All the arguments for why photographs are copyrighted would seem to apply. The photographer isn't painting the image, but his artistic input is still vital to creating the image. Same with training these models: the training is just an algorithm on some data, but choosing the right hyperparameters and training data is an artistic expression of the author, making copyright apply.
If a non-human presses the button on the camera the photograph is not copyrightable even if a human set up the camera intending for the non-human to press it. https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

For the same reasons this monkey photo cannot be copyrighted, it is highly likely that AI generated art is uncopyrightable, and that would also mean that models are uncopyrightable. The fact that humans set up the systems which produce the art/models, with the intention of getting an end result generally like the one they get, is simply not meaningful to the copyright dispute.

You can restrict someone with a license even if you can't copyright the underlying technology.
> who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight.

That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.

> Think about it: who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight. A dystopian and scary thought.

You have perfectly described what OpenAI did. They released a moralizing “biased” model behind a gated API with no oversight. The only dystopia is one in which corporations get to decide what is, or isn’t considered biased.

sorry for the extremely dumb question, but is it possible to run the 68B model on a computer with 8GB of RAM?
In general, assume 2GB per billion parameters. With quantisation you can get this down to <1GB (~500MB for 4 bit?), but even with that you'll only be able to run quantised llama-13B in the best case

Having said that: if you are feeling incredibly patient you can technically run the 68B parameter model by swapping to disk, although it will not be a pleasant experience (think minutes or hours per token instead of tokens per second)

It's also worth noting that pure CPU inference is much slower than GPU/TPU inference, so the output will be much slower than a ChatGPT-like service even if it does fit in your computer's RAM
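
The rule of thumb in this comment can be sketched as a quick calculation. This is only a rough estimate: it assumes memory is dominated by the weights and ignores activations, context cache and runtime overhead, so real usage will be somewhat higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_param


# Compare 16-bit weights (the "2GB per billion" rule) with 4-bit quantisation.
for size in (7, 13, 68):
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, 4)
    print(f"{size}B: ~{fp16:.1f} GB at 16 bit, ~{q4:.1f} GB at 4 bit")
```

Under these assumptions a 13B model at 4 bits needs about 6.5 GB for weights alone, which is why it is roughly the ceiling for an 8GB machine.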

thanks for explaining! How much GPU memory would work well with 68B?
they said 2g per 1 billion....and it's called 68B...I presume that's 68 billion... 68*2...so at least 136g?
68/2, not 68*2
So, if I understand correctly, that's what you need to run the best model?

With GPU:

VRAM + RAM >= 68/2

Without GPU:

RAM >= 68/2
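
The inequality above can be written as a small check. This is a sketch that takes the comment's rule at face value (4 bits, i.e. half a byte, per parameter, with weights split freely between VRAM and RAM) and ignores any overhead; `fits_in_memory` and its parameters are hypothetical names.

```python
def fits_in_memory(params_billion: float, ram_gb: float,
                   vram_gb: float = 0.0, bits_per_param: int = 4) -> bool:
    """Return True if RAM plus VRAM can hold the quantised weights."""
    needed_gb = params_billion * bits_per_param / 8  # 4 bit -> params / 2
    return ram_gb + vram_gb >= needed_gb


print(fits_in_memory(68, ram_gb=8))               # 8GB laptop, no GPU
print(fits_in_memory(68, ram_gb=24, vram_gb=16))  # RAM and VRAM combined
```

Treat a True result as a lower bound only, since the runtime needs extra memory beyond the weights.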

You can't, it needs around 40GB of RAM.

Technically you can by swapping to disk but it would be too slow to be usable.

What you can do however is use the 7B model with 4bit quantization and use it within 8GB RAM.

Is this 68B of RAM?

How do you get access to that on a Macbook?

That’s 68 billion parameters. It probably does not fit in RAM. Though if you encode each parameter using one byte, you would need 68GB of RAM, which you could get on workstations at this point.
It fits: llama.cpp uses 4 bit quantization, and the 13B model takes a little more than 8GB, and around 9GB of RAM while inferencing.
Everyone with “only” 64GB of RAM is pouting today, including me
More like finally "proven right" to have needlessly kept feeding 4/5th of 64GB to Chrome since 2018
You can run llama using 4 bits per parameter, 64 GB of RAM is more than enough
4 bits is ridiculously little. I'm very curious what makes these models so robust to quantization.
Read "The Case for 4 Bit Precision": https://arxiv.org/abs/2212.09720

Spoiler: it's the parameter count. As parameter count goes up, bit depth matters less.

It just so happens that at around 10B+ parameters you can quantize down to 4bit with essentially no downsides. Models are that big now. So there's no need to waste RAM by having unnecessary precision for each parameter.
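
To make "4 bits per parameter" concrete, here is a toy sketch of symmetric round-to-nearest quantisation with a single shared scale per block. This is an illustrative scheme chosen for simplicity, not necessarily what the linked paper or any particular tool does; the function names and the random toy weights are made up.

```python
import random


def quantize_4bit(block):
    """Map floats to integer codes in [-7, 7] using one shared scale."""
    # The largest magnitude in the block maps to code 7; fall back to 1.0 for all-zero blocks.
    scale = max(abs(x) for x in block) / 7 or 1.0
    codes = [round(x / scale) for x in block]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(32)]  # one block of toy weights
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.5f}, max abs error={max_err:.5f}")
```

Each code fits in 4 bits, and the rounding error per weight is at most half the scale. A real implementation would also pack two codes per byte and store the scale alongside each block.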

What if you have around 400GB of RAM? Would this be enough?
What I'm referring to requires around 67GB of RAM. With 400GB I would imagine you are in good shape for running most of these GPT-type models.
Seems to use about ~40GB of RAM here...