| HN Mirror


	by CuriouslyC 834 days ago
	They didn't compare against the best models because they were trying to do "in class" comparisons, and the 70B model is in the same class as Sonnet (which they do compare against) and GPT-3.5 (which is much worse than Sonnet). If they're beating Sonnet, that means they're going to be within stabbing distance of Opus and GPT-4 for most tasks, with the only major difference probably arising in extremely difficult reasoning benchmarks. Since Llama is open source, we're going to see fine-tunes and LoRAs though, unlike Opus.

4 comments

blackeyeblitzar 834 days ago

Llama is open weight, not open source. They don’t release all the things you need to reproduce their weights.

mananaysiempre 834 days ago

Not really that either, if we assume that “open weight” means something similar to the standard meaning of “open source”—section 2 of the license discriminates against some users, and the entirety of the AUP against some uses, in contravention of FSD #0 (“The freedom to run the program as you wish, for any purpose”) as well as DFSG #5&6 = OSD #5&6 (“No Discrimination Against Persons or Groups” and “... Fields of Endeavor”, the text under those titles is identical in both cases). Section 7 of the license is a choice of jurisdiction, which (in addition to being void in many places) I believe was considered to be against or at least skirting the DFSG in other licenses. At best it’s weight-available and redistributable.

blackeyeblitzar 833 days ago

Those are all great points, and these companies really need to be called out for open-washing.

amitport 833 days ago

It's a good balance IMHO. I appreciate what they have released.

ikurei 833 days ago

I appreciate it too, and they're of course going to call it "open weights", but I reckon we (the technically informed public) should call it "weights-available".

lumost 833 days ago

Has anyone tested how close you need to be to the weights for copyright purposes?

tdullien 833 days ago

It's not even clear if weights are copyrightable in the first place, so no.

whiplash451 831 days ago

Is it really useful to make an LLM open source when it takes millions of $ to train it?

At that scale, open weights with a permissive license are much more useful than open source.

throwaway4good 833 days ago

Which large model projects are open source in that sense? That is, with the full source code, including training material, published.

soccernee 833 days ago

Olmo from AI2. They released the model weights plus training data and training code.

link: https://allenai.org/olmo

ktzar 833 days ago

even if they released them, wouldn't it be prohibitively expensive to reproduce the weights?

zingelshuher 832 days ago

It's impossible. Meta itself cannot reproduce the model, because training is randomized and that info is lost. First, samples come in random order. Second, there are often dropout layers; they generate random patterns that exist only on the GPU during training, for the duration of a single sample. Nobody saves them; storing them would take much more space than the training data. If someone tries to retrain, the patterns will be different, which results in different weights and divergence from the very beginning. The model will converge to something completely different, though with similar behavior if training was stable. LLMs are stable.

So, no way to reproduce the model. This requirement for 'open source' is absurd. It cannot be done reliably even for small models, due to GPU internal randomness. Only the smallest models, trained on a CPU in a single thread, could be reproduced. Only academia will be interested.
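
The exception noted above (a small model trained on a CPU in a single thread) can be illustrated with a minimal sketch. It assumes that all randomness, here the sample order and the dropout masks, is drawn from one seeded generator; `run`, `dropout_mask`, and the sizes are hypothetical names and values, not anything from Llama's training code, and GPU-level nondeterminism is not modeled.

```python
import random


def dropout_mask(n, p, rng):
    """Return n keep-flags (1 = keep, 0 = drop); each unit is dropped with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(n)]


def run(seed, num_samples=8, width=8, p=0.5):
    """Simulate the two random choices mentioned above for one toy 'training run'."""
    rng = random.Random(seed)      # single seeded source of randomness
    order = list(range(num_samples))
    rng.shuffle(order)             # random sample order
    mask = dropout_mask(width, p, rng)  # random dropout pattern
    return order, mask


same_a = run(seed=0)
same_b = run(seed=0)   # same seed: identical order and mask
other = run(seed=1)    # different seed: almost certainly a different result

print(same_a == same_b)
```

With the same seed, the order and the mask repeat exactly, which is why a seeded single-threaded CPU run can be replayed. Whether that carries over to large multi-GPU runs depends on details this thread does not cover.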

lawlessone 833 days ago

1.3 million GPU hours for the 8B model. It would take you around 150 years to train on a desktop lol.
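
As a rough check of this arithmetic, the sketch below converts GPU-hours into wall-clock years. It assumes a single GPU running nonstop at the same throughput as the training hardware, which a desktop card would likely not match; `gpu_hours_to_years` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def gpu_hours_to_years(gpu_hours, num_gpus=1):
    """Wall-clock years needed to spend `gpu_hours` on `num_gpus` identical GPUs running nonstop."""
    return gpu_hours / num_gpus / HOURS_PER_YEAR


years_single = gpu_hours_to_years(1_300_000)
print(f"{years_single:.0f} years on one GPU")  # about 148 years
```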

iamlearningai 829 days ago

Interesting. Llama was trained using 16K GPUs, so it would have taken around a quarter for them. An hour of GPU use costs $2-$3, so training a custom solution using Llama should cost at least $15K to $1M. I am trying to get started with this thing. A few guys suggested 2 GPUs were a good start, but I think that would only be good for 10K training samples.
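
For scale, here is a hedged back-of-the-envelope sketch combining the figures quoted in this thread: 1.3 million GPU-hours for the 8B model, $2 to $3 per GPU-hour, and 16K GPUs. It assumes perfect linear scaling across GPUs and covers only the 8B pretraining run, not fine-tuning or the larger models; the function names are hypothetical.

```python
def rental_cost_usd(gpu_hours, low_rate=2.0, high_rate=3.0):
    """Return the (low, high) rental cost in USD for a given number of GPU-hours."""
    return gpu_hours * low_rate, gpu_hours * high_rate


def wall_clock_days(gpu_hours, num_gpus):
    """Days of wall-clock time if the work splits perfectly across num_gpus."""
    return gpu_hours / num_gpus / 24


low, high = rental_cost_usd(1_300_000)
days = wall_clock_days(1_300_000, 16_000)

print(f"${low:,.0f} to ${high:,.0f}")
print(f"{days:.1f} days on 16,000 GPUs")
```

Under these assumptions the 8B run alone lands in the low millions of dollars and a few days of cluster time, so the quarter-long estimate above would have to include the larger models and other overhead.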

danielhanchen 833 days ago

On the topic of LoRAs and finetuning, I have a Colab for LoRA finetuning Llama-3 8B :) https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe...

wiz21c 833 days ago

"within stabbing distance"

Dunno if English is your mother tongue, but this sounds really good (although a tad aggressive :-) )!

waffletower 832 days ago

As Mike Judge's historical documents show, this enhanced aggression will seem normal in a few years or even months.

htrp 834 days ago

ML Twitter was saying that they're working on a 400B parameter version?

mkl 833 days ago

Meta themselves are saying that: https://ai.meta.com/blog/meta-llama-3/
