by rwl4 1029 days ago

The author of the article appears to have misunderstood one important detail about Code Llama.

They state:

> The Code Llama models were trained on 500B tokens, whereas Llama 2 models were trained on 2T tokens. Since the Code Llama model was trained on 4x fewer tokens, maybe a CodeLlama 70B version did not perform well enough due to LLM scaling laws—there was not enough training data.

But if you read the paper, on page 1, it says:

> Our approach is based on gradually specializing and increasing the capabilities of Llama 2 models by applying a cascade of training and fine-tuning steps [...]

In fact, they show a diagram at the top of page 3 that details the process, starting with Llama 2 foundation models.

Llama 2 Foundation models (7B, 13B, 34B) -> Code training 500B -> Python / Long Context.

See the paper here: https://arxiv.org/abs/2308.12950
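
To make the difference concrete, here is a back-of-the-envelope tally. It is a minimal sketch that assumes the round figures quoted above (2T tokens for Llama 2 pretraining, 500B for code training) and ignores the later Python / long-context stages. The constant and function names are made up for illustration.

```python
# Round figures as quoted in this thread; treat them as assumptions.
LLAMA2_PRETRAIN_TOKENS = 2_000_000_000_000  # 2T tokens (Llama 2 foundation model)
CODE_TRAIN_TOKENS = 500_000_000_000         # 500B tokens (code training stage)


def tokens_seen(initialized_from_llama2: bool) -> int:
    """Total training tokens behind a Code Llama model under each reading."""
    base = LLAMA2_PRETRAIN_TOKENS if initialized_from_llama2 else 0
    return base + CODE_TRAIN_TOKENS


article_reading = tokens_seen(initialized_from_llama2=False)  # code tokens only
paper_reading = tokens_seen(initialized_from_llama2=True)     # Llama 2 weights + code

print(f"article's reading: {article_reading / 1e12:.1f}T tokens")
print(f"paper's reading:   {paper_reading / 1e12:.1f}T tokens")
```

Under the paper's reading, a Code Llama model has seen more tokens than its Llama 2 base, not 4x fewer.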

4 comments

rasbt 1029 days ago

Good catch. Above that paragraph, I wrote that the Code Llama models were initialized with the Llama 2 weights, which makes this contradictory, indeed.

What I meant to say here was 500B domain-specific tokens. Maybe domain-specific is not the right word here, but tokens related to the problems that the LLM aims to solve.

EDIT: Updated the text to be more clear.

sp332 1029 days ago

It does say this: Note that all Code Llama models were initialized with Llama 2 weights before they were further trained on code.

behnamoh 1028 days ago

They also moved part of the article to another post and made it paywalled. Is that really necessary for someone who's already been a professor, has a famous book, and works at a (supposedly highly invested) AI company?

jxy 1029 days ago

Right.

### off topic rants below

Somehow there are so many blog posts about these things, all asking for your email. Is it becoming easier to put more words together nowadays? I guess so.

I really wish there were a way to fact-check it all, instead of depending on good Samaritans in HN comments to point out these obvious misconceptions.

cosmojg 1029 days ago

> I really wish there were a way to fact-check it all, instead of depending on good Samaritans in HN comments to point out these obvious misconceptions.

You mean like reading original sources? Frequently, big research projects like this come with an official paper[1] and/or blog post[2] explaining what they did.

[1] https://ai.meta.com/research/publications/code-llama-open-fo...

[2] https://ai.meta.com/blog/code-llama-large-language-model-cod...

semi 1028 days ago

I wonder how long until we can just use LLMs to do that for us: first summarizing a blog post (already something we've seen many examples of LLMs doing) but focusing on extracting factual claims, then using those claims as context when ingesting the linked sources, to find what in the sources actually backs up the claims and whether anything in them contradicts the claims.
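
A rough sketch of that loop, to show its shape only. Everything here is hypothetical: `extract_claims` and `judge` stand in for LLM calls that are not specified anywhere in this thread, and the toy versions below use plain string matching just so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    claim: str
    label: str     # "supports", "contradicts", or "unverified"
    evidence: str  # the source passage that decided it, "" if none


def fact_check(
    post: str,
    sources: List[str],
    extract_claims: Callable[[str], List[str]],
    judge: Callable[[str, str], str],
) -> List[Verdict]:
    """Extract claims from a post, then look for backing or conflict in sources."""
    verdicts = []
    for claim in extract_claims(post):
        verdict = Verdict(claim, "unverified", "")
        for passage in sources:
            label = judge(claim, passage)  # hypothetical LLM call in practice
            if label in ("supports", "contradicts"):
                verdict = Verdict(claim, label, passage)
                break
        verdicts.append(verdict)
    return verdicts


# Toy stand-ins so the sketch runs without any model; NOT real claim extraction.
def naive_extract(post: str) -> List[str]:
    return [s.strip() for s in post.split(".") if s.strip()]


def naive_judge(claim: str, passage: str) -> str:
    return "supports" if claim.lower() in passage.lower() else "unrelated"


post = "Code Llama was initialized with Llama 2 weights. It was trained from scratch"
sources = ["Note that Code Llama was initialized with Llama 2 weights before code training."]
for v in fact_check(post, sources, naive_extract, naive_judge):
    print(v.label, "|", v.claim)
```

The two callables are passed in so the string-matching stand-ins could be swapped for model calls without touching the loop. Anything the sources neither support nor contradict stays "unverified" rather than being counted as false.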

simonw 1029 days ago

> Somehow there are so many blog posts about these things, all asking for your email.

That's because Substack defaults to bothering people for their email, and lots of people are using Substack as their blogging platform these days.

behnamoh 1028 days ago

> and lots of people are using Substack as their blogging platform these days.

They shouldn't. It's Medium all over again...