by rwl4 1029 days ago
The author of the article appears to have misunderstood one important detail about Code Llama.

They state:

> The Code Llama models were trained on 500B tokens, whereas Llama 2 models were trained on 2T tokens. Since the Code Llama model was trained on 4x fewer tokens, maybe a CodeLlama 70B version did not perform well enough due to LLM scaling laws—there was not enough training data.

But if you read the paper, on page 1, it says:

> Our approach is based on gradually specializing and increasing the capabilities of Llama 2 models by applying a cascade of training and fine-tuning steps [...]

In fact, they show a diagram at the top of page 3 that details the process, starting with Llama 2 foundation models.

Llama 2 Foundation models (7B, 13B, 34B) -> Code training 500B -> Python / Long Context.

See the paper here: https://arxiv.org/abs/2308.12950
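
To put rough numbers on that, here is a back-of-the-envelope sketch using only the two figures quoted above (2T tokens for Llama 2, 500B for the code-training stage). The variable and function names are mine, and the later Python / long-context stages are left out because this thread gives no token counts for them.

```python
# Approximate token counts as quoted in this thread (rounded figures).
LLAMA2_PRETRAIN_TOKENS = 2_000_000_000_000  # 2T: Llama 2 foundation training
CODE_TRAINING_TOKENS = 500_000_000_000      # 500B: Code Llama code-training stage


def cumulative_tokens(stages):
    """Sum the tokens seen across a sequence of training stages."""
    return sum(stages)


# Code Llama starts from Llama 2 weights, so both stages count.
total = cumulative_tokens([LLAMA2_PRETRAIN_TOKENS, CODE_TRAINING_TOKENS])
code_share = CODE_TRAINING_TOKENS / total

print(f"total tokens seen: {total / 1e12:.1f}T")    # total tokens seen: 2.5T
print(f"share from code stage: {code_share:.0%}")   # share from code stage: 20%
```

So the model has seen more tokens than Llama 2, not 4x fewer; the 500B is on top of the 2T.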

4 comments

Good catch. Above that paragraph, I wrote that the Code Llama models were initialized with the Llama 2 weights, which makes this contradictory, indeed.

What I meant to say here was 500B domain-specific tokens. Maybe domain-specific is not the right word here, but tokens related to the problems that the LLM aims to solve.

EDIT: Updated the text to be more clear.

It does say this: "Note that all Code Llama models were initialized with Llama 2 weights before they were further trained on code."
They also moved part of the article to another post and made it paywalled. Is that really necessary for someone who's already been a professor, has a famous book, and works at a (supposedly highly invested) AI company?
Right.

### off topic rants below

Somehow there are so many blog posts about these things, all trying to ask for your email. Is it becoming easier to put more words together nowadays? I guess so.

I really wish there were a way to fact-check it all, instead of depending on good samaritans in a comment on HN to point these obvious misconceptions out.

> I really wish there were a way to fact-check it all, instead of depending on good samaritans in a comment on HN to point these obvious misconceptions out.

You mean like reading original sources? Frequently, big research projects like this come with an official paper[1] and/or blog post[2] explaining what they did.

[1] https://ai.meta.com/research/publications/code-llama-open-fo...

[2] https://ai.meta.com/blog/code-llama-large-language-model-cod...

I wonder how long until we can just use LLMs to do that for us: first summarizing a blog post (something we've already seen many examples of LLMs doing) but focusing on extracting factual claims, then using those claims as context when ingesting the linked sources, to find what in the sources actually backs up the claims and whether anything in them contradicts the claims.
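
A minimal sketch of that two-step idea, just to make it concrete. Everything here is hypothetical: `LLM` stands for any function you supply that takes a prompt and returns text, and the prompts, verdict labels, and function names are my own assumptions, not any real library's API.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: prompt in, completion out. Plug in any model client.
LLM = Callable[[str], str]

VERDICTS = {"SUPPORTED", "CONTRADICTED", "NOT FOUND"}


def extract_claims(post: str, llm: LLM) -> List[str]:
    """Step 1: ask the model for the post's factual claims, one per line."""
    prompt = "List the factual claims in the text below, one per line.\n\n" + post
    lines = llm(prompt).splitlines()
    return [ln.strip("- ").strip() for ln in lines if ln.strip()]


def check_claim(claim: str, source: str, llm: LLM) -> str:
    """Step 2: ask whether one source backs up or contradicts one claim."""
    prompt = (
        "Answer with exactly one of SUPPORTED, CONTRADICTED, NOT FOUND.\n"
        f"Claim: {claim}\n\nSource:\n{source}"
    )
    verdict = llm(prompt).strip().upper()
    # Treat any unexpected reply as "NOT FOUND" rather than trusting it.
    return verdict if verdict in VERDICTS else "NOT FOUND"


def fact_check(post: str, sources: List[str], llm: LLM) -> List[Tuple[str, str]]:
    """Run both steps; a contradiction in any source outranks support."""
    results = []
    for claim in extract_claims(post, llm):
        verdicts = [check_claim(claim, src, llm) for src in sources]
        if "CONTRADICTED" in verdicts:
            results.append((claim, "CONTRADICTED"))
        elif "SUPPORTED" in verdicts:
            results.append((claim, "SUPPORTED"))
        else:
            results.append((claim, "NOT FOUND"))
    return results
```

Whether the verdicts are trustworthy is a separate question; this only shows the plumbing, and you'd still want to read the flagged passages yourself.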

> Somehow there are so many blog posts about these things, all trying to ask for your email.

That's because Substack defaults to bothering people for their email, and lots of people are using Substack as their blogging platform these days.

> and lots of people are using Substack as their blogging platform these days.

They shouldn't. It's Medium all over again...