Hacker News
by rasbt 1029 days ago
Good catch. Above that paragraph, I wrote that the Code Llama models were initialized with the Llama 2 weights, which indeed makes this contradictory.

What I meant to say here was 500B domain-specific tokens. Maybe "domain-specific" is not the right word, but I mean tokens related to the problems that the LLM aims to solve.

EDIT: Updated the text to be more clear.