by jacobn 761 days ago
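Forgot one: the positional encoding also changed. llama3 uses RoPE (rotary position embeddings, applied to the queries and keys inside attention), while gpt2 uses a learned absolute position embedding that gets added to the token embeddings.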
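A rough sketch of the difference, in plain Python with no framework. The function names, the toy vectors, the `base=10000.0` default, and the interleaved (even, odd) pairing are my assumptions for illustration, not either model's actual code. Real implementations differ in how they pair dimensions and which base they use.

```python
import math

def learned_positions(x, pos_table):
    """GPT-2 style: add a trainable per-position vector to each token embedding.
    pos_table is a hypothetical stand-in for the learned embedding matrix."""
    return [[xi + pi for xi, pi in zip(tok, pos_table[t])]
            for t, tok in enumerate(x)]

def rope(x, base=10000.0):
    """RoPE style: rotate each (even, odd) pair of dims by an angle
    proportional to the token's position. Nothing here is learned."""
    dim = len(x[0])
    assert dim % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for pos, vec in enumerate(x):
        rotated = []
        for i in range(0, dim, 2):
            theta = pos * base ** (-i / dim)  # lower frequency for later pairs
            c, s = math.cos(theta), math.sin(theta)
            a, b = vec[i], vec[i + 1]
            rotated += [a * c - b * s, a * s + b * c]
        out.append(rotated)
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Same query/key content at every position, so only the position differs.
q = rope([[1.0, 2.0, 3.0, 4.0]] * 4)
k = rope([[0.5, -1.0, 2.0, 0.0]] * 4)

# Attention scores at the same offset (1 here) match regardless of absolute position.
print(abs(dot(q[1], k[0]) - dot(q[3], k[2])) < 1e-9)
```

The practical upshot: with the learned table, position info is an absolute lookup capped at the table length, whereas with RoPE the query-key dot product ends up depending only on the relative offset between the two tokens.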