by om8 361 days ago
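> 50 tokens is not really very much

Yes! Also, Llama 3.1's tokens are different from Qwen's and Llama 1's, so the same 50-token budget covers a different amount of text depending on the model. The Llama 3 family is where Meta started using a very large vocab_size (about 128K entries, up from 32K in Llama 1 and 2).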
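To make the vocab-size point concrete, here is a rough toy sketch. Both vocabularies are made up, and the greedy longest-match splitter is a simplification I'm assuming for illustration; it is not how Llama's or Qwen's actual tokenizers work. It only shows why a larger vocabulary tends to spend fewer tokens on the same text.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocab entry that matches."""
    max_len = max(len(piece) for piece in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No vocab entry matched: emit one character so the loop advances.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabularies, not taken from any real model.
small_vocab = {"to", "ken", "iz", "er", " "}
large_vocab = small_vocab | {"token", "tokenizer", " tokenizer"}

text = "tokenizer tokenizer"
small_tokens = greedy_tokenize(text, small_vocab)
large_tokens = greedy_tokenize(text, large_vocab)

print(len(small_tokens), small_tokens)
print(len(large_tokens), large_tokens)
```

The small vocabulary needs several pieces per word while the large one covers each word in a single token, so a fixed 50-token budget stretches over more text with the larger vocabulary.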