| I can correct mistakes. > it somehow merged Llama 4 Maverick's custom Arena chatbot version with Behemoth. I can clarify this part. I wrote 'There was a scandal as facebook decided to mislead people by gaming the lmarena benchmark site - they served one version of llama-4 there and released a different model', which is true. But it sits inside the section about the Llama 4 Behemoth model, so I see how that could be confusing or misleading. I could restructure that section a little to improve it. > Llama 405B was also trained on more than 15 trillion tokens[1], You're talking about Llama 405B instruct; I'm talking about Llama 405B base. Of course the instruct model has been trained on more tokens. > why is there such a focus on token training count? I tried to include the rough training token count for each model I wrote about, plus additional details about the training data mixture if available. Training data is an important part of an LLM. |