by imachine1980_ 780 days ago
it performs worse than 8b llama 3 so you probably don't need that much.

Where do you see that? This comparison[0] shows it outperforming Llama-3-8B on 5 out of 6 benchmarks. I'm not going to claim that this model looks incredible, but it's not that easily dismissed for a model that has the compute complexity of a 17B model.

[0]: https://www.snowflake.com/wp-content/uploads/2024/04/table-3...