by NitpickLawyer
64 days ago
They really weren't horrible. They were roughly GPT-4o level, with the added benefit that you could run them on-premises. Just "regular" models, not "thinking" ones. The architecture was inefficient (a low ratio of active to total parameters), but otherwise they were "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened, and it was something to behold). Being non-thinking models when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
No, they are bad models. They were benchmaxxed on LMArena and a few other benchmarks, but as soon as you try them yourself they fall to pieces.
I have my own agentic benchmark[1] I use to compare models.
Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.
By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).
Llama 4 was a bad model, unfortunately.
[1] https://sql-benchmark.nicklothian.com/#all-data