| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by NitpickLawyer 64 days ago
	They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" models, non "thinking". Inefficient architecture (number of active out of total) but otherwise "decent" models. They got trashed online by bots and chinese shills (I was online that weekend when it happened, it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.

4 comments

nl 64 days ago

> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMAreana and a few other benchmarks but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data

link

ac29 63 days ago

> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named, its a 8B param model

link

nl 63 days ago

You are completely right on both counts.

It is a 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694

link

refulgentis 64 days ago

Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.

Got shitcanned due to bad PR & Zuck God-King terraforming the org, so there'd be a year delay to next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

link

prodigycorp 64 days ago

Nah I remember how disgusted I felt trying llama 4 maverick and scout. They were both DOA.. couldn't even beat much smaller local models.

link

refulgentis 64 days ago

I'll cosign what you said, simultaneously, yr interlocutor's point is also well-founded and it depresses me it's not better known and sounds so...off...due to conventional wisdom x God King Zuck's misunderstanding his own company and resulting overreaction.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] my understanding is longest heads up was single-digit days, if any. Most modellers have arrived at 2+ weeks now, there's a lot between spitting out logits and parsing and delivering a response.

link

alex1138 64 days ago

Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit

link

refulgentis 64 days ago

I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.

link

pixel_popping 64 days ago

failing non-stop at tool calls on top of that.

link

owebmaster 63 days ago

Thanks for calling me a bot. Llama4 and meta ai sucks

link