| HN Mirror



	by fwipsy 2 hours ago
	First impression: Third-party benchmarks or gtfo. Personally, I've never heard of either of these companies before. We're just supposed to take their word that they've matched the best models on the market? Sakana describes their model as an "Orchestration Model." Does that mean that it's actually a bunch of different models glued together?

3 comments

lifeformed 1 hour ago

Is it actually that hard to make good models, or is it just about the amount of resources you have to do training? (This is an actual question, I really don't know.) I'm sure it's not trivial, but does it really take world-class secret knowledge to build off of the known existing techniques? I feel like there's tons of low-hanging fruit still to explore, and time and resources are the limiting factor.


MostlyStable 1 hour ago

The gap from Grok and Gemini to Claude and ChatGPT suggests that, yes, it is that hard.


fwipsy 1 hour ago

Not hard to be a fast follower. Lots of companies are ~6-9 months behind. Reaching the actual bleeding edge is much harder.


Ifkaluva 1 hour ago

Their release post was on HN recently. The comments seemed to think that it was similar to OpenRouter, not an actual model.


OutOfHere 1 hour ago

Did Anthropic give you third-party benchmarks? Is that what you said to them? Yes, they're important, but the attitude is wrong.


bloppe 1 hour ago

Anthropic publishes 3p benchmarks every time they announce a new model.


MostlyStable 1 hour ago

And even if they didn't, they have a track record. Even if we did have benchmarks in this case, I would still wait until people got their hands on it and formed a more holistic opinion.
