Hacker News
by fwipsy 2 hours ago
First impression: Third-party benchmarks or gtfo. Personally, I've never heard of either of these companies before. We're just supposed to take their word that they've matched the best models on the market?

Sakana describes their model as an "Orchestration Model." Does that mean that it's actually a bunch of different models glued together?

3 comments

Is it actually that hard to make good models or is it just about the amount of resources you have to do training? (This is an actual question, I really don't know.) I'm sure it's not trivial but does it really take world class secret knowledge to build off of the known existing techniques? I feel like there's tons of low hanging fruit still to explore, and time and resources are the limiting factor.
The gap between Grok and Gemini on one side and Claude and ChatGPT on the other suggests that yes, it is that hard.
Not hard to be a fast follower. Lots of companies are ~6-9 months behind. Reaching the actual bleeding edge is much harder.
Their release post was on HN recently. The comments seemed to think that it was similar to OpenRouter, not an actual model.
Did Anthropic give you third-party benchmarks? Is that what you said to them? Yes, they're important, but the attitude is wrong.
Anthropic publishes 3p benchmarks every time they announce a new model.
And even if they didn't, they have a track record. Even if we did have benchmarks in this case, I would still wait until people got their hands on it and formed a more holistic opinion.