by WhitneyLand 50 days ago

The frontier labs commonly trade spots at the top of the benchmarks with each new model release.

The timing of these price cut discussions says to me OpenAI has no imminent release that will be edging out Mythos/Fable.

If so the question becomes when can they do so, or is this possibly a turning point where Anthropic keeps the crown to themselves for the foreseeable future.

6 comments

mattjoyce 50 days ago

At the right price, these models don't need to be the best; good enough will do. I think we're fast approaching good enough for most users.

kouteiheika 50 days ago

This. Here's a quick experiment I did yesterday.

I got a new $20 Claude subscription to try the new Fable model. I gave it a single prompt, and it barely finished, using up my whole session quota (it was at ~95% when it finished) and 10% of my weekly quota.

For comparison, with the Kimi Code $40 subscription I can pretty much constantly run two/three agents in parallel for the whole week, and I never run out of quota. I can blindly throw it at anything and everything without worrying about hitting the limits. (And it's not exactly a cheap model to run -- it has 1 trillion parameters!)

Is Kimi as good as Claude? Of course not. But you don't need the absolute state of the art for most things. If I don't have exceptionally difficult tasks it makes no sense to use it. Just throw Kimi at it, and even if it needs to run 2 or 3 times longer in the background I don't care, because I'm not running out of tokens there.

nl 50 days ago

A word of caution on this.

I've tried this too, and was disappointed.

Kimi generally benchmarks at "a bit more intelligent than Sonnet Medium" levels[1] and I'd agree broadly with this assessment.

If you have adapted your coding to rely on the agentic style that is doable in Opus 4.7+ then you will find Kimi disappointing.

If you are using it in a more targeted way then it can work well.

[1] https://artificialanalysis.ai/agents/coding-agents?agents=cl...

kouteiheika 50 days ago

Yes, I would agree with this.

I think it works best when you're using the agent in a more hands-on way with a targeted prompt. If you're obsessive about code quality like I am (so you thoroughly review and, when needed, reprompt or even rewrite what the agent does) then you'll be fine, but if you like to just throw a prompt at the wall and expect it to plan and execute the whole thing perfectly then you'll be disappointed.

A middle-ground trick one can use is to have Opus (or Fable now) plan the whole thing and get something cheaper like Kimi execute on it.
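
A minimal sketch of that plan-then-execute split, assuming each model is exposed as a plain function from prompt text to response text. The `Model` type, `plan_then_execute`, and the two stub models are hypothetical names made up for illustration, not any vendor's API.

```python
from typing import Callable, List

# Hypothetical interface: a model is any function from prompt text to response text.
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the expensive model for a step list, then run each step on the cheap model."""
    plan_text = planner(f"Break this task into short numbered steps:\n{task}")
    # Treat every non-empty line of the plan as one step.
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    results = []
    for step in steps:
        results.append(executor(f"Task: {task}\nDo only this step: {step}"))
    return results


# Stub models so the sketch runs without network access.
def fake_planner(prompt: str) -> str:
    return "1. Write a failing test\n2. Fix the bug\n"


def fake_executor(prompt: str) -> str:
    # Echo the last line of the prompt, which is the step instruction.
    return "done: " + prompt.splitlines()[-1]


results = plan_then_execute("Fix off-by-one in counter", fake_planner, fake_executor)
```

In practice the planner call happens once per task, so the expensive model's token use stays small while the cheaper model absorbs the bulk of the work.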

rented_mule 50 days ago

CodeWhale (formerly deepseek-tui) automates this over DeepSeek V4 Flash and Pro. My shallow understanding is that it prompts the model to evaluate the complexity of a given task, then decides on Flash vs. Pro at various reasoning levels for that task. This can help with both cost and speed. If other agent platforms don't already do this, I have to imagine they will at some point.
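
A rough sketch of what complexity-based routing could look like. This is not CodeWhale's actual logic: the keyword heuristic stands in for the model's own complexity estimate, and the tier labels, thresholds, and function names are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str      # hypothetical tier label, not a real API model id
    reasoning: str  # hypothetical reasoning-effort level


def estimate_complexity(task: str) -> int:
    """Stand-in for the model's self-assessment: a crude 0-10 keyword score."""
    hard_words = ("race condition", "refactor", "concurrency", "architecture", "migrate")
    score = 2 + 3 * sum(word in task.lower() for word in hard_words)
    if len(task) > 400:
        score += 2  # long task descriptions tend to be harder
    return min(score, 10)


def choose_route(task: str) -> Route:
    """Map the complexity score to a model tier and reasoning level."""
    score = estimate_complexity(task)
    if score <= 3:
        return Route("flash", "low")
    if score <= 6:
        return Route("flash", "high")
    return Route("pro", "high")
```

For example, `choose_route("rename a variable")` lands on the cheap tier, while a task mentioning both a race condition and concurrency is sent to the larger one.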

I'm retired and can't justify spending too much on these things. CodeWhale over DeepSeek is helping me understand this space much better (and have some fun!), and it's quite affordable. I've spent ~30 hours using it over the last couple of weeks, and I've spent $3.89 on DeepSeek in that time. If I don't feel like writing any code for a few weeks, I pay nothing. Looking at DeepSeek's dashboard, about 60% of my requests have gone to Pro and 40% to Flash. I've used 97M Pro tokens and 19M Flash tokens (well over 90% of each have been cache hits, so the price is much lower than it would otherwise be).
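
For scale, the figures above work out as follows. This is only the arithmetic on the numbers quoted in this comment; the result is a blended rate across both models and is dominated by cache hits, so it is not a list price.

```python
spend_usd = 3.89     # total DeepSeek spend quoted above
pro_tokens_m = 97    # Pro tokens, in millions
flash_tokens_m = 19  # Flash tokens, in millions
hours = 30           # approximate hours of use

total_m = pro_tokens_m + flash_tokens_m
usd_per_million = spend_usd / total_m
usd_per_hour = spend_usd / hours

print(f"{total_m}M tokens, ${usd_per_million:.4f} per 1M tokens, ${usd_per_hour:.2f} per hour")
# prints: 116M tokens, $0.0335 per 1M tokens, $0.13 per hour
```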

emodendroket 49 days ago

Cursor's Auto mode is built on this premise, though with my limited experience I can't say how effectively it categorizes tasks.

selicos 50 days ago

This is in the direction of Mixture of Experts (MOE) setups. A trained 'router' sits on top of different expert models and routes work to the best/most efficient model for that task, and integrates the work into a whole to provide to the user.

At least, that is what I get from the MOE style. Small and fast experts with a router LLM on top to best use them, then the harness to keep it all together.
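
For reference, a toy sketch of top-k gating, the routing step the term usually refers to. In the common usage, the gate and the experts are sub-networks inside a single model and routing happens per token, rather than between separate LLMs. Everything here is simplified for illustration: the experts are plain functions, and the gate scores are passed in as fixed numbers instead of being computed from the input by a learned layer.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Send x to the k highest-scoring experts and mix outputs by renormalized gate weight."""
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)


# Three toy experts; the gate strongly disfavors the third one.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x]
y = moe_forward(3.0, gate_logits=[0.0, 0.0, -5.0], experts=experts, k=2)
# The first two experts get weight 0.5 each: 0.5 * 4.0 + 0.5 * 6.0 = 5.0
```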

poly2it 50 days ago

Is there any open model that can emulate the agentic experience you get with Opus 4.7?

rstuart4133 49 days ago

GLM 5.1 gets close to Opus 4.6. It can happily run for hours and achieve a result. I've given it bugs like a race condition that led to a count being out by 1 after millions of operations, somewhere in a hundred thousand lines of C code littered with locks and atomic swaps, and it found them (as did Opus). Most other models can't.

I'm using Fable now and GLM 5.1 doesn't really compare. But it's literally 1/20 the price. I can't use Fable for coding - it's too expensive. So now we have three levels of models: lightweight ones you dispatch en masse to find things; ones capable of agentic coding tasks that can run for hours, like Opus and GLM (and possibly other open source ones - I've only tried a few); and now Fable, which is a truly helpful "architecture buddy". Fable still makes many, many mistakes, so you have to review every word it writes.

nl 49 days ago

Not yet that I've tried, and I'm pretty systematic about test driving them.

I keep https://sql-benchmark.nicklothian.com/#all-data up-to-date with latest releases and try out most that score 24+.

GPT 5.5+ or Opus 4.6+ are the only things I find useful like this. Notably Gemini isn't useful in this way.

EagnaIonat 49 days ago

> This. Here's a quick experiment I did yesterday.

It's like running a sports car and then complaining it burns through petrol too fast.

The truth is that the model, while impressive, is not needed for much of what people do.

Local models can do the work and just offload heavy lifting to the cloud models.

JKCalhoun 50 days ago

Not only that, it's easy to let ethics steer my choice as well. And at this point I suspect OpenAI will never earn my respect.

emodendroket 49 days ago

I find it is a quite reliable workflow to ask a strong model to design a plan and then point a weaker one at executing. The agent harnesses themselves are baking in similar concepts though.

panos_news 50 days ago

Yeah, that's how I feel too. I am totally fine with xHigh GPT 5.5 when it comes to coding.

boc 50 days ago

OTOH, using the best is a competitive advantage when time = money. It's like giving your engineers a slow laptop because it's cheaper. It may be cheaper but not worth the cost.

atraac 50 days ago

Unless your job is purely producing code pointlessly, this is not a really good comparison. Most of the time really is spent on understanding the problem and figuring out solutions, not waiting on CPU.

lelanthran 50 days ago

> OTOH, using the best is a competitive advantage when time = money. It's like giving your engineers a slow laptop because it's cheaper. It may be cheaper but not worth the cost.

That doesn't imply giving your devs the best laptop makes any difference.

How much more productive will your devs be if you upgrade them from a 32GB RAM, 8-core laptop to a 768GB RAM 96-core threadripper?

In your analogy, Kimi may not be the 4-core celeron with 4GB of RAM, it's more like the 8-core AMD with 32GB of RAM.

knollimar 50 days ago

768GB seems oddly specific for Kimi

bushbaba 50 days ago

Not necessarily, inference speed also has a huge time aspect. For example, Anthropic models take nearly twice as long as OpenAI models for my tasks, with both having similar success rates.

opennash 50 days ago

Agreed, unlimited GPT 5.5 fast is sufficient for 90% of my use cases. Tried Fable; nice to have, but we don't really need it.

stingraycharles 50 days ago

It seems that OpenAI lacks a clear target audience, they try to be everything for everyone. Anthropic is targeting professionals / enterprise users.

I don’t fully understand why OpenAI lacks this focus, as clearly identifying a target market is one of the first things you do with a business strategy. But instead they just seem to throw stuff against the wall and see what sticks.

jillesvangurp 50 days ago

I think this is too simplistic. Codex is increasingly useful for business usage. I use it for both technical stuff and doing non technical things with my inbox, google drive, etc. It's pretty good for that. And it's pretty clear that business users are very much untapped potential at this point. They need proper agents with tunable guard rails and all the rest.

It seems very competent at coding tasks as well. I don't think Anthropic has a huge edge on that front. It's more of a neck and neck race with proponents in both camps. I ignore most benchmarks at this point; I don't think they have much relevance for normal users.

I think it's actually necessary for both to try out different approaches. Nothing is set in stone yet when it comes to the UX of these things.

ethbr1 50 days ago

> I don’t fully understand why OpenAI lacks this focus, as clearly identifying a target market is one of the first things you do with a business strategy

Resource curse: https://en.wikipedia.org/wiki/Resource_curse

I've been inside companies that have struggled with this, and the real internal story goes like this:

   1. Surprise product growth
   2. Revenue go brr, org expands
   3. Everyone gets promoted as org expands
   4. Because the product sold itself, there was little selection pressure on the sales / customer success orgs to evaluate their effectiveness
   5. Leadership gets saturated with people who just aren't very good at their job
   6. None of those people get fired/demoted, because the company never had to develop "What to do with a bad leader?" muscles
   7. This eventually manifests as an increasing (customer) <-> (engineering) disconnect (as sales/cs aren't doing their job)
   8. People begin to ask why the company isn't doing (insert obvious thing)
   9. It's because VP-of-whatever is chasing fantasies instead of reporting customer needs to engineering

Tl;dr - Don't trust promotions made during the good times. Continuously reevaluate leaders.

broodbucket 50 days ago

They have the consumer market but want the enterprise market, because it's a lot more lucrative, so they're probably going to just keep chasing that even though there are no signs they'll stop losing to Anthropic. They don't need to do that much to keep the consumer market because of momentum.

ralph84 50 days ago

Questionable whether the enterprise market really is the most lucrative. The biggest of big tech all have significant revenue from the consumer market. Compare Apple, Google, Meta, to IBM, Salesforce, ServiceNow.

broodbucket 50 days ago

Enterprise market is paying by token and using a lot of tokens. Consumer market is paying a subscription that they can't raise too high or they'll lose users to competition. Seems to me that the enterprise market scales a lot higher.

stingraycharles 48 days ago

Enterprise / B2B has always been easier and more lucrative. Once a large enterprise integrates with your product, they won’t move away unless there’s an actual issue. So then the “moat” becomes the contract.

Meanwhile, OpenAI is spending ludicrous amounts on things like a Sora-TikTok app in order to create a network effect, and failing at it.

Seems pretty obvious to me what the better strategy is.

skeptic_ai 50 days ago

Have you seen how many corporations are complaining and capping usage at 20-200 USD per developer per month? I doubt that will change much. Many are considering on-premise now.

WarmWash 50 days ago

The consumer market by and large pays $20/mo to get tokens in response to stuff like

"My friend hurt my feelings and I don't know how to approach the problem" routed to whatever the default model is.

byzantinegene 50 days ago

It's really not much compared to the amount they are spending on training. 100 developers at $200 per month is just $20,000.

naveen99 50 days ago

$240,000
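
The two figures are the same quantity at different time scales, assuming the $240,000 is an annual total. A quick check, using only the numbers from the parent comment:

```python
developers = 100
usd_per_dev_per_month = 200

monthly = developers * usd_per_dev_per_month  # spend per month
annual = monthly * 12                         # spend per year

print(f"${monthly:,} per month, ${annual:,} per year")
# prints: $20,000 per month, $240,000 per year
```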

mynegation 50 days ago

Apple, sure. But Google and Meta are really advertising companies, whose income stream comes from enterprises, big and small.

solumunus 50 days ago

Not to mention Google Cloud.

harrouet 50 days ago

OpenAI actually never had a focus. Their VC pitch was: once the AI is good enough, it will find our business model. They've raised money on that.

With that said you are right, it seems OpenAI got numbed by ChatGPT's initial success and tried to be the go-to brand for consumers... which is Google's playground.

Meanwhile, Anthropic led the B2B market with a clever segmented approach, and got well-paying customers.

solumunus 50 days ago

Because they gained a HUGE amount of “normal” users and I think they feel desperate to monetise that. It’s their potential massive edge on competition, they just haven’t found any way to realise it and I suspect they won’t.

kennywinker 50 days ago

They keep asking chatgpt how to monetize and it keeps giving slop answers?

jrsj 50 days ago

> The timing of these price cut discussions says to me OpenAI has no imminent release that will be edging out Mythos/Fable.

Initially I had the same thought but I think this might actually have more to do with Fable being removed from the Claude subscription later this month. At that point it becomes cost prohibitive to use for most tasks anyways & this is the perfect opportunity to compete on price, especially given enterprise customers are already looking to improve spend management

d--b 50 days ago

The benchmark is not everything, the LLMs have their “personality” and GPT is annoying AF.

Also, I don’t know about others, but I personally strongly dislike OpenAI’s leadership’s hypocrisy. I find them losing the race highly satisfying.

lelanthran 50 days ago

> If so the question becomes when can they do so, or is this possibly a turning point where Anthropic keeps the crown to themselves for the foreseeable future.

This specific crown (Best Performing Model) appears to be made out of thorns: pay 100x more for maybe a 10% improvement in capabilities.

Not sure what the goal is, here.

mnicky 50 days ago

It's simple I think - over time the price will go down. According to some analyses the price for equal intelligence declined 10-1000x per year, depending on the domain.

It probably won't be the same again but I still think we can bet on radically cheaper Mythos level intelligence in the future.

SilverElfin 50 days ago

I don’t think Mythos/Fable matter in attracting customers. The typical use is not going to be on the most expensive model, especially with all its frustrating gotchas like refusing harmless prompts and forcing companies to have their data retained.

If OpenAI can offer an alternative to Opus but with better pricing, it will boost their revenue at Anthropic’s cost, in time for the IPO.
