by jph00 409 days ago
The linked page only compares to very old and very small models. But the pricing is higher even than the latest Gemini Flash 2.5 model, which performs far better than anything they compare to.

Their pockets are probably not as deep as Google's in terms of willingness to burn cash for market share.

If speed is your most important metric, I could still see there being a niche for this.

From a pure VC perspective though, I wonder if they'd be better off open-sourcing their model to get faster innovation and centralization, like Llama has done. (Or like Mistral, keeping some models private and some public.)

Use it as marketing, get your name out there, and have people use your API when they realize they don't want to deal with scaling AI compute themselves lol

> The linked page only compares to very old and very small models.

They're comparing against the fastest models. That's why smaller models are shown.

Sort of. The benchmarks showing Flash 2.5 doing really well are benchmarking its thinking mode, which is 4x more expensive than Mercury here.

Is cost really the main differentiator here, tho? "Solving" coding seems like the holy grail atm (and I agree, it can enable a bunch of things once that's done) and "traditional, organic, human fed code" is pretty expensive atm, so does cost really matter now?

Put another way, how much would company X be willing to spend on "here's a repo, here are the tests, here is the speed now, make this faster while still passing all the tests"? If it "solves" something in cuDNN that makes it 10% faster, how much would NVIDIA pay for this? $1M? $10M?
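
To make the "make this faster while still passing all the tests" framing concrete, here is a minimal sketch of what the acceptance check could look like. Everything in it is an assumption for illustration: `benchmark` and `accept_candidate` are hypothetical helpers, not part of any real tool, and "10% faster" is read here as a speedup ratio of at least 1.10 (baseline time divided by candidate time).

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def accept_candidate(
    run_tests: Callable[[], bool],
    baseline_seconds: float,
    candidate_seconds: float,
    min_speedup: float = 1.10,  # assumed reading of "10% faster"
) -> bool:
    """Accept a change only if the tests pass AND it meets the speedup bar."""
    if not run_tests():
        return False  # a faster but broken change is worthless
    if candidate_seconds <= 0:
        return False  # guard against a bogus timing
    return baseline_seconds / candidate_seconds >= min_speedup
```

In use, you would time the current code and the proposed change with `benchmark`, then pass both timings plus a callable that runs the test suite to `accept_candidate`. The tests are checked first on purpose, since a speedup only counts if correctness is preserved.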

Flash 2.5 without thinking mode is also exceptionally good fwiw.