by schappim 409 days ago
It's nice to see a team doing something different.

The cost[1] is US$1.00 per million output tokens and US$0.25 per million input tokens. By comparison, Gemini 2.5 Flash Preview charges US$0.15 per million input tokens (text) and US$0.60 per million output tokens (non-thinking)[2].
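
To put rough numbers on that comparison, here is a minimal sketch that applies the per-million-token prices quoted above to a workload. The prices come from [1] and [2]; the dictionary keys, the `cost_usd` helper and the 10M-input / 2M-output workload are made up for illustration.

```python
# Prices in US$ per million tokens, as quoted in the comment above.
# The keys are illustrative labels, not official model identifiers.
PRICES = {
    "quoted in [1]": {"input": 0.25, "output": 1.00},
    "Gemini 2.5 Flash Preview, non-thinking [2]": {"input": 0.15, "output": 0.60},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the bill in US$ for one workload at the quoted list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for model in PRICES:
        print(f"{model}: US${cost_usd(model, 10_000_000, 2_000_000):.2f}")
```

Since both the input and the output price are about 1.67x the Gemini figure (0.25/0.15 and 1.00/0.60), the gap stays at roughly that ratio whatever the input/output mix.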

Hmmm... at those prices they need to focus on markets where speed is especially important, e.g. high-frequency trading, transcription/translation services and hardware/IoT alerting!

1. https://files.littlebird.com.au/Screenshot-2025-05-01-at-9.3...

2. https://files.littlebird.com.au/pb-IQYUdv6nQo.png

3 comments

I would be extremely hesitant to assume a direct relationship between pricing and cost. A behemoth like Google is very willing to take significant losses for years to grow market share. Back in 2014-2015 Uber often charged less than the Boston subway, but it always cost them MUCH more under the hood. AFAIK they're still not profitable.

Chinese companies will be similarly eager for market share, but not everyone has access to the same raw capital.

Absolutely this. Gemini is amazing, but I'm under no illusions: their principal goal right now is to boost their database of high-quality training data through free access via AI Studio. That said, custom silicon, with a model built by internal teams collaborating to exploit that hardware's idiosyncrasies, must be a massive advantage as well.

Not sure how HFT is relevant here.

HFT is limited by time in how much processing it can do. In theory, a super-fast dLLM would let firms incorporate information sources into their decision-making that were previously too high-level. E.g., imagine using wire reports to predict an arbitrage opportunity that doesn't even exist yet (I dunno, not an HFT guy).

In practice, iiuc, HFT still happens within tens of milliseconds, and I doubt even current dLLMs are THAT fast.

What is the price on Mercury Mini?