by Ninjinka 501 days ago

Pricing is CRAZY.

Audio input is $0.70 per million tokens on 2.0 Flash, $0.075 for 2.0 Flash-Lite and 1.5 Flash.

For gpt-4o-mini-audio-preview, it's $10 per million tokens of audio input.
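
To put those numbers side by side, here is a rough sketch of the arithmetic. It only compares per-token prices across vendors, because OpenAI's audio token rate isn't given here. For the per-hour figure it assumes Gemini bills audio at 32 tokens per second, the rate Google's audio docs list. The dictionary keys are just labels, not guaranteed API model IDs.

```python
# Audio-input prices in USD per million tokens, as quoted in the comment above.
# Keys are illustrative labels, not necessarily exact API model IDs.
PRICES_PER_M = {
    "gemini-2.0-flash": 0.70,
    "gemini-2.0-flash-lite": 0.075,
    "gemini-1.5-flash": 0.075,
    "gpt-4o-mini-audio-preview": 10.00,
}

# Assumption: Gemini tokenizes audio at 32 tokens per second of input.
# OpenAI's audio token rate is not modeled, so only per-token prices
# are compared across vendors.
GEMINI_TOKENS_PER_SECOND = 32


def price_ratio(expensive: str, cheap: str) -> float:
    """How many times more `expensive` costs per audio token than `cheap`."""
    return PRICES_PER_M[expensive] / PRICES_PER_M[cheap]


def gemini_cost_per_hour(model: str) -> float:
    """USD for one hour of audio input on a Gemini model (32 tokens/s assumed)."""
    tokens = GEMINI_TOKENS_PER_SECOND * 3600  # 115,200 tokens per hour
    return PRICES_PER_M[model] * tokens / 1_000_000


if __name__ == "__main__":
    for model in ("gemini-2.0-flash", "gemini-2.0-flash-lite"):
        print(
            f"{model}: ${gemini_cost_per_hour(model):.4f}/hour, "
            f"{price_ratio('gpt-4o-mini-audio-preview', model):.1f}x cheaper per token "
            "than gpt-4o-mini-audio-preview"
        )
    print(
        "1.5 Flash -> 2.0 Flash: "
        f"{price_ratio('gemini-2.0-flash', 'gemini-1.5-flash'):.1f}x increase"
    )
```

Under that assumption an hour of audio comes to about $0.08 on 2.0 Flash and under a cent on Flash-Lite. Per token, gpt-4o-mini-audio-preview is roughly 14x the price of 2.0 Flash and about 133x the price of Flash-Lite / 1.5 Flash.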

KTibow 501 days ago

The increase is likely because 1.5 Flash was actually cheaper than all other STT services. I wrote about this a while ago at https://ktibow.github.io/blog/geminiaudio/.

radeeyate 501 days ago

I feel that the audio-interpretation capabilities of the Gemini models go beyond STT. If you give it something like a song, it can give you information about it.

sunaookami 501 days ago

Sadly: "Gemini can only infer responses to English-language speech."

https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...

mbrock 501 days ago

I don't know what they mean by this, but the obvious interpretation is not true. It understands other languages; it even does really well with underrepresented languages, in my case Latvian.