Hacker News
by huijzer 470 days ago
> This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting.

I'm surprised by the lack of attention that Gemini 2.0 with native audio output got. They have a demo at https://youtu.be/qE673AY-WEI, which I think is really good too. The main problem with Google's model is that this audio output is not supported by the API, but you can try it at https://aistudio.google.com.

In general, I think text-to-speech is pretty good nowadays. For example, this is a little math video that I made a few days ago with the (old) Google text-to-speech API: https://www.youtube.com/watch?v=G1mvLrCfjFM. Honestly, I think the narration is better than I personally could have done. It's calm, well pronounced, and sounds relatively enthusiastic.


> They have a demo at https://youtu.be/qE673AY-WEI

That's not a demo, that's a video. Anyone can make something like that in an afternoon with a couple friends and a microphone.

Also, Google is known for putting out fake "demos". Remember the Google Duplex scam?

Scam? Duplex worked.
I thought it was announced and never heard from again. It may have worked, but it never shipped, did it?
I made some restaurant reservations, it worked.
I see. I guess it was never a standalone product then; from reading a Reddit post, it’s a feature built into Assistant. Thanks, that solves a mystery for me.
It was never real. They even admitted they used real people for the service. It was a scam.

Also, that would be quite hard to pull off today, in 2025, after transformers etc. There's absolutely no chance they were sitting on that back in 2018.

It doesn't work today, let alone 6 years ago.

But good work defending your master.

How do I get to this in aistudio.google.com?
I think the one under "Stream Realtime" should be similar to the demo. It's only Gemini 2.0 Flash though, not the full one.