Hacker News
by urbandw311er 46 days ago
> You speak into the microphone, it gets sent to one of OpenAI’s billion servers, and then a GPU pretends to talk to you via text-to-speech. Neato.

People (including this article) keep talking about OpenAI Realtime as if it’s an STT -> LLM -> TTS pipeline, but I think that’s a fundamental misunderstanding of how the model works. My understanding is that it accepts (and outputs) actual raw audio waveforms, which, for me, is the sheer joy and wonder of the thing.