


	by 0x1ceb00da 665 days ago
	This suggests that the AI "brain" receives the user input as a text prompt (the agent relays the speech prompt to GPT-4o) and generates audio as output (GPT-4o streams speech packets back to the agent). But when I asked Advanced Voice Mode, it said the exact opposite: that it receives input as audio and generates text as output.

2 comments

mbrock 665 days ago

Both input and output are audio. This post is about bridging WebRTC audio I/O with an API that itself operates on simple TCP socket streams of raw PCM. For reliability and efficiency, you want end users to connect with compressed, loss-tolerant, Zoom-style streams, and those go through a middleman that relays them to the model API.
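The format conversion such a middleman performs can be sketched roughly as follows. This is a minimal sketch, not the post's or OpenAI's implementation. It assumes the WebRTC side has already been decoded from Opus into 48 kHz mono float samples, and that the model API takes 16-bit little-endian mono PCM at 24 kHz. All function names are hypothetical. The resampling is deliberately naive (pair averaging down, linear interpolation up). A real relay would also need a proper resampler, Opus decode/encode, jitter buffering, and the actual network transport.

```python
import struct

# ASSUMPTIONS (not from the thread): the client side is decoded to 48 kHz
# mono floats in [-1.0, 1.0], and the model API takes 24 kHz mono PCM16 LE.
WEBRTC_RATE = 48_000
MODEL_RATE = 24_000


def float_to_pcm16(samples):
    """Clamp floats to [-1.0, 1.0] and pack them as 16-bit little-endian PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_float(data):
    """Unpack 16-bit little-endian PCM bytes into floats in [-1.0, 1.0]."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return [v / 32767 for v in struct.unpack(f"<{count}h", data[: count * 2])]


def downsample_by_2(samples):
    """Halve the sample rate by averaging adjacent pairs (a crude low-pass)."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]


def upsample_by_2(samples):
    """Double the sample rate by linear interpolation between neighbours."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) / 2)
    return out


def client_frame_to_model(frame_48k):
    """Hypothetical client -> model step: 48 kHz floats to 24 kHz PCM16 bytes."""
    return float_to_pcm16(downsample_by_2(frame_48k))


def model_chunk_to_client(pcm_24k):
    """Hypothetical model -> client step: 24 kHz PCM16 bytes to 48 kHz floats."""
    return upsample_by_2(pcm16_to_float(pcm_24k))
```

For example, a 20 ms WebRTC frame of 960 samples at 48 kHz becomes 480 samples at 24 kHz, which is 960 bytes of PCM16 on the model side.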


meiraleal 665 days ago

Who did you ask? ChatGPT? Not sure if you understand LLMs, but its knowledge is based on its training data. It can't reason about itself; in this case it can only hallucinate, sometimes correctly, most times incorrectly.


hshshshsvsv 665 days ago

This is also true for pretty much all humans, and bypassing this limitation is called enlightenment/self-realization.

LLMs don't even have a self, so it can never be realized. Only the ego exists.


TZubiri 665 days ago

No, humans can self-inspect just fine.


mbrock 665 days ago

A lot of psychologists would quibble with that...


nialse 665 days ago