by arthurcolle
450 days ago
Is there some way to do a simple fingerprint or something so that the AI recognizes when it was the one speaking? Or do you really just have to use WebRTC? I spoke with someone yesterday who told me WebRTC fixed this, so I'm just curious. I wrote a "simple" (ugly) Acoustic Echo Cancellation module that kind of worked, but I'm wondering if anyone has a solution that makes this work over the WebSockets Realtime API.

My own system automatically detects new speakers and tries to pick up on cues to identify them. Once a speaker is identified by name, their average embedding is inserted into a vector database so that the agent can later use it for simple authentication, ignoring chatter in noisy public spaces, RAG context loading, etc. It works pretty well!
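
For anyone curious what the "average embedding per named speaker" idea could look like, here is a rough sketch, not my actual code. It assumes you already get a fixed-length voice embedding from some speaker-embedding model (not shown). A plain in-memory dict stands in for the vector database, and `SpeakerRegistry`, `enroll`, `identify`, and the 0.8 threshold are all made-up names and values for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical stand-in for a vector database keyed by speaker name."""

    def __init__(self, threshold: float = 0.8) -> None:
        # Assumed similarity cutoff; a real system would tune this on its own data.
        self.threshold = threshold
        # name -> (running mean embedding, number of samples seen)
        self._store: Dict[str, Tuple[List[float], int]] = {}

    def enroll(self, name: str, embedding: Sequence[float]) -> None:
        """Fold a new embedding into the speaker's running average."""
        if name not in self._store:
            self._store[name] = (list(embedding), 1)
            return
        mean, count = self._store[name]
        if len(mean) != len(embedding):
            raise ValueError("embedding dimension mismatch")
        # Incremental mean update: mean += (x - mean) / (count + 1)
        new_mean = [m + (e - m) / (count + 1) for m, e in zip(mean, embedding)]
        self._store[name] = (new_mean, count + 1)

    def mean_embedding(self, name: str) -> List[float]:
        """Return a copy of the stored average embedding for a speaker."""
        return list(self._store[name][0])

    def identify(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the best-matching enrolled speaker, or None if below threshold."""
        best_name: Optional[str] = None
        best_score = self.threshold
        for name, (mean, _count) in self._store.items():
            score = cosine_similarity(mean, embedding)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name
```

Usage would be something like calling `enroll("alice", emb)` each time a named speaker is confirmed, then `identify(emb)` on incoming speech. A `None` result is what lets the agent ignore unknown background chatter or refuse an authentication attempt.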