by soulofmischief
450 days ago
What you're looking for is speaker embeddings: vectors computed from an audio snippet that capture the characteristics of a speaker's voice. As the other commenter mentioned, they should be combined with a robust voice isolation system. My own system automatically detects new speakers and tries to pick up on cues to identify them. Once a speaker is identified by name, their average embedding is inserted into a vector database so that the agent can later use it for simple authentication, ignoring chatter in noisy public spaces, RAG context loading, etc. It works pretty well!
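
To make the enroll-then-match idea concrete, here is a rough sketch, not the actual system described above. It assumes you already have embeddings as plain lists of floats from some speaker-embedding model (not shown). The in-memory `SpeakerStore` is a hypothetical stand-in for a real vector database, and the `0.75` cosine threshold is an arbitrary placeholder you would need to tune.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embedding(embeddings):
    """Mean of the unit-length embeddings, re-normalized to unit length."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class SpeakerStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed value, tune on real data
        self.speakers = {}  # name -> average embedding

    def enroll(self, name, embeddings):
        """Store the average embedding once a speaker is known by name."""
        self.speakers[name] = average_embedding(embeddings)

    def identify(self, embedding):
        """Return (name, score) for the best match, or (None, score)
        if no enrolled speaker clears the threshold."""
        best_name, best_score = None, -1.0
        for name, ref in self.speakers.items():
            score = cosine(embedding, ref)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= self.threshold:
            return best_name, best_score
        return None, best_score


# Toy 3-dimensional vectors; real speaker embeddings are much larger.
store = SpeakerStore()
store.enroll("alice", [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]])
store.enroll("bob", [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
print(store.identify([0.95, 0.05, 0.05]))
print(store.identify([0.0, 0.0, 1.0]))
```

Returning `None` below the threshold is what lets you ignore unknown voices (background chatter) instead of forcing every snippet onto the nearest enrolled speaker.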