by kwindla
470 days ago
In general, for realtime voice AI you don't want this model to support multiple speakers, because each participant in a session has a separate voice input stream. We're not doing "speaker diarization" from a single audio track here; we're streaming the input from each participant. Even when there are multiple participants in a session, we still process each stream separately, either as it comes in from that user's microphone (locally) or as it arrives over the network (server-side).
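To make the per-participant idea concrete, here is a rough sketch of routing incoming audio by sender so that each stream keeps its own state. The names `ParticipantStream` and `SessionRouter` are made up for illustration and are not from any particular framework; a real pipeline would feed each stream into its own model instance instead of just buffering frames.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantStream:
    """Hypothetical per-participant state: one audio stream, one speaker."""
    participant_id: str
    frames: list = field(default_factory=list)

    def push(self, frame: bytes) -> None:
        # Placeholder for per-stream processing; here we only buffer the frame.
        self.frames.append(frame)


class SessionRouter:
    """Hypothetical router: sends each frame to the stream owned by its sender."""

    def __init__(self) -> None:
        self.streams = {}

    def on_audio(self, participant_id: str, frame: bytes) -> None:
        # Create the stream lazily the first time a participant sends audio.
        if participant_id not in self.streams:
            self.streams[participant_id] = ParticipantStream(participant_id)
        self.streams[participant_id].push(frame)


router = SessionRouter()
router.on_audio("alice", b"\x00\x01")
router.on_audio("bob", b"\x02")
router.on_audio("alice", b"\x03")
```

Because audio is keyed by participant before any processing happens, nothing downstream ever has to work out who is speaking on a given stream.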