If you ever need diarization on top of your Kokoro TTS setup, speech-swift (which I maintain) could complement it. It provides on-device speaker diarization built specifically for Apple Silicon, which might fit well with your local-first approach. https://soniqo.audio/guides/diarize