Try AssemblyAI, Deepgram, Picovoice, or Speechmatics. Picovoice runs on-device, so you have to customize the model yourself by adding your vocabulary to it, but that's pretty easy because it gives you pronunciation recommendations, and you can run the resulting models serverless: https://picovoice.ai/docs/leopard/#add-custom-vocabulary The others take custom vocabulary through an API call, and you have to come up with the pronunciations yourself: https://docs.speechmatics.com/features/custom-dictionary
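For the API-call route, here's a rough sketch of what a custom dictionary payload can look like, based on the Speechmatics custom dictionary page linked above (`additional_vocab` entries with `content` and optional `sounds_like`). The helper name `build_transcription_config` and the example words are made up for illustration, and this only builds the JSON; check the docs for how to submit it with your job.

```python
import json


def build_transcription_config(language, vocab):
    """Build a Speechmatics-style job config with a custom dictionary.

    `vocab` maps each term to a list of "sounds like" spellings;
    an empty list means the term is added without pronunciation hints.
    (Hypothetical helper, not part of any SDK.)
    """
    additional_vocab = []
    for content, sounds_like in vocab.items():
        entry = {"content": content}
        if sounds_like:
            # You have to come up with these spellings yourself.
            entry["sounds_like"] = list(sounds_like)
        additional_vocab.append(entry)
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "additional_vocab": additional_vocab,
        },
    }


# Example words are placeholders; swap in your own domain terms.
config = build_transcription_config(
    "en",
    {"gnocchi": ["nyohki", "nokey"], "Picovoice": []},
)
print(json.dumps(config, indent=2))
```

AssemblyAI and Deepgram have their own parameter names for the same idea, so treat the field names here as Speechmatics-specific.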
If you want to go with Whisper, you can use Picovoice Falcon or pyannote for speaker diarization: https://picovoice.ai/blog/falcon-whisper-integration/ https://github.com/yinruiqing/pyannote-whisper
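Whichever diarizer you pick, the glue step is roughly the same: Whisper gives you timestamped text segments, the diarizer gives you timestamped speaker turns, and you label each text segment with the speaker whose turn overlaps it most. Below is a minimal sketch of that merge in plain Python. The `assign_speakers` function and the dict shapes are assumptions for illustration (Whisper segments do carry `start`, `end`, and `text`, but you'd need to convert Falcon or pyannote output into the `speaker_turns` format yourself), and the sample data is invented.

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the best-overlapping speaker.

    transcript_segments: list of {"start": float, "end": float, "text": str}
    speaker_turns:       list of {"start": float, "end": float, "speaker": str}
    Segments that overlap no turn get speaker None.
    """
    labeled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = None, 0.0
        for turn in speaker_turns:
            # Length of the time interval shared by segment and turn
            # (negative or zero when they do not intersect).
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Invented sample data standing in for real Whisper and diarizer output.
segments = [
    {"start": 0.0, "end": 2.5, "text": "hello"},
    {"start": 2.5, "end": 5.0, "text": "hi there"},
]
turns = [
    {"start": 0.0, "end": 2.8, "speaker": "SPEAKER_00"},
    {"start": 2.8, "end": 6.0, "speaker": "SPEAKER_01"},
]

for seg in assign_speakers(segments, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Segment-level matching is the simple version; if speakers change mid-segment you'd want to do the same overlap check on word-level timestamps instead.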