Quite disappointing that their speech-to-text models are not open source. Whisper was really good, and it was great that it was open to play around with. I guess this continues OpenAI's approach of not really being open!
In my opinion GPT-SoVITS is the best if you can put in the effort. I'm still using v2 since the output is so good.
It's also the best multilingual one in my testing on Japanese inputs.