Many thanks for a detailed and thoughtful comment.

> it is very fast and scales quite nicely on CPU with 4 threads (~ twice the speed), but not further (I tried it on a 64 cores box).

Well, in practice it does NOT scale even past 6 threads. 64 cores are just overkill and will most likely only hurt performance.

> Not sure why since they seem to be using torch's native threading support.

> surprisingly, it is not that much faster when run on a GPU

Probably for the same reason: you can only speed up the NN so much. Realistically, it can still be made 2-3x faster. Also, we have currently abandoned batching, so GPUs are not really required at all.

> the quality (as in: what I'm hearing, not a formally measured metric) is good but (YMMV) not as good as turtle.

I believe the compute required during training and inference … may differ by 3 or 4 orders of magnitude (!). Also note that some speakers and languages simply sound better because of the high quality of the source material and the amount of work and polish invested in them.

> it breaks with strange error messages if the text you feed it is too long

Well, there should be a warning somewhere, but it only works with text of up to 512-1024 symbols.

> there is mention of "a model for text repunctuation and recapitalization", which I wonder if it could be used to break a very long text (eg a book) into pieces that can be digested by the tts engine

This model only restores some punctuation marks and capital letters. For splitting text into sentences, there are libraries like razdel: https://github.com/natasha/razdel
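Since the model only accepts text of up to roughly 512-1024 symbols, a long text (e.g. a book) has to be cut into pieces before synthesis. Below is a minimal sketch of one way to do that, under some assumptions: the regex sentence splitter is a naive stand-in for a proper segmenter such as razdel, the 1000-symbol limit is a hypothetical value picked inside the range above, and `chunk_text` / `split_sentences` are hypothetical helper names, not part of any TTS API. The TTS call itself is not shown.

```python
import re
from typing import List

# Hypothetical limit, picked inside the 512-1024 symbol range mentioned above.
MAX_CHARS = 1000

# Naive splitter: break after ., !, ? or an ellipsis followed by whitespace.
# This is only a stand-in for a real sentence segmenter such as razdel.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences with a simple regex heuristic."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def _hard_split(sentence: str, max_chars: int) -> List[str]:
    """Split an over-long sentence into pieces of at most max_chars."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        # A single word longer than the limit is cut by character count.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Sentences longer than the limit are first broken on whitespace,
    so no returned chunk exceeds max_chars.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        for piece in _hard_split(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be fed to the TTS engine separately and the resulting audio concatenated. Packing whole sentences keeps the cut points at natural pauses; for Russian text, swapping `split_sentences` for something like `[s.text for s in razdel.sentenize(text)]` should give more reliable sentence boundaries than the regex.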