by snakers41 1466 days ago
Many thanks for the detailed and thoughtful comment.

> it is very fast and scales quite nicely on CPU with 4 threads (~ twice the speed), but not further (I tried it on a 64 cores box).

Well, in practice it does NOT scale even past 6 threads. 64 cores is just overkill and will most likely only hurt performance.
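
If you want to cap the thread count yourself, a minimal sketch could look like the one below. It assumes the model runs on PyTorch's intra-op CPU thread pool. The cap of 4 comes from the numbers quoted above, and `pick_num_threads` is a hypothetical helper made up for illustration, not part of any library.

```python
import os


def pick_num_threads(available, cap=4):
    """Return the number of threads to use: at most `cap`, at least 1."""
    # os.cpu_count() may return None, so fall back to a single thread.
    return max(1, min(available or 1, cap))


num_threads = pick_num_threads(os.cpu_count())

try:
    import torch

    # Limit PyTorch's intra-op parallelism on CPU.
    torch.set_num_threads(num_threads)
except ImportError:
    # PyTorch is not installed; nothing to configure.
    pass
```

On a 64-core box this picks 4 threads rather than letting the runtime default to all cores. The right cap is something to measure on your own hardware.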

> Not sure why since they seem to be using torch's native threading support.

> surprisingly, it is not that much faster when run on a GPU

Probably for the same reason: you can only speed up the NN so much. Realistically, it can still be made 2-3x faster. Also, we have currently abandoned batching, so GPUs are not really required at all.

> the quality (as in: what I'm hearing, not a formally measured metric) is good but (YMMV) not as good as turtle.

I believe the compute required during training and inference … may differ by 3 or 4 orders of magnitude (!).

Also note that some speakers and languages just sound better because of higher-quality source material and the amount of work and polish invested.

> it breaks with strange error messages if the text you feed it is too long

Well, there should be a warning somewhere, but it only works with text no longer than 512-1024 characters.

> there is mention of "a model for text repunctuation and recapitalization", which I wonder if it could be used to break a very long text (eg a book) into pieces that can be digested by the tts engine

This model only restores some punctuation marks and capital letters.

For splitting text into sentences there are libraries like razdel: https://github.com/natasha/razdel
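
A rough sketch of how a long text could be cut into pieces that stay under the length limit: split it into sentences, then pack the sentences greedily into chunks. The regex splitter here is a naive stand-in assumed for illustration (a proper sentence segmenter such as razdel would do better), `chunk_text` and `split_sentences` are hypothetical names, and `max_chars=500` is just a conservative guess below the 512-1024 range mentioned above.

```python
import re


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_text(text, max_chars=500):
    """Greedily pack sentences into chunks of at most `max_chars` characters."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        # A single sentence longer than the limit is hard-split (crude:
        # this may cut in the middle of a word).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be fed to the TTS model separately and the resulting audio concatenated.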