by ryukoposting 595 days ago
The voice samples are speaking gibberish a lot of the time, but sonically the voices are fantastic. They sound human, even when they're speaking nonsense syllables.

With Stable Diffusion and LLMs, you can do a lot of debugging by studying how the model responds to small changes in the prompt. But since Hertz-dev takes sound as its input, it would be hard to tell which token you should tweak. And if it's meant to be used in real time, that kind of fiddling isn't an option at all. How would you go about systematically studying Hertz-dev's behavior?
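
For what it's worth, the closest audio analogue of prompt tweaking I can think of is perturbing the input waveform in small, controlled ways and measuring how far the output moves. Here is a rough sketch of that idea. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in (per-frame RMS energy), not Hertz-dev, and I don't know what interface Hertz-dev actually exposes. The names `perturbations` and `sensitivity` are made up too.

```python
import numpy as np

def perturbations(audio, sample_rate, rng):
    """Yield (name, perturbed_audio) pairs, each a small controlled change."""
    yield "gain_-3dB", audio * 10 ** (-3 / 20)                     # quieter copy
    yield "shift_10ms", np.roll(audio, int(0.010 * sample_rate))   # circular shift
    yield "noise_1e-3", audio + rng.normal(0.0, 1e-3, size=audio.shape)

def sensitivity(model, audio, sample_rate, seed=0):
    """Return {perturbation name: relative L2 change in the model output}."""
    rng = np.random.default_rng(seed)
    baseline = np.asarray(model(audio), dtype=float)
    scale = np.linalg.norm(baseline) + 1e-12  # avoid division by zero
    report = {}
    for name, perturbed in perturbations(audio, sample_rate, rng):
        out = np.asarray(model(perturbed), dtype=float)
        report[name] = float(np.linalg.norm(out - baseline) / scale)
    return report

def toy_model(audio, frame=160):
    """Hypothetical stand-in, NOT Hertz-dev: RMS energy per 160-sample frame."""
    usable = len(audio) // frame * frame
    frames = audio[:usable].reshape(-1, frame)
    return np.sqrt((frames ** 2).mean(axis=1))

# One second of a 220 Hz tone at 16 kHz as a test input.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = 0.5 * np.sin(2 * np.pi * 220 * t)

for name, change in sensitivity(toy_model, audio, sample_rate).items():
    print(f"{name}: {change:.4f}")
```

With a model that samples its output, you'd presumably also have to fix the sampling seed, or compare distributions over many runs, before a difference like this means anything.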