What’ll really be interesting is if someone gets deep voice conversion to produce plausible output and combines it with this technique. As is, I think it’s still not enough to really fool people, at least with that particular voice actor... But people really tend to trust voices.
Whether the demo is accurate or massaged, I can't say, as I've never used the software myself. But if it is anywhere near as good as their demo suggests, it produces output that is more natural than current deepfake video output (though it also isn't perfect).
As far as I can tell, it's still early days for the audio faking side of things. I'd give it a year or two before it's indistinguishable from a real voice. Then we're in trouble.
It has already been done. 20 years ago a research team created a very convincing fake Whoopi Goldberg. Never heard about it since. Anyone's guess who blacked out that technology.