VoiceCraft, which was released as open source a week or so ago, can do the same with 5-10 seconds of audio and is pretty convincing. It's fun to play around with: https://github.com/jasonppy/VoiceCraft
Submitted multiple times in the past few days. Here's a link to the most upvoted one as of now:
https://news.ycombinator.com/item?id=39865340
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (jasonppy.github.io)