Pay attention to the prompt length given for each example. The first 2 seconds of the first example are a real human speaking; everything after that is generated by the model. The output almost sounds like real human speech and mimics the voice of the input, but in terms of meaningful words it is currently at roughly the level of GPT-2.