"Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers": https://arxiv.org/abs/2301.02111
Website with examples: https://valle-demo.github.io/
For your second question, Apple is already rolling out AI-narrated audiobooks. See: https://arstechnica.com/gadgets/2023/01/apple-rolls-out-ai-n...