The demo is impressive. It conditions on reference audio at inference time, and the training code appears to be mostly available [2][3], along with a reference dataset [4].
From the README:
> NeuTTS Air is built off Qwen 0.5B
1. https://huggingface.co/neuphonic/neutts-air/tree/main
2. https://github.com/neuphonic/neutts-air/issues/7
3. https://github.com/neuphonic/neutts-air/blob/feat/example-fi...
4. https://huggingface.co/datasets/neuphonic/emilia-yodas-engli...