Here's a sample from a TTS model + vocoder I released for it. I've no wish to deter the motivated, but it'd take a bit of figuring out to set things up, and you'd need to read the docs and code to get oriented :)
This is actually quite impressive too, significantly better than the last time I looked into Mozilla TTS. Roughly how much audio does "two novels" equate to?
Some of the audio is read in accents other than the main one, so ideally those clips would have been removed. Doing so would be expected to help voice quality, and because it reduces the total amount of audio, it would also cut training time as a bonus.
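A minimal sketch of that kind of clean-up, assuming each clip has already been tagged with an accent label (the `Clip` type, the `accent` field and the file paths are all hypothetical; the dataset's real metadata has no such column, so the labels would have to come from manual annotation or similar):

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str          # hypothetical path to the wav file
    duration_s: float  # clip length in seconds
    accent: str        # hypothetical per-clip accent label


def filter_by_accent(clips, main_accent):
    """Keep only the clips read in the main accent."""
    return [c for c in clips if c.accent == main_accent]


def total_hours(clips):
    """Total audio duration in hours, i.e. how much the model trains on."""
    return sum(c.duration_s for c in clips) / 3600


# Illustrative data only, not taken from the released dataset.
clips = [
    Clip("wavs/0001.wav", 7.5, "main"),
    Clip("wavs/0002.wav", 6.0, "other"),
    Clip("wavs/0003.wav", 9.0, "main"),
]
kept = filter_by_accent(clips, "main")
print(f"kept {len(kept)}/{len(clips)} clips, {total_hours(kept):.4f} h")
```

The shorter total duration of `kept` is where the training-time saving comes from: fewer clips means fewer steps per epoch.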
There's the demo server, which has a simple web UI where you can enter text to be spoken, but when it comes to setting it up locally, it's not really suited to a non-developer.
Currently I can get about 90% of the quality of 15.ai. I think I could surpass it, but not without some help.