Hey, developers behind ElevenLabs here. Thank you so much for the constructive and positive feedback; we're taking it on board!

We're currently focused on researching and deploying a different approach to speech synthesis, one that can generate nuanced intonation and emotion by understanding the text and taking context into account. Additionally, we provide creators with a way to clone their own voice from very short samples. With the published blog post, we are now deploying a way to help them design entirely new voices! Anyone will be able to generate that level of quality with just a copy-paste. We are planning to open up the Beta later this month. Our goal is to let you convert any written content into high-quality, compelling audio.

To address a few questions that came up frequently:

- Latency for our streaming TTS is <1s, at the quality of the results available above. High latency is the usual problem with existing good TTS models (like tortoise-tts).
- We can clone voices instantly from just 5s of speech, with no training required.
- We are working on adding SSML-like support for better control; speed controls will come as part of that too.
- The API is directly available as part of the Beta; we are preparing the infrastructure to scale easily for the release!

We are hiring researchers, frontend developers, and full-stack developers! If you are interested, send your GitHub account and a short message to founders[at]elevenlabs.io.