by moron4hire 1587 days ago
What are you talking about? Microsoft's Text-To-Speech APIs are the best on the market. Google's are definitely a distant second: not as many languages, not as many voices, and the output is nowhere near as good. After those two, there isn't really anything left worth mentioning.

I am talking about the fact that they have one set of documentation, but if you look deeper there are two products. They say there are two options, a long API and a short API, but look closer and you'll see that one uses names like "Name" and "Voice" while the other uses "name" and "voice". The voice lists for the two options are not the same, and you randomly get weird errors with terrible messages that resolve themselves within a few days. If I remember correctly, you also authenticate in two different ways.
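One way to cope with the "Name"/"name" split when one product talks to both APIs is to normalize keys before comparing anything. A minimal sketch, assuming each API returns its voice list as JSON objects whose keys differ only in letter case; the function names and sample voice names are hypothetical, and the real response shapes may differ in more than casing.

```python
from typing import Any, Dict, Iterable


def normalize_keys(record: Dict[str, Any]) -> Dict[str, Any]:
    """Lower-case every top-level key so "Name" and "name" become one field."""
    normalized: Dict[str, Any] = {}
    for key, value in record.items():
        lowered = key.lower()
        # Refuse to silently drop data if both spellings appear with different values.
        if lowered in normalized and normalized[lowered] != value:
            raise ValueError(f"conflicting values for key {lowered!r}")
        normalized[lowered] = value
    return normalized


def merge_voice_lists(
    short_api: Iterable[Dict[str, Any]],
    long_api: Iterable[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Index voices from both APIs by name and record which API offers each one."""
    merged: Dict[str, Dict[str, Any]] = {}
    for source, voices in (("short", short_api), ("long", long_api)):
        for raw in voices:
            voice = normalize_keys(raw)
            entry = merged.setdefault(voice["name"], {"name": voice["name"], "sources": []})
            entry["sources"].append(source)
    return merged
```

Voices whose `sources` list contains only one API are exactly the ones that will break if a request is routed to the other API.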

So I would prefer that MS do this:

1. Say plainly: "These are our two completely different and incompatible APIs. They might look similar, but they are not the same, and the output can differ even if you send the same parameters to each one."

2. Give me good error messages. If a request fails because of your side, make that clear; if my input is the problem, make it clear that it's me and what exactly is wrong.
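The split asked for in point 2 lines up with the standard HTTP status classes, so a client can at least decide whose side to look at first even when the message text is vague. A minimal sketch, assuming the service uses conventional status codes (4xx for a bad request, 5xx for a server fault, 429 for throttling); the names are hypothetical.

```python
from enum import Enum


class Blame(Enum):
    CALLER = "caller"        # 4xx: something in the request is wrong
    SERVICE = "service"      # 5xx: the server failed; the request may be fine
    THROTTLED = "throttled"  # 429: too many requests; back off and retry


def classify_failure(status: int) -> Blame:
    """Map an HTTP error status to the side most likely at fault."""
    if status == 429:
        return Blame.THROTTLED
    if 400 <= status < 500:
        return Blame.CALLER
    if 500 <= status < 600:
        return Blame.SERVICE
    raise ValueError(f"{status} is not an HTTP error status")


def should_retry(status: int) -> bool:
    """Retrying only makes sense when the failure is not the caller's fault."""
    return classify_failure(status) in (Blame.SERVICE, Blame.THROTTLED)
```

This does not replace a good error message; it only keeps you from "fixing" your input when the fault is on the service side.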

I mean, there's the old Windows-only SAPI from the Windows XP days that they haven't been developing for several years now, and the current Azure Cognitive Services, which is just a REST API with a pretty standard auth scheme. There's an official .NET package for wrapping that REST API, but it's certainly not necessary to use it if you know how to handle REST APIs. Is that what you're talking about?
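Calling the REST API without the .NET package mostly comes down to one POST with a few headers. A minimal sketch using only the Python standard library, based on my understanding of the short-text synthesis endpoint (`https://{region}.tts.speech.microsoft.com/cognitiveservices/v1`, key in `Ocp-Apim-Subscription-Key`, SSML body); check Microsoft's current REST reference before relying on the exact URL, header names, or output-format string. The region, key, voice name, and User-Agent value are placeholders.

```python
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr

TTS_URL = "https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_ssml(text: str, voice: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(region: str, key: str, ssml: str,
                      output_format: str = "riff-24khz-16bit-mono-pcm") -> Request:
    """Build (but do not send) a synthesis POST for the short-text endpoint."""
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # key-based auth
        "Content-Type": "application/ssml+xml",  # body is SSML, not JSON
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "tts-rest-sketch",         # placeholder client name
    }
    return Request(TTS_URL.format(region=region), data=ssml.encode("utf-8"),
                   headers=headers, method="POST")
```

Nothing is sent until the request is passed to `urllib.request.urlopen(req)`, whose `.read()` returns the audio bytes. The voice name has to exist for the API being called, which, per this thread, is not guaranteed to be the same list for both APIs.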
Yes, the Azure API. There are two different things under the hood, the short and long APIs, with different names, different auth headers, and different supported voices, and voices with the same name support different styles. Bad error messages pop up and get fixed a few days later, but only on one of the APIs. The issue is that I am trying to combine the short and long APIs in one product and I keep hitting these big inconsistencies. I can clearly see there are two teams doing things differently; if you only use one of them, you have a completely different experience.

Edit: I make the REST calls directly, not via an SDK, and I use Microsoft's REST API documentation, so there is no SDK documentation or SDK code hiding the issues.
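The "different auth headers" problem is easier to live with if auth is a pluggable piece rather than hard-coded per call. A minimal sketch, assuming the two schemes are a raw subscription key in `Ocp-Apim-Subscription-Key` and a bearer token obtained from the `issueToken` endpoint, which is how I understand Cognitive Services auth to work. The thread does not say which API accepts which scheme, so the `AUTH_BY_API` mapping is hypothetical and should be filled in from the docs.

```python
from dataclasses import dataclass
from typing import Dict, Union
from urllib.request import Request


@dataclass(frozen=True)
class KeyAuth:
    """Send the subscription key directly on every request."""
    key: str

    def headers(self) -> Dict[str, str]:
        return {"Ocp-Apim-Subscription-Key": self.key}


@dataclass(frozen=True)
class BearerAuth:
    """Send a short-lived access token obtained from issueToken."""
    token: str

    def headers(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.token}"}


Auth = Union[KeyAuth, BearerAuth]


def token_request(region: str, key: str) -> Request:
    """Build (not send) the POST that exchanges a key for an access token."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return Request(url, data=b"", headers={"Ocp-Apim-Subscription-Key": key}, method="POST")


def headers_for(api: str, auth_by_api: Dict[str, Auth]) -> Dict[str, str]:
    """Look up the auth scheme configured for a given API ("short" or "long")."""
    return auth_by_api[api].headers()


# Hypothetical configuration: which scheme each API needs must come from the docs.
AUTH_BY_API: Dict[str, Auth] = {
    "short": KeyAuth("YOUR_KEY"),
    "long": BearerAuth("TOKEN_FROM_ISSUETOKEN"),
}
```

Keeping the scheme in one table means that when either API changes its auth, only that entry changes, not every call site.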

Microsoft's Text-To-Speech APIs are the best on the market.

Wow, I had no idea they were that good. Is there a way to get at them from a consumer level? For example, there are plenty of e-reader apps that use Google's TTS to read epub books as audiobooks. Anything similar for Microsoft or is it all on the developer side?

You can just try it out here from the browser if you like:

https://azure.microsoft.com/en-us/services/cognitive-service...

That's a good demo, but it isn't much use for making audio content from ebooks.
Converting an entire ebook to audio this way is going to be _expensive_.
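The expense follows from per-character billing: cost grows linearly with the length of the book. A back-of-the-envelope sketch, assuming the service charges per million characters of input text; the price is a parameter, not a real figure, so plug in the current rate from the pricing page. It ignores any free tier and any difference in how markup is counted.

```python
from typing import List


def estimate_tts_cost(chapters: List[str], price_per_million_chars: float) -> float:
    """Estimate synthesis cost for a book given as a list of chapter strings."""
    total_chars = sum(len(chapter) for chapter in chapters)
    # Linear in length: twice the characters means twice the bill.
    return total_chars / 1_000_000 * price_per_million_chars
```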
They were basically licensing technology from L&H (Lernout & Hauspie) and then Nuance, until they bought Nuance outright.