Show HN: Voicera – Add life-like AI voice dictation to your blogs and articles | HN Mirror

	Show HN: Voicera – Add life-like AI voice dictation to your blogs and articles (voicera.co)
	72 points by arbobmehmood 1761 days ago

15 comments

kkielhofner 1761 days ago

I submitted this comment on ProductHunt too but I wanted to make sure you see it:

Looks great but FYI there's a long-standing healthcare company that's been in business for over a decade with various speech products/features named "Vocera"[0]. I'm not a lawyer but they have many trademarks on Vocera and the standard is generally "likely to cause confusion". You're probably well in that territory with a speech product that sounds almost identical and is one letter off. When Googling "voicera" Google replaces/suggests "Vocera". There's a pretty decent chance you'll be hearing from them.

[0] https://www.vocera.com/

chakspak 1761 days ago

I was thinking the same. I hear this company name all the time at my work, so when I saw the title, I did a double-take and thought there was a typo. They even have AI voice command, so I briefly thought it was the same company.

garduque 1761 days ago

We use the Vocera hands free devices where I work. At first glance of this title I thought "oh, they do AI dictation stuff, too? I guess that makes sense." And then I noticed the spelling and had the same thought as you. So, ditto.

psyc 1761 days ago

The word 'dictation' is confusing here. I think you want 'recitation', 'vocalization', 'narration' or just 'reading'. Dictation is speech-to-text, this is text-to-speech.

phreeza 1761 days ago

Came to the comment section to say this. I suspect it may be a mistranslation?

r_singh 1761 days ago

That’s what the software does: it dictates the text…

tirpen 1761 days ago

No, it does the exact opposite.

Dictation is writing down what someone is saying.

This is software that says what someone writes down.

sdevonoes 1761 days ago

> Dictation is the transcription of spoken text: one person who is "dictating" speaks and another who is "taking dictation" writes down the words as they are spoken. Among speakers of several languages, dictation is used as a test of language skill, similar to spelling bees in the English-speaking world.

https://en.wikipedia.org/wiki/Dictation_(exercise)

Here the software is the "person who is dictating".

layer8 1761 days ago

It’s not dictation if the spoken text is not recorded or written down by a device (voice recorder) or by a human.

tirpen 1760 days ago

> Here the software is the "person who is dictating".

And who is the "person who is taking dictation" then? Who is writing down the words the software is speaking?

hopesthoughts 1758 days ago

No, not really. It just reads it. You're basically talking about a screen reader, and we don't use the term dictation.

JZL003 1761 days ago

Anyone have a nice open-source/pretrained TTS model they like using? I use Google's WaveNet TTS heavily to create 'audiobooks' (especially from archive.org, which is great for old books), but it's pretty expensive.

I periodically look for new versions and, while the examples sound better, they fall down really hard on other text.

synesthesiam 1761 days ago

You might give Larynx a try: https://github.com/rhasspy/larynx

Demo: https://youtu.be/hBmhDf8cl0k

(I'm the author)

follower 1761 days ago

Wow.

I'd really encourage you to invest some time into SEO and promotion of your project.

I spent a bunch of time recently looking for exactly this: TTS, offline, an Open Source licence, and with "decent"/"natural" sounding default voices.

The "best" I ended up finding was `espeak-ng` but, really, the "natural"-ness is barely comparable to what Larynx seems to produce--based on a quick listen to the demos here: https://rhasspy.github.io/larynx/#en-us

On first impressions at least, Larynx definitely seems to be a project that deserves a higher profile in this space.

Thanks for sharing the project here, I'll be interested to take a deeper look when I circle back to my side-side project that could benefit from it. :)

(BTW I didn't watch/listen to the YT video all the way through yet but if the narration is generated by Larynx (which it seemed it might be?) it's definitely worth stating that up front.)

Oh, also, really appreciate that there's multiple options for non-male voices too which is something that seems to be sorely lacking in similar projects.

infinite8s 1761 days ago

Yes, agreed, this is great! The best I found that could generate faster than real-time without a GPU was speedyspeech (https://github.com/janvainer/speedyspeech). Unfortunately it was only trained using the LJSpeech dataset and I haven't been able to transfer it to a multi-voice model. I have been using it to build a storytelling app for my kids.

follower 1761 days ago

> been using it to build a storytelling app for my kids.

Oh, that's cool! :) Has some overlap with part of my interest in TTS technologies.

The existence of 50 voices for Larynx is definitely a significant part of what makes it an exciting development in this sphere of use.

follower 1761 days ago

So, after actually watching the demo video all the way through, it seems the video is narrated by a Larynx-produced voice "southern_english_female": https://youtu.be/hBmhDf8cl0k?t=387

While I appreciate the cinematic aspect of the confirmation/"reveal" at the conclusion of the video, ironically--because the quality is so good--since most people won't get through a 7 minute video, it would be entirely possible to not realise the narration itself is generated.

So I'd encourage you to consider stating it a little more up front.

(Also, based on the release dates it seems like this project is relatively young which explains why it's not very well known at this point.)

JZL003 1761 days ago

That is pretty nice: one of the best collections of voices I've seen, and the best interface.

Google gives 1 million characters per month free, which I don't often go over, but this will be really useful for when I do.

I don't want to be unappreciative, it's amazing that this is possible, much less free, but when you spend hours listening to it every day, the cracking and warbling do get old. I think there are better models I've heard snippets of, but the truly amazing thing about Google's is how robust it is to very weird words.

(When I tried all the public cloud offerings, IBM's was marginally the nicest AFAICT, but it was the most expensive, with the least free quota.)
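
A rough way to check whether a book fits in the 1 million free characters per month mentioned above is to count its characters before sending anything. This is only a minimal Python sketch: the function names are hypothetical, and it assumes the provider bills by raw character count, which may not match how any particular service actually counts.

```python
# Free monthly allowance quoted in the comment above (an input, not a verified price sheet).
FREE_CHARS_PER_MONTH = 1_000_000


def remaining_quota(used_so_far: int, quota: int = FREE_CHARS_PER_MONTH) -> int:
    """Characters still available this month, never negative."""
    return max(quota - used_so_far, 0)


def fits_in_quota(text: str, used_so_far: int = 0,
                  quota: int = FREE_CHARS_PER_MONTH) -> bool:
    """True if the whole text can be synthesized within the remaining allowance."""
    return len(text) <= remaining_quota(used_so_far, quota)


def months_needed(text: str, quota: int = FREE_CHARS_PER_MONTH) -> int:
    """Number of monthly allowances the text would use up (ceiling division)."""
    return -(-len(text) // quota)
```

For a plain-text book, `months_needed(open("book.txt", encoding="utf-8").read())` gives a quick estimate of how long it would take to convert on the free tier alone (`book.txt` is a placeholder file name).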

JZL003 1761 days ago

Yeah https://cloud.ibm.com/catalog/services/text-to-speech it's so smooth

czottmann 1761 days ago

That is ace. Thanks for sharing!

briga 1761 days ago

Mozilla has a pretty good open-source TTS library. In general, high-quality pre-trained TTS models are surprisingly hard to find--I'd also be curious to see if anyone knows any good alternatives.

TylerLives 1761 days ago

I don't have an answer to your question but I'm curious about something else: What kind of results would you get if you used one of the paid TTS models to generate a dataset and then trained your model on that dataset? Would it be possible to recreate their model in this way?

pvinis 1761 days ago

Cool idea, but wouldn't it be more useful as a feature for an RSS reader? No user of mine would come to the website and listen, but if they use a reader for my feed and other feeds, that would be useful for them.

HumanReadable 1760 days ago

I created such an app for myself two years back. It gets fatiguing very quickly to listen to TTS, even when using state-of-the-art stuff like WaveNet or Polly.

arbobmehmood 1761 days ago

Thanks for the feedback. We'll look into it.

arduinomancer 1761 days ago

How does it compare to something like AWS Polly?

I noticed AWS blogs all have this feature in recent times which is cool.

hadrien01 1761 days ago

It seems to be the exact same voice

lifeisstillgood 1761 days ago

I assume there are a number of Youtube channels doing something like this already - I occasionally notice that an otherwise well-researched and presented item has a non-English idiom - such that even a fluent speaker would self-correct.

I guess the idea is to write one and just "release" it in many languages.

The point of all that is, yeah, computer generated voice has gotten to the point I need dumb mistakes to realise ... one of those "the tech has passed an inflection point" moments

AYBABTME 1761 days ago

And yet apps that leverage this are quasi absent. The Firefox screen reader widget is good but the voices are limited and the functionality limited to well formatted pages. E-book software seems to not integrate this tech either.

hopesthoughts 1758 days ago

I wouldn't use something like that myself. I use NVDA full time. If you actually need a screen reader, you probably want something more advanced than a browser widget. https://www.nvaccess.org/

AYBABTME 1757 days ago

I don't _need_ screen readers in so far as I can still read with my eyes, but I can read content quite a lot faster with it, and finish reading stuff I otherwise wouldn't, using spoken text instead.

busymom0 1761 days ago

My hacker news client HACK has both reader mode for articles as well as read text to speech for articles and comments. It's brand new so feedback is welcome:

https://play.google.com/store/apps/details?id=com.pranapps.h...

phreeza 1761 days ago

I think I know the type of video you mean. I always assumed they were not TTS but professional speakers hired on Fiverr and obliged to speak the text verbatim even though there are weird phrases.

The_Amp_Walrus 1761 days ago

Looks cool! I like the simple implementation.

From a website visitor's perspective: I think an interesting direction would be to offer an asynchronous option (podcast?) for all converted content on a site. I think one reason people want to listen to something rather than read it is the additional freedom that you get when you don't have to pay attention to a screen - e.g., you could be walking instead.

That said, I'm not sure what your sell to website owners is. "Engagement" is kind of a vague benefit, why would they want to use this service? What problem is it solving? It's not clear from your landing page why anyone would bother to use this - which (just my opinion) is the #1 problem you need to solve.

I made something similar (https://blogreader.com.au) a while back from the other side - where website readers can choose which blogs they want to listen to. There's also something similar to my site offered by Pocket for free (although I don't like their AI voice very much personally).

Kriish 1755 days ago

Hey Bloggers & Content Creators, we present Voicera, a platform that allows bloggers and content writers to embed life-like voice dictation of their blogs directly into their content, all in one click. They can reach their busy readers via interactive blogs that can speak, increase retention, and run in the background.

Extremely simple to use and completely FREE. A new era of blogging is here. Don't miss out, content creators. Use code KJFPIY to get an extra 2000 credits.

hopesthoughts 1758 days ago

Honestly what's the point of these? I mean sure, if you're driving I suppose, but you probably should be concentrating on driving lol. Also, dictation is what I do with Siri. For people who actually need TTS on a daily, regular basis like me, we have our own, so services like these are pointless.

geraneum 1761 days ago

This might be beneficial for the visually impaired if the performance (closer to natural voice) is better than the default text-to-speech software on the OS. I think this is an important factor.

However, OS makers can catch up and threaten the business model of this software by integrating a better TTS.

hopesthoughts 1758 days ago

I wouldn't recommend using natural voices to read anything long. They definitely aren't fast enough. Maybe for reading short emails and things like that, but when you get to long articles, it takes too much time. I also think that when a voice pauses to breathe it sounds more creepy than natural, and wastes more time rather than benefiting me in any way.

sramam 1761 days ago

Congratulations on launching!

A question - in the sample dictation on the site, it adds a voice annotation for "features" and "pricing categories". These weren't encoded in the HTML. How does it figure that?

arbobmehmood 1761 days ago

It's manually generated from our in-house content. :)

wrs 1761 days ago

Listen to the bad reading of “Let users listen to your articles while they shop, commute or do something else”. Having your text read by a dumb computer is yet another reason to use the Oxford comma.

spyder 1761 days ago

Wanted to try but I'm getting: "Invalid SSML request" and on other pages "Insufficient credits.", or just returns the sample voice.

arbobmehmood 1761 days ago

Hello. Can you please send your registered email ID to contact@voicera.co? We'll see what the issue is.
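
For context on the "Invalid SSML request" error above: SSML is an XML format, so one common way to end up with invalid SSML is to drop article text containing characters such as `&` or `<` into it without escaping. Whether that is the cause here is not known. The following is only a sketch of the general technique using Python's standard library; `to_ssml` and `is_well_formed` are hypothetical helper names, not part of any Voicera API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a minimal SSML <speak> element, escaping &, < and >."""
    return f"<speak>{escape(text)}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Check only XML well-formedness; this does not validate SSML-specific rules."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False
```

With these helpers, `is_well_formed("<speak>Tips & tricks</speak>")` is false because of the bare ampersand, while `is_well_formed(to_ssml("Tips & tricks"))` is true.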

adz_6891 1761 days ago

Looks cool, congrats on the launch! Will you guys be doing text-to-speech in any Indian languages? If so, I would love to see a demo of that!

arbobmehmood 1761 days ago

Thanks for using our app. Voicera currently supports an English (India) accent. However, more languages are definitely in the cards.

llimos 1761 days ago

Isn't this built-in in most browsers?

The_Amp_Walrus 1761 days ago

Yes, but it's typically not as good.