by Saad_M 1888 days ago
Also for healthcare. Nuance is a big player in the healthcare market. A whole segment where DL approaches are not suitable due to lack of transparency.

You won't use speech recognition to heal patients or make diagnoses.

And if you use it to control medical devices, then transparency is not what matters. A protocol matters, like the machine repeating the commands and asking for confirmation.
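A read-back-and-confirm protocol of this kind could be sketched roughly as follows. This is a minimal illustration, not any real device's interface: the command set, the accepted replies, and the `ask` / `execute` callbacks are all hypothetical names made up for the example.

```python
from typing import Callable

# Hypothetical command set, for illustration only (not from any real device).
KNOWN_COMMANDS = {"start infusion", "stop infusion", "raise bed", "lower bed"}
# Hypothetical set of replies that count as an explicit confirmation.
AFFIRMATIVE = {"yes", "confirm", "correct"}


def confirm_and_execute(
    heard: str,
    ask: Callable[[str], str],
    execute: Callable[[str], None],
) -> bool:
    """Repeat the recognised command back and run it only after an explicit yes.

    `ask` speaks a prompt and returns the operator's reply as text.
    `execute` performs the command. Returns True if the command was run.
    """
    command = heard.strip().lower()
    if command not in KNOWN_COMMANDS:
        # Unknown commands are rejected outright; nothing is executed.
        return False
    reply = ask(f"I heard '{command}'. Say 'confirm' to proceed.")
    if reply.strip().lower() in AFFIRMATIVE:
        execute(command)
        return True
    # Any other reply is treated as a refusal.
    return False
```

The point of the design is that a misrecognised command is caught by the person at the read-back step, so the safety of the system does not depend on being able to explain the recogniser.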

No, you use it for recording patient interactions and making notes.

The main difference in voice for these apps is an intentionally limited and specialised vocabulary, where you have to be pretty certain that they said aberrant rather than apparent, or anuresis rather than enuresis. A lack of something humoral would mean you are lacking a particular fluid, while a lack of a humerus would mean you are missing a bone. The difference between an apparent mole and an aberrant mole is pretty huge, so you want to be sure your AI isn't over-fitting.
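One way a note-checking tool might handle such sound-alike terms is to flag them for human review rather than trust the transcript. The sketch below is only an assumption about how that could look: the pair list is taken from the examples above, and `flag_confusable_terms` is a made-up helper, not part of any real product.

```python
from typing import Dict, List, Tuple

# Sound-alike pairs from the examples above; a real system would need a
# curated clinical lexicon rather than this illustrative list.
CONFUSABLE_PAIRS = [
    ("aberrant", "apparent"),
    ("anuresis", "enuresis"),
]

# Map each term to the term it is easily mistaken for.
CONFUSABLE: Dict[str, str] = {}
for first, second in CONFUSABLE_PAIRS:
    CONFUSABLE[first] = second
    CONFUSABLE[second] = first


def flag_confusable_terms(transcript: str) -> List[Tuple[int, str, str]]:
    """Return (word index, word, sound-alike) for each term a reviewer should check."""
    flagged = []
    for index, raw in enumerate(transcript.split()):
        word = raw.strip(".,;:!?").lower()
        if word in CONFUSABLE:
            flagged.append((index, word, CONFUSABLE[word]))
    return flagged
```

For example, `flag_confusable_terms("Patient has an aberrant mole, no enuresis.")` would mark "aberrant" and "enuresis" so the clinician confirms each one before the note is saved.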

Also, the wider ecosystem here is that you need full software for editing and checking the notes that is compliant with local healthcare regulations, plus standard interfaces into medical software. Also, penetration into the healthcare market is notoriously hard.

I would imagine synergies in this space include things like Microsoft Teams being in a good place to take on remote consultations compared to Zoom et al, as you could have a seamless flow from consultation to record in the healthcare system and medically-accurate transcription.

I'm going to write this all with the speech to text on my droid and add line breaks after.

Aberrant mole apparent mole. Works like a charm. Now let me blow your mind with another speech to text: ganglioneuralgia.

I don't think this acquisition has anything to do with DL vs traditional. We're running opaque DL models, with no transparency, at a major academic hospital to help treat patients. This is a cakewalk by comparison. Specialized vocabulary? Just pay a few people to read some medical textbooks and you're done.

I think they just want the reputation, customers, and product suite.

Also they're making integrations out the wazoo with EHRs. I actually can't believe it, but Epic let Microsoft integrate Teams. Integrating and licensing voice is easily a next step.

Microsoft is having success with aggressively pushing Teams because most enterprise IT is too small-minded to explore anything else.

So it's the most "modern" solution they can offer their company without having to do any real additional security work.

> Microsoft is having success with aggressively pushing Teams because most enterprise IT is too small-minded to explore anything else.

Or alternatively, no other solution offers similar price/performance. They are having success because for most corporate subscriptions it is free. Also, it's the most commonly used solution for business in the UK, so I have fewer problems with people joining Teams calls than with any other provider.

I don't think it's small-mindedness - what other solution should they offer?

* Zoom - Awful internal messaging capability. A full Office 365 subscription costs almost the same as a standalone Zoom licence.

* Slack - No support for external conference calling without integration to another service.

* Google Meet - Makes sense if you use Google's suite, but not really if you use 365. Other than that not too bad.

* Webex - Awful client that requires downloading, which causes people to have issues half the time.

The big deal here is that EPIC has been classically closed off to outside software. The fact that they partnered with Microsoft to create an integration is astounding.
Yes, and as you say, you don't need interpretability for those use cases. You need performance, as in most situations. Also, people cannot explain why they heard a sentence, they just heard it. So explainability doesn't really mean anything in speech recognition. Performance is key, and Dragon was used because it was better.
1. It's not like Hidden Markov Models (the approach that dominated recognition prior to the deep learning revolution) are any more explainable than deep learning models.

2. You generally don't gain more confidence in the accuracy of a particular word by looking at less context. That is not how either human or machine recognition works.

I'm not saying that a particular technical approach is better, I'm just saying that from a product perspective the medical industry has specific requirements which are currently satisfied in Dragon and not in the generalised speech libraries Apple / Google have at the moment.

I've not got direct experience in healthcare, but do have experience in industrial voice, and this is an area where Apple/Google generalised libraries perform significantly worse than specialised software (Dragon is also big in this industry, albeit at the SDK level). In industrial voice the main requirements are coping with high levels of background noise, a restricted vocabulary (20-30 words), and people speaking very quickly.
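To give a rough feel for why a restricted vocabulary helps, here is a small sketch that snaps a noisy recogniser hypothesis onto a closed word list and rejects anything that is not close enough. It is a text-level stand-in using Python's standard `difflib`, not how Dragon or any specific SDK works; the word list and the cutoff are assumptions chosen for the example.

```python
import difflib
from typing import Optional

# Hypothetical closed vocabulary; real deployments would define their own 20-30 words.
VOCABULARY = ["start", "stop", "pick", "confirm", "cancel", "repeat", "next", "skip"]


def snap_to_vocabulary(hypothesis: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the closest in-vocabulary word, or None if nothing is similar enough.

    `cutoff` is an assumed similarity threshold between 0 and 1; higher values
    reject more input instead of guessing.
    """
    matches = difflib.get_close_matches(
        hypothesis.strip().lower(), VOCABULARY, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With this list, a slightly garbled "confirn" maps to "confirm", while an out-of-vocabulary word such as "banana" is rejected rather than forced onto a command.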