by alophawen 1185 days ago
> Is this true or is there just a massive gulf in application at the moment?

It is only true if you drink the Kool-Aid.

Speech recognition - Siri still has major issues understanding me. YouTube text2speech routinely mistranscribes because it simply has no understanding of the language.

Machine translation is hilarious at best and dangerously wrong at worst.

Machine vision still could not spot the difference between pastry and furry animals last time I checked.

All of these are examples of non-working, overhyped tech. It is not a list of "basically" solved problems.

7 comments

> Machine translation is hilarious at best and dangerously wrong at worst.

I picked a random passage from a novel in French I am currently reading. ChatGPT translated the three paragraphs I ran it on correctly; there are no major quibbles to be had. It is good, coherent English, a correct translation, which closely follows the French original, even capturing some of the poetic imagery effectively.

I'm sure after another paragraph or two there will be a weird screw-up. And there's no consistency in a running translation of any length. Etc. Yes, it's not perfect. Not fully human-equivalent.

Still. I remember when machine translation like what I just did was the realm of science fiction. And I thought it would remain science fiction for a long time. The fact that such a thing isn't mind-blowing shows how far things have come, doesn't it?

> Speech recognition - Siri still has major issues understanding me.

I use speech-to-text AI transcription every day. It's been revolutionary for me. I am hard of hearing. The cutting edge is Whisper, and it is leaps and bounds ahead of the state of the art from just a year ago: https://github.com/openai/whisper

I must have drunk the Kool-Aid, because when I talk to my AI assistant, it bloody well understands me a lot better than my wife :D.

Machine translation still makes a few mistakes but hardly more than a human.

Machine vision: I worked at a factory where a machine would approve/reject products passing through at a ridiculously high speed, and it never got one wrong. This was 15 years ago.

Your experience is completely the opposite of mine.

You have some odd issue here... You're thinking that any technology is going to be perfect, and it's not. Humans are not either. Don't set your baseline at perfection; set it at the human failure rate.
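
One common way to put a number on that comparison is word error rate (WER): the word-level edit distance between a trusted reference transcript and a candidate transcript, divided by the number of reference words. The sketch below is a minimal pure-Python version, not how any product mentioned in this thread is actually evaluated. The function name and the lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring a human transcript and a machine transcript against the same reference gives two rates that can be compared directly. With a made-up example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` is 1/6, since one of six reference words was substituted.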

Siri isn't even among the latest models, which have a much lower error rate.

Whisper is better at speech recognition than humans. Learn about the SOTA instead of mentioning bad mainstream products made years ago.

> YouTube text2speech routinely mistranscribes

Isn’t text2speech the opposite of transcribing?

Speech2text would be transcribing, and text2speech would be speech synthesis.

Anyway, assuming you meant speech2text: I found YouTube's transcription quite good. It even understands stuff that is inaudible to me (especially in movie snippets). Of course it’s not perfect, but neither am I.

Thanks, I meant speech2text, specifically the YouTube auto-caption feature. Perhaps you have enjoyed it. I regularly use it due to bad hearing, and it hilariously often mistranscribes things that a human never would, simply because it does not understand context. It is a dumb system.

TTS is where I started when I was working on this sort of thing, so I commonly just say text to speech to mean either direction.

> Speech recognition - Siri still has major issues understanding me.

Have you tried OpenAI Whisper, especially with its "large" model? Siri and YouTube shouldn't be the yardsticks for judging the entire field. They both have unique hardware constraints and they're far from the state of the art.
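
For anyone who wants to try it, here is a minimal sketch of running Whisper locally from Python. It assumes the `openai-whisper` package and `ffmpeg` are installed. The wrapper name `transcribe` and its parameters are made up for illustration; `whisper.load_model` and `model.transcribe` are the calls shown in the project's README.

```python
def transcribe(audio_path: str, model_name: str = "large") -> str:
    """Transcribe one audio file with Whisper and return the recognized text."""
    # Imported lazily so that defining this helper does not require the package.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # dict that includes the full text
    return result["text"]
```

Calling `transcribe("interview.mp3")` (a placeholder file name) would return the transcript as a string. Passing a smaller model name such as `"base"` trades accuracy for speed and memory.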

My company has replaced hundreds of human transcribers with a speech to text model.

DeepL is actually really good a lot of the time.

When was the last time you checked the status of machine vision? The problem you mentioned is not hard for it anymore.

Your company can only pull the rug like this because the public is not very picky, it seems.

A real human transcriber still outperforms any automated system in existence.

> Your company can only pull the rug like this because the public is not very picky, it seems.

My company performed extensive accuracy testing and lets people choose to use a human transcriber if they want. Most people are perfectly happy to use the machine transcription. You just seem bitter. Did an AI steal your wife?