Hacker News
by threatofrain 2338 days ago
If a company doesn't have the technology to effectively transcribe its large volume of videos and its daily rate of uploads, must it simply start taking videos down? If a company started hiring in-house to develop such technology, how long would that take?

Given the performance of Google's auto-captioning, I suspect developing worthwhile auto-captioning is pretty difficult; according to [1], the better YouTube channels use gig-economy captioning at $1 per minute.

Of course, there would be some scope for efficiency - no need to pay for captioning when the performers aren't speaking!
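To put rough numbers on that saving, here is a minimal sketch. It assumes a flat rate of $1 per minute (the figure quoted in [1]), billing per started minute of speech (an assumption, not any provider's documented policy), and that you already have a list of (start, end) speech segments in seconds, e.g. from some voice-activity detector. The function names and sample segments are hypothetical.

```python
from math import ceil

RATE_PER_MINUTE = 1.00  # USD; the gig-economy rate quoted in [1]

def speech_seconds(segments):
    """Total seconds covered by (start, end) speech segments, merging overlaps."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(segments):
        if current_end is None or start > current_end:
            # Disjoint segment: close out the previous run, start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping segment: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def caption_cost(segments, rate=RATE_PER_MINUTE):
    """Cost of captioning only the spoken parts, billed per started minute (assumed)."""
    return ceil(speech_seconds(segments) / 60) * rate

# A 20-minute video with about three minutes of actual speech.
print(caption_cost([(0, 90), (600, 660), (650, 700)]))  # 4.0
print(caption_cost([(0, 1200)]))                        # 20.0, captioning everything
```

Merging overlaps matters because speech detectors often emit overlapping windows; without it, the same seconds would be paid for twice.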

[1] https://www.wired.com/story/problem-with-youtubes-terrible-c...

YouTube's auto-captioning performs better than Google Translate's output. YouTube's translations capture much more language-specific nuance. Many European languages have a more formal word for the English "you": in German it is "Sie", in French "vous".

YouTube applies these nuances correctly; Google Translate, however, translates "you" to the most formal option by default.

So to me, $1 per minute sounds like a great deal, because there is not much to adjust.

Out of curiosity, can you link to such a gig company?

YouTube auto-captioning doesn't work for British accents, especially anything Northern English, Scottish or Welsh.

The same goes for any voice control: it doesn't work with my West Country accent.

I don’t know if it’s necessarily a gig company. As I understand it, they no longer use Amazon Mechanical Turk, but I use CastingWords and have been very happy with them. I think they only do English, though.
> Out of curiosity, can you link to such a gig company?

The one mentioned in the article is https://www.rev.com/

Presumably, if they want to self-host, one option would be to run the audio through an ML transcription service and post a link to the result. That doesn’t really work as well for user-uploaded video, but it would be pretty easy to build into a content upload pipeline at low cost.

As others say, it’s not perfect, but I have to believe it would be good enough, and it probably isn’t a bad idea in any case.

That’s what I do for podcasts. Machine transcription is getting pretty good, but for publishing an interview transcript I’d still spend a lot of my time cleaning it up. For a human transcript, it’s maybe 15 minutes’ work.
Automated captioning is not considered good enough to meet the standards.

Pornhub should not be responsible for the lack of captions on videos uploaded by users.

> Pornhub should not be responsible for the lack of captions on videos uploaded by users.

Correct

It should be responsible for those clips it chooses to make available as part of its business.

What makes you say automated captioning isn’t good enough to meet the standards? Certainly CC on a lot of TV is pretty messy.
The word accuracy of auto-generated captions is highly variable: sometimes it's good enough, sometimes it's not. In addition, proper captions should identify speaker changes and significant non-verbal sounds (e.g. [car honking]) and include punctuation, none of which most auto-generated caption services even attempt.
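As a concrete illustration of those three requirements: WebVTT, the W3C caption format used with the HTML5 `<track>` element, can mark who is speaking with its `<v>` voice tag, while sound descriptions such as [car honking] and punctuation are simply part of the cue text. Below is a minimal sketch that writes such a file, assuming you already have timed segments with optional speaker labels; the helper names and sample dialogue are made up for illustration.

```python
from html import escape

def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def make_cue(start, end, text, speaker=None):
    """Build one WebVTT cue; a speaker change is marked with a <v> voice tag."""
    text = escape(text, quote=False)  # '&', '<' and '>' must be escaped in cue text
    body = f"<v {escape(speaker, quote=False)}>{text}" if speaker else text
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{body}"

def make_vtt(cues):
    """Assemble a WebVTT file: the WEBVTT header, then cues separated by blank lines."""
    return "WEBVTT\n\n" + "\n\n".join(make_cue(*cue) for cue in cues) + "\n"

print(make_vtt([
    (1.0, 4.0, "Where are you parking?", "Anna"),
    (4.5, 6.0, "[car honking]"),  # non-verbal sound, no speaker
    (6.2, 9.0, "Right here, I think.", "Ben"),
]))
```

The format only gives these things a place to live; whether an auto-captioning service actually fills in the speaker labels and sound descriptions is a separate question.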
Again, what’s the standard for "good enough"? I just ran an interview through an AI service: it caught speaker changes and did the punctuation, about as well as a person would, no? I use human transcription personally, but this seemed plenty adequate as a minimal transcription.
What are you looking for, a percentage? I can't give you one.

Think about what the point is; the point is to give hearing-impaired people the same experience as hearing people. If you quizzed some people who only had the audio and others who only read the captions, both groups should be able to answer questions at the same rate of success (caption readers may get more questions right, like naming a speaker).

Some video captions could have numerous errors but of a type that the reader can easily tell what was meant. Other videos might have highly accurate captions but one essential word was missed, changing the whole meaning.

Yes, just as a company that doesn't have the technology to keep its workers safe must stop putting them at risk.
I don't see the similarity. If we go from deaf people suffering from not having access to a transcription of the audio, to everybody losing access to both the audio and the video (because all the videos were taken down), isn't that a lose-lose for everybody? With worker protection, at least the workers win by getting more safety.
That's the whole point of disability legislation across the world, including the ADA.

If I were to build a new shopping centre, but decided not to put in wheelchair access, would you argue that everyone will suffer from not having access to the shops, and thus it's a lose-lose for everyone?

Are wheelchair ramps an example of cutting-edge, frontier technology in this metaphor? Shouldn't society catch up a little before declaring a technology a standard? Why shouldn't an advanced AI assistant like Siri guide any blind or deaf person through any institution, store, or software experience?

In this situation, we are in fact saying that a large volume of user-submitted or professional porn should not be accessible because it might not be worth the money to transcribe. I don't think any stores are closing because a mall had to add wheelchair ramps. Adding ramps is hardly an ambitious technological undertaking.