Hacker News
by make3 3117 days ago
This would be an extremely dirty dataset, as there is no incentive to not put your videos in as many categories as possible. Even if you can only have one, I'm sure people would still put it in the most popular category a large fraction of the time, not the correct one.

Yes, it might be a better idea to let the viewers of a video choose tags/categories and vote on them, rather than letting the uploader decide.
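A minimal sketch of how viewer votes could be aggregated, assuming each vote is just a category string and that a category only "sticks" once it has both a minimum number of votes and a minimum share of all votes. The function name and both thresholds (`min_votes`, `min_share`) are hypothetical, picked for illustration:

```python
from collections import Counter


def consensus_categories(votes, min_votes=5, min_share=0.3):
    """Return the categories that enough viewers agree on.

    votes: list of category strings, one per viewer vote (hypothetical format).
    A category is kept only if it has at least `min_votes` votes AND at least
    `min_share` of all votes cast for the video.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(
        cat for cat, n in counts.items()
        if n >= min_votes and n / total >= min_share
    )


votes = ["music"] * 12 + ["comedy"] * 7 + ["gaming"] * 2
print(consensus_categories(votes))  # ['comedy', 'music']
```

Requiring both an absolute count and a share means a handful of viewers can't attach a category on their own, and a niche category doesn't survive just because it has a few votes on a very popular video.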
> there is no incentive to not put your videos in as many categories as possible.

We are talking about how to incentivize people here... why suddenly decide that there's magically "no incentive" for something?

Youtube is free to also de-incentivize people for using the wrong category.

The implicit goal of most YouTubers is to get views, as that's how they make money. Putting your video in a smaller category, or in fewer than the maximum number of categories, reduces your exposure and therefore your potential views. The incentive of having a lower chance of being flagged feels tiny compared to that.
Which they would determine using... an automated algorithm which decides which category a video should be in, and comparing it to the category chosen. At which point...
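The double-check described here might look roughly like the sketch below, assuming some classifier already outputs a probability per category for a video. The classifier itself, the function name, and the `threshold` value are all hypothetical:

```python
def flag_mismatch(chosen, predicted_probs, threshold=0.2):
    """Flag an upload whose chosen category the classifier finds unlikely.

    chosen: category picked by the uploader.
    predicted_probs: dict mapping category -> probability, produced by some
    (hypothetical) classifier.
    Returns (flagged, best_guess).
    """
    # The category the model considers most likely.
    best_guess = max(predicted_probs, key=predicted_probs.get)
    # Categories the model never scored are treated as probability 0.
    p_chosen = predicted_probs.get(chosen, 0.0)
    flagged = p_chosen < threshold and best_guess != chosen
    return flagged, best_guess


probs = {"music": 0.05, "gaming": 0.85, "comedy": 0.10}
print(flag_mismatch("music", probs))   # (True, 'gaming')
print(flag_mismatch("gaming", probs))  # (False, 'gaming')
```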
It's not a training set anymore, but a machine learning production task.
Well, it's both a challenging production task (which Google is great at) and a learning-from-streaming-data task, which Google also has some experience with (e.g. news). The latter is certainly an interesting challenge, but many researchers are already working on it.
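To make "learning from streaming data" concrete, here is a toy sketch: a multinomial naive Bayes classifier whose counts are updated one labelled example at a time, so it can predict at any point in the stream. Naive Bayes is just a simple stand-in for an online learner, not what Google actually uses, and using title words as features is an assumption for illustration:

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Multinomial naive Bayes updated one labelled example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = defaultdict(int)    # examples seen per category
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.total_words = defaultdict(int)     # tokens seen per category
        self.vocab = set()

    def update(self, tokens, label):
        """Fold one (tokens, label) example from the stream into the counts."""
        self.class_counts[label] += 1
        for t in tokens:
            self.word_counts[label][t] += 1
            self.total_words[label] += 1
            self.vocab.add(t)

    def predict(self, tokens):
        """Return the most likely category, or None if nothing seen yet."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, c in self.class_counts.items():
            score = math.log(c / n)  # log prior
            for t in tokens:
                num = self.word_counts[label].get(t, 0) + self.alpha
                den = self.total_words[label] + self.alpha * v
                score += math.log(num / den)  # smoothed log likelihood
            if score > best_score:
                best, best_score = label, score
        return best


model = StreamingNaiveBayes()
model.update(["guitar", "cover", "song"], "music")
model.update(["speedrun", "boss", "level"], "gaming")
model.update(["live", "concert", "song"], "music")
print(model.predict(["song", "live"]))  # music
print(model.predict(["boss"]))          # gaming
```

Because the model is just counts, each new (video, category) pair costs a constant amount of work per token, which is the property you want when labels keep arriving.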
> At which point...

They have more data than they do now for machine learning, and a better PR story.

i.e. uploaders can't be mad about the categories, because the categories are chosen by the uploader.

Uploaders can be mad about Youtube double-checking the categories and getting it wrong... which is less likely to happen if they have better data for machine learning.

What, exactly, is the down side of that?

Ultimately the determination is still made by the machine learning process, so you're describing extra work to provide an interface of dubious value that will be used more to misrepresent video content than to provide useful signals, and it seems that customer support related to this would increase dramatically.

I think Google relies overmuch on questionable ML in most areas, but in this case, the alternatives are either ridiculously expensive, or easily exploitable.