| HN Mirror


	by make3 3164 days ago
	This would be an extremely dirty dataset, as there is no incentive to not put your videos in as many categories as possible. Even if you can only pick one, I'm sure people will still put it in the most popular category a large fraction of the time, not the correct one.

2 comments

jijji 3164 days ago

Yes, it might be a better idea to let viewers choose tags/categories and vote on them than to leave the choice to the uploader.

adekok 3164 days ago

> there is no incentive to not put your videos in as many categories as possible.

We are talking about how to incentivize people here... why suddenly decide that there's magically "no incentive" for something?

YouTube is also free to disincentivize people from using the wrong category.

make3 3164 days ago

The implicit goal of most YouTubers is to get views, as that's how they make money. Putting your video in a smaller category, or in fewer than the maximum number of categories, reduces your exposure and therefore your potential views. The incentive of a lower chance of being flagged feels tiny compared to that.

pwinnski 3164 days ago

Which they would determine using... an automated algorithm which decides which category a video should be in, and comparing it to the category chosen. At which point...

make3 3164 days ago

it's not a training set anymore, but a machine learning production task

maksimum 3164 days ago

Well, it's both a challenging production task (which Google is great at) and a learning-from-streaming-data task, which Google also has some experience with, e.g. news. The latter is certainly an interesting challenge, but many researchers are already working on it.

adekok 3164 days ago

> At which point...

They have more data than they do now for machine learning, and a better PR story.

I.e., uploaders can't be mad about the categories, because the categories are chosen by the uploader.

Uploaders can be mad about YouTube double-checking the categories and getting it wrong... which is less likely to happen if they have better data for machine learning.

What, exactly, is the downside of that?

pwinnski 3164 days ago

Ultimately the determination is still made by the machine learning process, so you're describing extra work to provide an interface of dubious value that will be used more to misrepresent video content than to provide useful signals, and it seems that customer support related to this would increase dramatically.

I think Google relies overmuch on questionable ML in most areas, but in this case, the alternatives are either ridiculously expensive, or easily exploitable.
