Hacker News
by Scaevolus 3909 days ago
I wonder if they included sexy images in their negative training sets -- many videos accrue millions of views (and ad dollars) by having a few frames of cleavage interspersed with other (often derivative) footage.

It would be great if their algorithm picked a thumbnail that reflected the entire video, not just a few frames specifically chosen to game people's compulsive clicking.

Most of those are just manually selected thumbnails by the uploader. After uploading, YT gives you 3-5 thumbnails you can choose from.

Also, partnered accounts are allowed to upload custom thumbnails (which can be any image, not necessarily even a screenshot from the video).

Not just partnered accounts. My YouTube account lets me upload custom thumbnails, and I'm certainly not followed. I have maybe a couple dozen videos with maybe a few hundred views among them all.

Is this definitely not algorithmic? I've been noticing for a while that videos might have an incidental flash of cleavage and then that is used as the thumbnail. I'd always wondered if this was arising "naturally" somehow (people pausing that scene perhaps?)

On the basis of the type of video I'd discounted manual intervention. Though if people can just upload any image, I'm now surprised that they're not all like this.

> After uploading, YT gives you 3-5 thumbnails you can choose from.

Can you pick an arbitrary video frame, or only one of the suggested thumbnails?

It automatically captures 3 different thumbnails (I guess using the algorithm in OP) and lets you select any 1 of those 3.

I presume they use the image selection as training data too; if not, that seems like awfully low-hanging data fruit.

Many videos seem to have completely arbitrary thumbnails which are not from the video. Most of the Epic Rap Battles videos, for example.

Perhaps this option 'unlocks' after you reach a certain subscriber count.

As mentioned upthread:

> partnered accounts are allowed to upload custom thumbnails (which can be any image, not necessarily even a screenshot from the video).

They stated that the negative training set was constructed by randomly sampling frames from the video.
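For anyone curious what "randomly sampling frames" might look like in practice, here is a rough sketch. It only picks frame indices uniformly at random; the function name, the `seed` parameter, and the frame counts are my own assumptions, not anything from the post.

```python
import random

def sample_negative_frames(num_frames, k, seed=None):
    """Pick up to k distinct frame indices uniformly at random.

    Hypothetical helper: the indices stand in for frames that would be
    labelled as negative (non-thumbnail) examples.
    """
    rng = random.Random(seed)
    k = min(k, num_frames)  # cannot sample more frames than exist
    return sorted(rng.sample(range(num_frames), k))

# Example: 5 negatives from a 1,000-frame video (numbers are made up).
negatives = sample_negative_frames(1000, 5, seed=0)
```

A fixed seed just makes the draw repeatable; a real pipeline would presumably also skip whichever frame was used as the positive example.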

If someone wants to game the thumbnails, then they will just manually select the thumbnail to use; and there are too many legitimate use cases for this ability for YouTube to remove it.

> If someone wants to game the thumbnails, then they will just manually select the thumbnail to use; and there are too many legitimate use cases for this ability for YouTube to remove it.

Many channels I watch carefully select an iconic frame from the video to serve as the thumbnail, or construct an artificial thumbnail that provides useful information about the type and subject of the video. Manual will frequently produce better results than automatic for a good-quality channel.

Is there a way YouTube could alter the "view count" to only include views where 100% of the video has been watched? May help cut down on videos with misleading thumbnails and/or titles.

> Is there a way YouTube could alter the "view count" to only include views where 100% of the video has been watched?

You wouldn't want to require 100%, as many people stop when a video starts rolling credits, or when it switches to a screen using annotations to link to other videos. But 50-75% would work well as a threshold to count "views".
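As a rough sketch of that threshold idea, assuming you had a per-session record of seconds watched (the function name, the 0.5 default, and the sample sessions are all hypothetical):

```python
def counts_as_view(seconds_watched, duration, threshold=0.5):
    """Return True when the watched fraction of the video meets the threshold."""
    if duration <= 0:
        return False  # no meaningful duration, so nothing to count
    fraction = min(seconds_watched / duration, 1.0)  # cap rewatches at 100%
    return fraction >= threshold

# (seconds_watched, duration) pairs for a 10-minute video; made-up data.
sessions = [(30, 600), (320, 600), (600, 600)]
views = sum(counts_as_view(watched, total) for watched, total in sessions)
```

Here only the second and third sessions clear the 50% bar; the 30-second click-away does not count.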

A better way to filter those out algorithmically would be to simply look at the thumbs-up vs thumbs-down ratio.

The ones with misleading titles/thumbnails often have far more down-votes than up-votes yet YouTube continues to show those as the highest recommended/relevant (I guess Google prefers click-throughs over user-satisfaction).

Both explicit signals (thumbs up/down) and implicit ones (click-aways, closing the window) may count toward quality.

There are other confusing cases. I watch a lot of long-form videos, some too long to view in a single session, many of which I download for offline viewing (yt-download). I've been quite actively dissuaded from either publicly rating videos, or even linking to YouTube itself on my primary social channel (G+) given the Anschluss forced-marriage between YouTube, G+, and what had once been individual and separate accounts (similar logic applies to Google Play, and I've taken to "registering" my Android devices under randomly generated usernames).

For videos I particularly like, I may reference them, but only specific portions which I skip to, view, and then close. That's far less than a 100% view, but still significant.

It's not that I'm opposed to providing appropriateness and quality data to YouTube. I absolutely give massive shits about who they share that data with, and how. The "make it all public" default is utterly fucked in the head.

I think Google are starting to realise that.

> A better way to filter those out algorithmically would be to simply look at the thumbs-up vs thumbs-down ratio.

> The ones with misleading titles/thumbnails often have far more down-votes than up-votes

Especially once the total votes pass a certain threshold. Below a certain threshold, any activity makes something interesting; you wouldn't want to let a handful of downvotes bury something early on (as in, 4 upvotes and 6 downvotes). But once you hit the hundreds or thousands of votes, the ratio should take over.
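Something like this, as a minimal sketch: ignore the ratio until enough votes are in, then let it decide. The cutoffs (`min_votes=100`, `min_ratio=0.5`) and the function name are placeholders I picked for illustration, not known YouTube values.

```python
def should_demote(upvotes, downvotes, min_votes=100, min_ratio=0.5):
    """Demote a video only when it has enough votes AND a poor up-vote ratio."""
    total = upvotes + downvotes
    if total < min_votes:
        return False  # too few votes to trust the ratio yet
    return upvotes / total < min_ratio

early = should_demote(4, 6)          # 10 votes: below the vote threshold
disliked = should_demote(400, 600)   # 1,000 votes at a 40% up-vote ratio
liked = should_demote(900, 100)      # 1,000 votes at a 90% up-vote ratio
```

With these placeholder cutoffs, the 4-up/6-down video is left alone, while the one with 400 up and 600 down gets demoted.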