by gcmac 2110 days ago
I’d say it’s more likely they have super advanced/clever ways of doing the latter. The algorithm could be a simple dot product and the result could be great or terrible depending on how good the feature extraction is.

Pulling useful features out of videos is no small task. The fact that everyone raves about how good the recommendations are indicates to me that this is where their innovation lies.
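
To make the dot-product point concrete, here is a minimal sketch. The three feature dimensions, the vectors, and the function names are all made up for illustration; nothing here reflects how TikTok actually scores videos. The ranking step is trivial, so the quality of the output depends entirely on how good the feature vectors are.

```python
def dot(u, v):
    """Plain dot product of two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rank_videos(user_vec, video_vecs):
    """Return video ids sorted by descending dot-product score."""
    scores = {vid: dot(user_vec, vec) for vid, vec in video_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional features: (comedy, cooking, sports).
user = [0.9, 0.1, 0.0]
videos = {
    "v1": [1.0, 0.0, 0.0],
    "v2": [0.0, 1.0, 0.0],
    "v3": [0.5, 0.5, 0.0],
}
ranking = rank_videos(user, videos)
print(ranking)
```

Swap in poorly extracted features and the same code produces a poor ranking, which is the point of the comment above.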


There's so much good metadata (likes, comments, duration, sound used, views, like/view ratio, skips, loops, subscribes, etc.) that I'd be surprised if they were digging into the contents of the video at all right now.
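
As a rough sketch of how far metadata alone could go, the counters listed above can be turned into per-view rates and combined into a single score. The field names and the weights are assumptions picked for illustration, not anything known about TikTok's system.

```python
def engagement_features(stats):
    """Turn raw counters into per-view rates; all zeros if there are no views."""
    views = stats["views"]
    if views == 0:
        return {"like_rate": 0.0, "skip_rate": 0.0, "loop_rate": 0.0}
    return {
        "like_rate": stats["likes"] / views,
        "skip_rate": stats["skips"] / views,
        "loop_rate": stats["loops"] / views,
    }


def engagement_score(stats, weights=None):
    """Weighted sum of the rates. The default weights are hypothetical."""
    if weights is None:
        weights = {"like_rate": 1.0, "skip_rate": -1.0, "loop_rate": 0.5}
    feats = engagement_features(stats)
    return sum(weights[name] * value for name, value in feats.items())


example = {"views": 1000, "likes": 200, "skips": 100, "loops": 300}
score = engagement_score(example)
print(round(score, 3))
```

In practice the weights would be learned from data rather than hand-picked, but the inputs are the same cheap counters.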
Bytedance has thousands of the smartest data scientists in China.
Bytedance has thousands of manual labor specialists as well.

Using ML it is very easy to tag videos.

They could also be digging only into audio, doing speech recognition on it, then clustering the text. Augment that with the text users have put into the video directly using the in-app editor and you have some pretty solid data.
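
A toy version of the "cluster the text" step might look like the sketch below. It assumes the transcripts already exist (the speech recognition itself is not shown), and it uses plain bag-of-words cosine similarity with a greedy pass, which is a stand-in chosen for brevity rather than what a production system would use. The threshold of 0.5 is arbitrary.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for one transcript."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    den = norm_a * norm_b
    return num / den if den else 0.0


def greedy_cluster(texts, threshold=0.5):
    """Put each text in the first cluster whose seed text is similar enough."""
    clusters = []  # list of (seed_vector, [indices of member texts])
    for i, text in enumerate(texts):
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical transcripts standing in for speech-recognition output.
transcripts = [
    "how to cook pasta at home",
    "cook pasta at home fast",
    "best soccer goals this week",
]
groups = greedy_cluster(transcripts)
print(groups)
```

Text typed into the in-app editor could be appended to each transcript before clustering.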
If that were true, it'd be interesting to see if they push out support for closed captioning. It's an accessibility push, but it would also leverage a lot of the same capabilities...
I would also start doing image recognition in the video frames, to extract things like gender, objects, etc.
Would this have any advantage over just using video embeddings (or a sequence of frame embeddings), which in theory should capture those things in vectorized form?
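
For reference, one common way to collapse a sequence of frame embeddings into a single video vector is mean pooling. The sketch below assumes the per-frame vectors have already been produced by some image model (not shown, and hypothetical here); the two-dimensional vectors are toy values.

```python
def mean_pool(frame_embeddings):
    """Average per-frame vectors into one fixed-size video vector."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    dim = len(frame_embeddings[0])
    if any(len(frame) != dim for frame in frame_embeddings):
        raise ValueError("all frame embeddings must have the same length")
    n = len(frame_embeddings)
    return [sum(frame[i] for frame in frame_embeddings) / n for i in range(dim)]


# Toy 2-dimensional embeddings for three frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_vec = mean_pool(frames)
print(video_vec)
```

Unlike explicit tags such as detected objects, a pooled vector is not human-readable, which is one practical difference between the two approaches.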
> I'd be surprised if they were digging into the contents of the video at all right now.

Why would you be surprised to learn TikTok is doing video content analysis?

It can be (a) very expensive and (b) very difficult to implement.

Video understanding is an active field of research, and I'm not sure the state of the art is there yet for capturing nuances like engagement potential, categories, etc.

State of the art where? College? Silicon Valley? Bangalore? Shanghai? Beijing? Hangzhou?
State of the art in academia, which is largely location agnostic.
Google was able to build a very useful search engine that ran for decades relying on the significance of links and keywords, without much understanding of the meaning of page content. You can get very far with the readily available data, before you need to delve into the fancy stuff to make it a few percent better.
They claim to be looking at the music in the video and avoiding sending you to another video with the same music.
That would be the "sound used". The music in the video is specified/labeled before upload so there's no need to actually process the sound of the video.
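
If the sound really is a label attached at upload, avoiding repeats is just a filter on that label. A minimal sketch, assuming each candidate carries a `sound_id` field (a hypothetical name) and that the ids of recently played sounds are available:

```python
def drop_repeated_sounds(candidates, recent_sound_ids):
    """Keep candidates whose sound was not heard recently or earlier in this batch."""
    seen = set(recent_sound_ids)
    kept = []
    for video in candidates:
        if video["sound_id"] in seen:
            continue
        kept.append(video)
        seen.add(video["sound_id"])  # also avoid repeats within the new batch
    return kept


# Hypothetical candidate list, already ordered by some ranking score.
candidates = [
    {"id": "a", "sound_id": "s1"},
    {"id": "b", "sound_id": "s2"},
    {"id": "c", "sound_id": "s2"},
    {"id": "d", "sound_id": "s3"},
]
feed = drop_repeated_sounds(candidates, recent_sound_ids=["s1"])
print([video["id"] for video in feed])
```

No audio processing is involved; the filter only compares labels.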
Almost all of those apply to YouTube, do they not?
IIRC YouTube videos are too long to do any useful feature extraction from the videos.
The comment I was responding to mentioned a lot of metadata around videos; that is what I was responding to.