Hacker News
by novok 2115 days ago
I would also start doing image recognition on the video frames to extract things like gender, objects, etc.

Would this have any advantage over just using video embeddings (or a sequence of frame embeddings), which in theory should capture those things in vectorized form?
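
To make the embedding side of that question concrete, here is a rough sketch of what "a sequence of frame embeddings" could look like in use. Everything in it is assumed for illustration: the three-dimensional vectors are made-up numbers rather than output from any real model, and `mean_pool` and `cosine` are hypothetical helper names. Mean pooling is just one simple way to collapse per-frame vectors into a video-level vector.

```python
import math

def mean_pool(frame_embeddings):
    """Average per-frame vectors into a single video-level vector."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[i] for frame in frame_embeddings) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy frame embeddings (invented values, NOT real model output).
video_a = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
video_b = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]
video_c = [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]

emb_a, emb_b, emb_c = (mean_pool(v) for v in (video_a, video_b, video_c))

# Videos with similar frames end up close together; dissimilar ones do not.
sim_ab = cosine(emb_a, emb_b)
sim_ac = cosine(emb_a, emb_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

The pooled vectors support similarity comparisons like this directly, but they are not readable as labels on their own, which may be where explicit recognition on frames still differs in practice.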