Hacker News new | ask | show | jobs
by dynode 198 days ago
Would you be willing to share more details of what you did?
1 comments

Sure. I had a lot of help from Claude Opus 4.5, but it was roughly:

- Using pyscenedetect to split each video on a per scene level

- Using the decord library https://github.com/dmlc/decord to pull frames from each scene at a particular sample rate (specific rate I don't have handy right now, but it was 1-2 per scene)

- Aggregating frames in batches of around 256 frames to be normalized for CLIP embedding on GPU (had to re-write the normalization process for this because the default library does it on CPU)

- Uploading the frames along with metadata (timestamp, etc) into a vector DB, in my case Qdrant running locally along with a screenclip of the frame itself for debugging.

I'm bottlenecked by GPU compute so I also started experimenting with using Modal for the embedding work too, but then vacation ended :) Might pick it up again in a few weeks. I'd like to be able to have a temporal-aware and potentially enriched search so that I can say "Seek to the scene in Oppenheimer where Rami Malek testifies" and be able to get a timestamped clip from the movie.