by SamPatt
121 days ago
Hi gwern! My siblings are very much not developers. That's a lot of data for them to download, store, and figure out a way to view. I was worried they'd just see a list of filenames and not put in the effort. By creating a streaming experience, I thought they'd actually watch them. You might be correct that Gemini could have helped (I didn't test it), but much of the knowledge of who was in a scene, where it was, and why it would matter is inside my head. I doubt any model could effectively label locations and people over 20 years of video. As for the opportunity cost: I'm currently looking for work, so mine is undoubtedly lower than yours!
I wasn't suggesting anything about your siblings, but about you, who are a developer. I was just talking about the actual download step, not what you did after that. (Obviously you were going to host them somewhere else in some other form. Probably not DVDs but a little quickie website, or maybe just a flash drive with an HTML file index, say; I don't know, there are lots of options here to make it user-friendly for your siblings on Christmas Day. The hard drive or flash drive idea has the benefit of LOCKSS, especially if you use up the spare space providing PAR2 FEC.)
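For what it's worth, the "flash drive with an HTML file index" option can be a few lines of Python. This is only a minimal sketch: the function names `build_index` and `write_index`, the page title, and the list of video extensions are all made up for illustration, and it assumes the videos sit in one flat folder next to the generated `index.html`.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote

# Assumed set of video extensions; adjust to whatever the archive actually contains.
VIDEO_SUFFIXES = {".mp4", ".mkv", ".mov", ".webm"}


def build_index(filenames, title="Family videos"):
    """Return an HTML page linking to every video file name, sorted by name."""
    items = "\n".join(
        # quote() percent-encodes spaces etc. for the link; escape() protects the label.
        f'    <li><a href="{quote(name)}">{escape(name)}</a></li>'
        for name in sorted(filenames)
        if Path(name).suffix.lower() in VIDEO_SUFFIXES
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body></html>\n"
    )


def write_index(folder):
    """Write index.html into `folder`, listing the video files found there."""
    folder = Path(folder)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    (folder / "index.html").write_text(build_index(names), encoding="utf-8")
```

Calling `write_index` on the drive's mount point (whatever path that is on your machine) leaves a single page the siblings can open in any browser, with relative links so it keeps working when the drive is copied.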
> I doubt any model could effectively label locations and people over 20 years of video.
Actually, Gemini is highly promptable with a large context window and a single still image only takes up ~300 tokens IIRC, so I think that you could probably do so! Just include, say, 3 photos of each person over time with a natural language description, and 1 photo of each location, and that might be enough to get back useful labels. Gemini can even do bounding boxes. (Google is quite proud of its vision and video analysis capabilities.) And you can run multiple passes or split up videos etc.
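A back-of-the-envelope check of that budget, as a rough sketch. The ~300 tokens per image is the recalled figure above, not a measured one; the number of people and locations, the 50 tokens per description, and the 1,000,000-token context window are purely hypothetical placeholders.

```python
# Rough per-image cost recalled above ("IIRC"); not a measured value.
TOKENS_PER_IMAGE = 300


def reference_budget(num_people, num_locations, photos_per_person=3,
                     photos_per_location=1, description_tokens=50):
    """Estimate tokens used by the reference photos plus one short description each."""
    images = num_people * photos_per_person + num_locations * photos_per_location
    entries = num_people + num_locations
    return images * TOKENS_PER_IMAGE + entries * description_tokens


def frames_that_fit(context_window, reference_tokens):
    """How many still frames fit in the context after the reference material."""
    return max(0, (context_window - reference_tokens) // TOKENS_PER_IMAGE)


if __name__ == "__main__":
    # Hypothetical family: 15 people, 10 recurring locations.
    ref = reference_budget(num_people=15, num_locations=10)
    print(ref)                               # 17750
    print(frames_that_fit(1_000_000, ref))   # 3274
```

Under those assumptions the reference set costs under 2% of the window, leaving room for a few thousand sampled frames per pass, which is why splitting videos or running multiple passes seems workable.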