Hacker News
by accrual 411 days ago
That's pretty neat. Do you essentially loop over a list of images and run the prompt for each, then store the result somewhere (metadata, sqlite)?

Yep, exactly. I just looped through each image with the same prompt and stored the results in a SQLite database, so I can search through them and maybe present something more than a simple WebUI in the future.

If you want to see, here it is:

https://gist.github.com/Q726kbXuN/f300149131c008798411aa3246...

Here's an example of the kind of detail it built up for me for one image:

https://imgur.com/a/6jpISbk

It's wrapped up in a bunch of POC code around talking to LLMs, so it's very very messy, but it does work. Probably will even work for someone that's not me.
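
For anyone who wants the shape of the loop without reading the gist, here is a minimal sketch. It is not the code from the gist. The `describe` callable is a hypothetical stand-in for whatever sends an image and a prompt to your LLM, and the table name and columns are assumptions made for illustration.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def index_images(folder: str, db_path: str, prompt: str,
                 describe: Callable[[Path, str], str]) -> int:
    """Run the same prompt against every image in `folder`, store answers in SQLite.

    `describe` is a hypothetical hook: it takes an image path and the prompt
    and returns the model's text. Plug in your own LLM client here.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS descriptions ("
        "path TEXT PRIMARY KEY, prompt TEXT, description TEXT)"
    )
    count = 0
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        text = describe(path, prompt)
        # Re-running the indexer overwrites the old row for the same path.
        conn.execute(
            "INSERT OR REPLACE INTO descriptions VALUES (?, ?, ?)",
            (str(path), prompt, text),
        )
        count += 1
    conn.commit()
    conn.close()
    return count


def search(db_path: str, term: str) -> List[str]:
    """Return paths of images whose stored description contains `term`."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path FROM descriptions WHERE description LIKE ? ORDER BY path",
        (f"%{term}%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A plain `LIKE` search is enough to get started. If the collection grows, SQLite's full-text search would probably be the next thing to look at.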

Nice! How complicated do you think it would be to summarize all the photos in a folder, e.g. a collection of holiday photos, or a set of images grouped after an event?
Very simple. You could either do what I did and ask for details on each image, then ask for a summary of the group of summaries, or just throw all the images at it in one go:

https://imgur.com/a/1IrCR97

I'm sure that with enough images you'll hit a context limit and need to start map-reducing things, but even that wouldn't be too hard.
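
A rough sketch of the map-reduce idea, assuming you already have one text description per image. The `summarize` callable is a hypothetical hook that sends a batch of texts to the LLM and returns one summary. The character budget `max_chars` is an assumed stand-in for a real token limit.

```python
from typing import Callable, List, Sequence


def batch_by_size(texts: Sequence[str], max_chars: int) -> List[List[str]]:
    """Group texts into batches whose combined length stays within max_chars.

    A single text longer than max_chars still gets a batch of its own.
    """
    batches: List[List[str]] = []
    batch: List[str] = []
    size = 0
    for text in texts:
        if batch and size + len(text) > max_chars:
            batches.append(batch)
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        batches.append(batch)
    return batches


def map_reduce_summary(texts: Sequence[str],
                       summarize: Callable[[List[str]], str],
                       max_chars: int = 8000) -> str:
    """Summarize batches, then summarize the summaries, until one remains."""
    current = list(texts)
    if not current:
        return ""
    while True:
        batches = batch_by_size(current, max_chars)
        if len(batches) == 1:
            # Everything fits in one call: this is the final reduce step.
            return summarize(batches[0])
        if len(batches) == len(current):
            # Each item already fills a batch, so batching cannot shrink the
            # list any further. Do one final call instead of looping forever.
            return summarize(current)
        # Map step: one summary per batch, then go around again.
        current = [summarize(batch) for batch in batches]
```

Each pass either finishes or leaves strictly fewer texts than before, so the loop terminates even if the summaries come back longer than expected.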

Thanks for the reply, I'll see if I can work it out :)
You might want to extract the location from the image's EXIF data and include it in the prompt as well. There are reverse geocoding libraries and services that take coordinates and return a city, which would probably make for a better summary of a trip.
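
To make that concrete: EXIF stores GPS latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S, E/W), so they need converting before most geocoders will accept them. Below is a small sketch of just the conversion and the prompt step. Reading the tags out of the file needs an EXIF library, and the reverse geocoding lookup is left out. The `prompt_with_location` helper and its wording are made up for illustration.

```python
from typing import Optional, Sequence


def dms_to_decimal(dms: Sequence, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) to decimal degrees.

    `ref` is the EXIF hemisphere reference: "N", "S", "E", or "W".
    South and west come out negative.
    """
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value


def prompt_with_location(base_prompt: str, lat: float, lon: float,
                         place: Optional[str] = None) -> str:
    """Append location context to a prompt (hypothetical helper).

    `place` would come from a reverse geocoder, if you use one.
    """
    location = f"latitude {lat:.5f}, longitude {lon:.5f}"
    if place:
        location = f"{place} ({location})"
    return f"{base_prompt}\nThe photo was taken at {location}."
```

Usage, with coordinates typed in by hand rather than read from a file:

```python
lat = dms_to_decimal((48, 51, 29.6), "N")
lon = dms_to_decimal((2, 17, 40.2), "E")
print(prompt_with_location("Describe this photo.", lat, lon, place="Paris"))
```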