Hacker News
by jskherman 771 days ago
What kind of approach did you take? I was thinking of something like rewind.ai, or a program that takes a screenshot at a set interval (or records a video that is later split into frames), and then having a vision-capable model, ideally one specialized in UIs, describe that set of images in order to build a dataset of images, tags, descriptions, and the like.
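To make that idea concrete, here is a rough sketch of the loop I have in mind. It is only an illustration: `capture` and `describe` are hypothetical callables you would have to supply yourself (a screenshot tool and a vision model, respectively), and the record layout is just my guess at what an images-tags-description dataset could look like.

```python
import json
import time
from typing import Callable, Iterable, Iterator


def build_dataset(
    capture: Callable[[], str],
    describe: Callable[[str], dict],
    shots: int,
    interval_s: float = 5.0,
) -> Iterator[dict]:
    """Yield one image/tags/description record per screenshot.

    capture()      -> path of a freshly saved screenshot (hypothetical helper)
    describe(path) -> {"tags": [...], "description": "..."} produced by a
                      vision-capable model (hypothetical helper)
    """
    for i in range(shots):
        path = capture()
        labels = describe(path)
        yield {
            "image": path,
            "tags": labels.get("tags", []),
            "description": labels.get("description", ""),
        }
        # Wait between captures, but not after the last one.
        if i < shots - 1:
            time.sleep(interval_s)


def write_jsonl(records: Iterable[dict], out_path: str) -> int:
    """Write records as JSON Lines and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Passing the capture and description steps in as functions keeps the loop independent of whichever screenshot tool or model ends up being used.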

There are also libraries like trafilatura (Python), featured here on HN some time ago, that can extract the main content from web pages to help augment the data.
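A minimal sketch of how that augmentation might look, assuming trafilatura is installed and you already have the page's HTML. `augment_record` and the `page_text` field are names I made up for illustration; the only trafilatura call used is `trafilatura.extract`, which returns the extracted text or `None` when nothing usable is found.

```python
from typing import Callable, Optional


def augment_record(
    record: dict,
    page_html: str,
    extractor: Optional[Callable[[str], Optional[str]]] = None,
) -> dict:
    """Return a copy of `record` with the page's main text attached.

    `extractor` defaults to trafilatura.extract; it is a parameter only so
    that another extractor (or a stub) can be swapped in.
    """
    if extractor is None:
        import trafilatura  # third-party: pip install trafilatura

        extractor = trafilatura.extract
    text = extractor(page_html)
    # Extraction can fail and return None; store an empty string instead.
    return {**record, "page_text": text or ""}
```

If you only have a URL, `trafilatura.fetch_url(url)` can download the page first, and its result can be passed in as `page_html`.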