| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jskherman 817 days ago
	What kind of approach did you take? I was thinking along the lines of requiring something like rewind.ai or some program that autoscreenshots your screen at a set interval (or originally a recorded video split into several images later) and having a vision-capable model (particularly specialized in UIs) describe these set of images in order to build a dataset of images-tags-description and the like.

1 comments

jskherman 817 days ago

There's also libraries like trafilatura in Python featured here in HN some time ago that could extract content from websites to help augment the data.

link