	by serjester 683 days ago
	Does anyone have experience applying these models to rendered content (PDFs, webpages, etc.)? It seems like a really promising area of research for building LLM agents.

3 comments

dbish 682 days ago

Doesn’t work well for screen based content in general. One of the authors of SAM2 said explicitly, in the most recent latent space pod, that this is not a focus of theirs because it’s not foundational in the research space.

abrichr 682 days ago

> Doesn’t work well for screen based content in general.

It's not perfect, but it works: https://github.com/OpenAdaptAI/OpenAdapt/pull/610

> the most recent latent space pod

Link: https://www.latent.space/p/sam2

abrichr 682 days ago

We are using the Segment Anything Model at OpenAdapt for exactly this purpose: https://github.com/OpenAdaptAI/OpenAdapt/pull/610

It works surprisingly well despite the fact that the model was not trained on this type of data.

abrichr 682 days ago

Example on Excel: https://x.com/OpenAdaptAI/status/1798502003045548480
