Hacker News new | ask | show | jobs
by fho 306 days ago
Have you played around with the current vision features? I am pretty sure even gpt-4.1 can give you pretty good descriptions of e.g. screen captures, including being able to "read" and reproduce text.
1 comments

yes, there are multiple addons giving screen readers the ability to prompt ai-s for image recognition. they work rather well, btw, though the value is often situational. agentic behavior might help further, though it will need some polishing.