


	by mikepurvis 303 days ago
	I had the same thought: an LLM should really interact with a browser viewport and just leverage normal accessibility features, like tabbing between form fields and links. Basically, the LLM sees the viewport as a thumbnail image and goes "That looks like the central text, read that," and then some underlying skill implementation selects and returns the textual context from the viewport.
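
	To make the idea concrete, here is a rough sketch of what that "underlying skill" layer might look like. Everything in it is an assumption for illustration: `Element`, `Viewport`, `read_region`, and `tab` are hypothetical names, the page is a hand-built list of boxes rather than a real browser, and the region is one the model is imagined to have picked from the thumbnail.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A hypothetical on-screen element: its text and bounding box."""
    text: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in viewport pixels
    focusable: bool = False         # links and form fields would be focusable


def overlaps(a, b):
    """Return True if two (left, top, right, bottom) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


class Viewport:
    """Toy stand-in for a browser viewport (assumption: no real browser here)."""

    def __init__(self, elements):
        self.elements = elements
        self.focus_index = -1  # nothing focused yet

    def read_region(self, box):
        """Skill: return the text of every element overlapping the given box."""
        hits = [e for e in self.elements if overlaps(e.box, box)]
        hits.sort(key=lambda e: (e.box[1], e.box[0]))  # top-to-bottom, left-to-right
        return "\n".join(e.text for e in hits)

    def tab(self):
        """Skill: move focus to the next focusable element, like pressing Tab."""
        focusable = [e for e in self.elements if e.focusable]
        if not focusable:
            return None
        self.focus_index = (self.focus_index + 1) % len(focusable)
        return focusable[self.focus_index]


page = Viewport([
    Element("Home | About", (0, 0, 800, 40), focusable=True),
    Element("Main article heading", (100, 60, 700, 100)),
    Element("Body text of the article.", (100, 110, 700, 400)),
    Element("Subscribe", (720, 60, 790, 90), focusable=True),
])

# The model looks at the thumbnail and says "that middle block is the central text".
central_text = page.read_region((100, 50, 700, 420))
```

	In a real setup the element list would presumably come from the browser's accessibility tree, and the box would come from the model's reading of the screenshot; this sketch only shows the division of labour between "pick a region" and "return its text".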