|
|
|
|
|
by kh9000
274 days ago
|
|
Using the UIA tree as the currency for LLMs to reason over always made more sense to me than computer vision, screenshot based approaches. It’s true that not all software exposes itself correctly via UIA, but almost all the important stuff does. VS code is one notable exception (but you can turn on accessibility support in the settings) |
|
CV and direct mouse/kb interactions are the “base” interface, so if you solve this problem, you unlock just about every automation usecase.
(I agree that if you can get good, unambiguous, actionable context from accessibility/automation trees, that’s going to be superior)