by echelon
59 days ago
Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse? Which LLMs drive these best: Claude, Gemini, etc.? Or is anything that runs locally actually competent at it? Can they understand layout and visual cues through a VLM or other multimodal input? Are they robust enough to interact with three.js scenes, videos, and whatnot, or can they only navigate the DOM blindly?