Also, even if you hypothetically wanted to use computer vision with an LLM… what API is that LLM going to use to take screenshots and click on stuff?