May I know how latency will be handled? If a page takes 5 seconds to load, does the vision model keep 'guessing' while the screen is blank, or is it smart enough to wait for a specific visual state?
Great question! We have logic that looks for signals such as certain network requests completing and the DOM finishing loading, plus a timeout so we are not waiting forever. On top of those checks, the LLM looks at the screenshot and can decide to wait longer if the page still hasn't fully loaded.
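To make the flow concrete, here is a minimal Python sketch of that kind of wait loop. It is not our actual implementation. Every name in it (`Page`, `wait_for_page_ready`, `looks_loaded`, `FakeClock`, `SlowPage`) is hypothetical. It assumes the cheap structural checks (DOM loaded, no pending network requests) run first, that the vision model then gets a veto on the screenshot, and that the model's "wait longer" decision still respects the same hard timeout. `SlowPage` simulates a page that is structurally done at 2 s but only paints its pixels at 5 s, the blank-screen case from the question.

```python
import time
from dataclasses import dataclass
from typing import Callable, Protocol


class Page(Protocol):
    """Hypothetical browser-page interface; a real driver would back these calls."""

    def dom_loaded(self) -> bool: ...
    def pending_requests(self) -> int: ...
    def screenshot(self) -> bytes: ...


@dataclass
class WaitResult:
    ready: bool
    waited_s: float
    reason: str  # "ready" or "timeout"


def wait_for_page_ready(
    page: Page,
    looks_loaded: Callable[[bytes], bool],  # stand-in for the vision-model call
    timeout_s: float = 10.0,
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> WaitResult:
    """Poll until structural checks pass AND the model agrees, or time out."""
    start = clock()
    while True:
        elapsed = clock() - start
        if elapsed >= timeout_s:
            # Hard upper bound so we never wait forever.
            return WaitResult(False, elapsed, "timeout")
        # Cheap checks first: DOM parsed and no in-flight network requests.
        if page.dom_loaded() and page.pending_requests() == 0:
            # Assumed model veto: a blank or half-rendered screenshot means keep waiting.
            if looks_loaded(page.screenshot()):
                return WaitResult(True, elapsed, "ready")
        sleep(poll_s)


class FakeClock:
    """Deterministic clock for the demo: sleep() just advances time."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, s: float) -> None:
        self.t += s


class SlowPage:
    """Simulated page: DOM at 1 s, network idle at 2 s, pixels painted at 5 s."""

    def __init__(self, clock: FakeClock) -> None:
        self.clock = clock

    def dom_loaded(self) -> bool:
        return self.clock.now() >= 1.0

    def pending_requests(self) -> int:
        return 0 if self.clock.now() >= 2.0 else 3

    def screenshot(self) -> bytes:
        return b"content" if self.clock.now() >= 5.0 else b"blank"


if __name__ == "__main__":
    clock = FakeClock()
    result = wait_for_page_ready(
        SlowPage(clock),
        looks_loaded=lambda shot: shot != b"blank",
        clock=clock.now,
        sleep=clock.sleep,
    )
    print(result)  # WaitResult(ready=True, waited_s=5.0, reason='ready')
```

With the structural checks alone, this loop would stop at 2 s on a blank screen; the screenshot check is what holds it until 5 s. Lowering `timeout_s` below 5 s makes it return `reason="timeout"` instead of waiting indefinitely.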