Hacker News
by drothlis 605 days ago
I noticed in your demo it generated the prompt "tap on the 'Log in' button located directly below the 'Facebook Password' field".

Does your model consistently get the positions right (above, below, etc.)? Every time I play with ChatGPT, even GPT-4o, it fails at basic spatial reasoning. For example, here's a typical output (emphasis mine):

> If YouTube is to the upper *left* of ESPN, press "Up" once, then *"Right"* to move the focus.

(I test TV apps where the input is a remote control, rather than tapping directly on the UI elements.)
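
For comparison, the expected answer is easy to derive mechanically. Here's a minimal sketch, assuming the apps sit on a simple grid and focus moves exactly one cell per key press. The function name and the grid coordinates are made up for illustration; real TV UIs don't always navigate this cleanly.

```python
def dpad_presses(focus, target):
    """Return the remote-control key presses that move focus to target.

    Both arguments are hypothetical (row, col) grid positions:
    row 0 is the top row and col 0 is the leftmost column.
    """
    d_row = target[0] - focus[0]
    d_col = target[1] - focus[1]
    # A negative row delta means the target is above the current focus.
    vertical = ["Up"] * -d_row if d_row < 0 else ["Down"] * d_row
    # A negative column delta means the target is to the left.
    horizontal = ["Left"] * -d_col if d_col < 0 else ["Right"] * d_col
    return vertical + horizontal


# YouTube is to the upper left of ESPN, and focus starts on ESPN.
espn = (1, 1)
youtube = (0, 0)
print(dpad_presses(espn, youtube))  # ['Up', 'Left']
```

So for the quoted case the second press should be "Left", not "Right".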