We definitely need a new dataset with more complex tasks, such as uploading files, working across multiple tabs, and completing tasks that take many more steps.