We are collecting the data right now, which is fairly involved. We will be opening up that platform to others shortly as well.
It's not my intention to be dismissive because the project idea seems really cool. I'm just curious, why not wait until the source code is ready to post it on HN?
Because gauging interest early and finding other people interested in building is a good idea and, frankly, very much in line with YC thinking. We have already open-sourced an enormous amount of code and datasets for computer use: https://github.com/agentsea