Hacker News
by pulse-dev 33 days ago
Good point; this could make a solid benchmark. The sites are adversarially built to resist automation, and success can be verified after the fact when the records actually disappear, so it's harder to game than WebArena.