Hacker News
by mpapazian 230 days ago
The agents can definitely detect when something is off, since they're using VLMs (vision-language models). They don't necessarily compare a page against previous versions; rather, they have opinionated takes on whether something looks broken or off. So, yes!