Hacker News
by mpapazian, 230 days ago
The agents can definitely detect when something is off, given they're using vision-language models (VLMs). They don't necessarily compare it to previous versions; rather, they have opinionated takes on whether something looks broken or off. So, yes!