Hacker News
by maggreenWAI 592 days ago
A) For Mind2Web: since there are multiple ways to reach a goal state, any thoughts on how to evaluate whether a task was successful? Should we let the LLM itself, or another LLM, evaluate it?