Hacker News
by mohidbutt 51 days ago
Agreed, "did the agent construct a valid request on the first try" sounds simple enough to be reliable : )

Might create an API-bench, with a set of APIs grouped by doc size, number of endpoints, and docs/llms.txt availability.
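
Rough sketch of how the grouping and the first-try metric could fit together. Everything here is hypothetical: the `ApiCase` fields, the `bucket` thresholds, and the function names are placeholders I made up for illustration, not an existing benchmark.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCase:
    """One benchmark run against one API (all fields are assumed, not a real schema)."""
    name: str
    doc_kb: int            # size of the API docs, in KB
    endpoints: int         # number of documented endpoints
    has_llms_txt: bool     # whether docs/llms.txt is available
    first_try_valid: bool  # did the agent's first request validate?


def bucket(value: int, small: int, large: int) -> str:
    """Map a number to a coarse size label; the thresholds are arbitrary."""
    if value < small:
        return "small"
    if value < large:
        return "medium"
    return "large"


def group_key(case: ApiCase) -> tuple[str, str, bool]:
    """Group by doc size, endpoint count, and llms.txt availability."""
    return (
        bucket(case.doc_kb, 50, 500),
        bucket(case.endpoints, 10, 100),
        case.has_llms_txt,
    )


def first_try_rate(cases: list[ApiCase]) -> dict[tuple[str, str, bool], float]:
    """Fraction of cases per group where the first request was valid."""
    groups: dict[tuple[str, str, bool], list[bool]] = defaultdict(list)
    for case in cases:
        groups[group_key(case)].append(case.first_try_valid)
    return {key: sum(hits) / len(hits) for key, hits in groups.items()}
```

Comparing the rate between groups that differ only in `has_llms_txt` would be the interesting part, assuming each group has enough APIs to mean anything.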