Hacker News
by brianwmunz 52 days ago
OpenAPI spec indexing is a good idea. Semantic search is fine for general API questions, but it often does badly on specific ones, like the exact requirements for a field. We've built a lot of connectors at my company and have run into this problem: the agent makes up arguments or misses required fields and types because it's inferring too much instead of checking against an actual schema. I think benchmarking correctness for each endpoint (did the agent construct a valid request on the first try?) would be the most useful thing to eval.
1 comment

Agree, "did the agent construct a valid request on the first try" sounds simple enough to be reliable : )
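
For what it's worth, here is a rough sketch of how that check could be scored, in Python with only the standard library. Everything in it is an assumption for illustration: the schema fragment, the field names, and the choice to count unknown fields as errors (stricter than OpenAPI's default, but it catches made-up arguments). A real harness would more likely hand the spec to a full JSON Schema validator.

```python
# Hypothetical request-body schema, shaped like an OpenAPI schema object.
CREATE_USER_SCHEMA = {
    "type": "object",
    "required": ["email", "age"],
    "properties": {
        "email": {"type": "string"},
        "age": {"type": "integer"},
        "nickname": {"type": "string"},
    },
}

# Mapping from JSON Schema primitive type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validation_errors(body, schema):
    """Return a list of problems with a flat request body (empty if valid)."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in body:
            errors.append(f"missing required field: {name}")
    for name, value in body.items():
        if name not in props:
            # Assumption: treat unknown fields as invented arguments.
            errors.append(f"unknown field: {name}")
            continue
        expected = props[name].get("type")
        py_type = JSON_TYPES.get(expected)
        if py_type is None:
            continue  # no type constraint we know how to check
        # bool is a subclass of int in Python, so reject it explicitly.
        is_bool_mismatch = isinstance(value, bool) and expected != "boolean"
        if is_bool_mismatch or not isinstance(value, py_type):
            errors.append(f"wrong type for {name}: expected {expected}")
    return errors


def first_try_valid_rate(first_attempts, schema):
    """Fraction of first-attempt request bodies that pass validation."""
    if not first_attempts:
        return 0.0
    valid = sum(1 for body in first_attempts if not validation_errors(body, schema))
    return valid / len(first_attempts)
```

Feed it one first-attempt body per task for an endpoint and you get a per-endpoint score between 0 and 1; the error strings show whether the misses are invented arguments or missing required fields.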

I might create an API-bench: a set of APIs grouped by doc size, number of endpoints, and docs/llms.txt availability.
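
A minimal sketch of how that grouping could look, in Python. The `ApiEntry` fields, the word-count proxy for doc size, and all bucket thresholds are placeholders I made up, not measured cut-offs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEntry:
    """One API in the hypothetical benchmark set."""
    name: str
    doc_words: int        # rough size of the docs, in words
    endpoint_count: int
    has_llms_txt: bool


def size_bucket(value, small_max, medium_max):
    """Map a count onto 'small', 'medium' or 'large'."""
    if value <= small_max:
        return "small"
    if value <= medium_max:
        return "medium"
    return "large"


def group_key(api):
    """Benchmark cell for an API: (doc size, endpoint count, llms.txt)."""
    return (
        size_bucket(api.doc_words, 10_000, 100_000),  # placeholder thresholds
        size_bucket(api.endpoint_count, 20, 200),     # placeholder thresholds
        api.has_llms_txt,
    )


def group_apis(apis):
    """Group API names by benchmark cell."""
    groups = defaultdict(list)
    for api in apis:
        groups[group_key(api)].append(api.name)
    return dict(groups)
```

Reporting the first-try validity rate per cell rather than as one overall number would show whether failures cluster in large specs or in APIs without an llms.txt.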